NTT Develops Real-Time Voice Conversion GenAI Tech That Can Instantly Change Your Voice and Speaking Style

NTT Corporation has recently announced that it has developed a remarkable real-time voice conversion technology based on deep learning that achieves both high sound quality and low latency.

This technology enables voice conversion in a variety of voice communications, whether face-to-face or remote. One example is converting a speaker's intonation and voice quality into easy-to-understand speech at a call center. It contributes to the realization of communication that is free from physical, intellectual, and psychological constraints.

High Sound Quality and Low Latency: Unlike conventional methods, the technology does not require a buffer for future speech signals, so it can convert speech in real time with low latency while keeping sound quality high.
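
To illustrate why dropping the future-signal buffer matters, here is a rough Python sketch. It is not NTT's implementation: the frame length, the lookahead count, and the `convert` callback are hypothetical stand-ins chosen only for illustration. It assumes a frame-based system in which a model that waits for future frames cannot emit output until those frames have been captured.

```python
from collections import deque


def algorithmic_latency_ms(hop_ms: float, lookahead_frames: int) -> float:
    """Minimum delay before a frame can be emitted.

    The current frame must be captured (hop_ms), plus every future
    frame the model waits for. Compute time is ignored.
    """
    return hop_ms * (1 + lookahead_frames)


def stream_causal(frames, convert, context_frames=4):
    """Convert frames one at a time using only past and current frames.

    `convert` is a hypothetical stand-in for a conversion model; it
    receives the most recent frames (oldest first) and returns one
    output frame.
    """
    past = deque(maxlen=context_frames)
    for frame in frames:
        past.append(frame)
        # Output is produced as soon as the frame arrives: no future buffer.
        yield convert(list(past))


if __name__ == "__main__":
    # Illustrative numbers only, not figures reported by NTT.
    print("causal:", algorithmic_latency_ms(10.0, 0), "ms")
    print("5-frame lookahead:", algorithmic_latency_ms(10.0, 5), "ms")

    # Toy "model" that sums the frames it can see.
    print(list(stream_causal([1, 2, 3, 4], sum, context_frames=2)))
```

Under these assumptions, a causal system's algorithmic delay is a single frame, while each frame of lookahead adds another frame of delay before any output can be produced.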

Voice Feature Extraction: A newly devised voice feature extraction process ensures high sound quality. It flexibly converts voice quality, intonation, and rhythm using paired data of the same utterance of the source and target speakers.

Applications

This breakthrough enables voice conversion in various scenarios, whether face-to-face or remote. Imagine converting the intonation and voice quality of a speaker into easy-to-understand speech at a call center or during web conferencing.

Communication Enhancement through Voice Conversion

The technology opens doors for web conferencing, live streaming, and smartphone applications. It contributes to communication free from physical, intellectual, and psychological constraints.

This technology is expected to enrich speech communication in various business and real-life situations, whether face-to-face or remote. Examples include assisting people with dysphonia, producing fluent English pronunciation close to that of a native speaker, making speech more persuasive, and removing voice tremors caused by nervousness.

Going forward, NTT says it will work to improve noise resistance and stability in real environments and to develop countermeasures against impersonation, with the aim of creating a future in which users can communicate with their favorite voices more securely.