This technology enables voice conversion in a variety of voice communication settings, whether face-to-face or remote. For example, at a call center it could convert a speaker's intonation and voice quality into speech that is easier to understand. In this way, it contributes to communication that is free from physical, intellectual, and psychological constraints.
High Sound Quality and Low Latency: The technology achieves both high sound quality and low latency. Unlike conventional methods, it does not need to buffer future speech signals before converting them, so it converts speech in real time without the delay that such buffering adds.
Voice Feature Extraction: A newly devised voice feature extraction process ensures high sound quality. Trained on paired data in which the source and target speakers say the same utterance, it flexibly converts voice quality, intonation, and rhythm.
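To make the low-latency claim concrete, the following Python sketch computes the algorithmic delay of a frame-based converter. It is a simplified model, not NTT's design: the 10 ms frame length is a hypothetical value, processing time is ignored, and the function names are illustrative. A model that needs `lookahead_frames` of future audio must wait for them before emitting each frame, while a causal model (lookahead of zero) can emit a frame as soon as it has been captured.

```python
def algorithmic_latency_ms(frame_ms: float, lookahead_frames: int) -> float:
    """Delay before a converted frame can be emitted, ignoring compute time.

    A frame is usable only once all of its samples have arrived
    (one frame of delay); a model that looks ahead must also wait for
    `lookahead_frames` further frames of future audio.
    """
    return frame_ms * (1 + lookahead_frames)


def emission_steps(num_frames: int, lookahead_frames: int) -> list[int]:
    """Input step at which each frame can be converted.

    Frame i needs frames up to i + lookahead_frames; near the end of the
    stream no more future audio exists, so the last step is used instead.
    """
    last = num_frames - 1
    return [min(i + lookahead_frames, last) for i in range(num_frames)]


if __name__ == "__main__":
    FRAME_MS = 10.0  # hypothetical frame length, for illustration only
    for lookahead in (0, 4):
        latency = algorithmic_latency_ms(FRAME_MS, lookahead)
        print(f"lookahead={lookahead}: {latency} ms, steps {emission_steps(6, lookahead)}")
```

Under these assumed numbers, a four-frame lookahead buffer adds 40 ms over the causal case on every frame, which is the kind of delay a converter without a future-signal buffer avoids.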
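Paired (parallel) data raises a practical problem: two speakers rarely say the same sentence with identical timing, so their frames have to be aligned before a source-to-target mapping can be learned. The article does not say how NTT aligns its data. Dynamic time warping (DTW) is one widely used technique for this, and the Python sketch below applies it to toy one-dimensional features. The feature values and the `dtw_path` name are placeholders for illustration, not NTT's newly devised features or code.

```python
import math
from typing import List, Sequence, Tuple


def dtw_path(src: Sequence[Sequence[float]],
             tgt: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Align two feature sequences with dynamic time warping.

    Returns the (source_index, target_index) pairs on the minimum-cost
    path, using Euclidean distance between individual frames.
    """
    n, m = len(src), len(tgt)
    # cost[i][j] = cheapest alignment of src[:i] with tgt[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n, m
    path: List[Tuple[int, int]] = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy features: the target speaker holds the first sound for one extra frame.
    source = [[0.0], [1.0], [2.0]]
    target = [[0.0], [0.0], [1.0], [2.0]]
    print(dtw_path(source, target))
```

Each pair in the returned path links a source frame to the target frame it is matched with, so a sound that one speaker holds longer can be matched to several frames of the other speaker's recording.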
Applications
This breakthrough enables voice conversion in various scenarios, whether face-to-face or remote. Imagine converting the intonation and voice quality of a speaker into easy-to-understand speech at a call center or during web conferencing.
The technology opens doors for web conferencing, live streaming, and smartphone applications. It contributes to communication free from physical, intellectual, and psychological constraints.
This technology is expected to enrich speech communication in a variety of business and everyday situations, whether face-to-face or remote. Possible uses include assisting people with dysphonia, producing fluent English pronunciation close to that of a native speaker, making speech more persuasive, and removing voice tremors caused by nervousness.
Going forward, NTT says it will work to improve noise robustness and stability in real-world environments and to develop countermeasures against impersonation, with the aim of creating a future in which users can communicate more securely in the voices they prefer.