Nvidia has developed new artificial intelligence (AI) models that let synthetic voices reproduce the rhythm, intonation and timbre of human speech with greater expressiveness and realism, and that can even perform narrations and voiceovers like a voice actor.
The new speech synthesis models bring automated voices closer to human speech, the company has announced. Nvidia will present them at Interspeech 2021, a conference focused on speech technologies.
The technology has been optimized to run efficiently on the company's graphics processing units (GPUs), and it was developed using the open-source tools in the NeMo toolkit.
During development, the company had its AI narrate a video series that deals, fittingly, with the potential of the technology itself: I AM AI, which was originally narrated by a real person.
The system is based on a model, RAD-TTS, which converts text into speech using an audio recording of a person speaking. It renders the text in that person's voice, with expressive intonation, like a voice actor.
According to Nvidia, the model's capabilities can also be used in video games, to help people with disabilities, or to translate a person's voice into another language.
The company claims the technology can even reproduce the voice of a person singing, capturing not only the melody but also the emotion of the performance.