The line between intrinsically human qualities and technology's ability to imitate them is growing ever thinner thanks to (or because of) artificial intelligence. After complex texts, artistic paintings and even abstract works, the next step has arrived: the voice.
The latest example is a video made with the voice cloning technology of Eleven Labs, an American company founded by two former Google employees. The company specializes in speech synthesis and has built software that not only converts text into speech but can also overlay any person's voice onto speeches that have already been delivered.
To demonstrate it, they chose a well-known public appearance: Leonardo DiCaprio's address to the UN Assembly in 2014. The actor, in his role as United Nations Messenger of Peace for climate change, warned that day of the dangers of pollution, urged the world to recognize a crisis that is already under way and called for an end to subsidies for fossil fuel companies, among other issues.
That speech had a strong social impact, not only because of what was said but also because of who said it. Would it have landed the same way if it had been delivered by, for example, Steve Jobs or Bill Gates? Or by another actor, such as Robert Downey Jr.? Or by someone from a far more showbiz world, such as Kim Kardashian? The answer is in the video that Eleven Labs itself has shared.
The video does exactly that without altering the intonation, the pauses for breath or the rises and falls of his delivery. The voices of the celebrities mentioned are fitted directly onto DiCaprio's words, with very convincing results. So much so that, if you close your eyes, it is no longer the star of Titanic who is speaking, but Iron Man or the inventor of the iPhone.
The dangers of ‘deep voice’
The dangers this technology can pose are obvious. Combining ‘deepfake’ technology (replacing the face of the person giving a speech or performing an action) with voice cloning makes it possible to produce an entirely fabricated event.
For example, Steve Jobs could be shown speaking before the UN, with his own face and voice, about the dangers of climate change. But the same technique could also be used to fake public statements by political leaders declaring war, terrorist groups claiming responsibility for an attack, or just about any event you can think of. Given the accuracy that artificial intelligence is achieving, it will become increasingly difficult to tell what is real from what has been fabricated.