27th April 2017
AI uses machine learning to mimic human voices
A Canadian startup has developed a new algorithm capable of replicating any human voice, based on an audio sample of only 60 seconds.
Montreal-based startup Lyrebird is named after the ground-dwelling Australian bird, which has the ability to mimic natural and artificial sounds from its surrounding environment. The company has this week unveiled a new voice-imitation algorithm that can mimic a person's speech and have it read any text with a given emotion, based on the analysis of just a few dozen seconds of audio recording. In a demo clip released by the company, recreations of Barack Obama, Donald Trump and Hillary Clinton can be heard.
Lyrebird claims this innovation can take AI software a step further by offering new speech synthesis solutions to developers. Users will be able to generate entire dialogues with the voice of their choice, or design completely new and unique voices from scratch, tailored to their needs. Suited to a wide range of applications, the algorithm could be used for personal assistants, audiobooks read in famous voices, speech synthesis for people with disabilities, connected devices of any kind, and animated movie or video game characters.
Lyrebird relies on deep learning models developed at the MILA lab at the University of Montréal, where its three founders are currently PhD students: Alexandre de Brébisson, Jose Sotelo and Kundan Kumar. The startup is advised by three of the most prolific professors in the field: Pascal Vincent, Aaron Courville and Yoshua Bengio. The latter, director of MILA and an AI pioneer, wants to make Montréal a world leader in artificial intelligence, and this new startup is part of that vision.
While the quality and flow may sound a little distorted in the demo clip, the overall recreation is uncanny. Given how quickly information technology tends to improve, even better versions with near-perfect mimicry will surely emerge within the next few years. The implications are both amusing and, at the same time, rather alarming: when combined with real-time face capture software such as Face2Face, it could be relatively easy to depict famous people making statements they never actually made.
"The situation is comparable to Photoshop," says de Brébisson. "People are now aware that photos can be faked. I think in the future, audio recordings are going to become less and less reliable [as evidence]."