Do you know President Joe Biden’s voice well enough to tell it apart from an AI imposter?
That was the challenge audience members were tasked with during a live taping of the Radio Atlantic podcast at this year’s Cascade PBS Ideas Festival for an episode about artificial intelligence’s potential impact on elections.
Audience members listened to three clips of Biden speaking, two real and one generated by AI audio software capable of mimicking anyone’s voice. Asked to identify the imposter, the crowd was divided fairly evenly among the three clips, meaning about two-thirds of attendees were fooled by the robot president.
The exercise wasn’t just theoretical. In January, just two days before the presidential primary, about 5,000 New Hampshire voters received a robocall purporting to be Biden and urging them to save their vote until November. An investigation revealed that a Democratic political consultant named Steve Kramer had paid a New Orleans magician to produce the call. Kramer claimed he did it to sound the alarm about the dangers of such technology.
“The call went out on a Sunday morning. Are you really going to question, especially if you’re not paying close attention to technology, if that’s your president calling? The software is working out the kinks, but the believability has crossed a threshold, which is alarming,” said Charlie Warzel, a tech writer for The Atlantic, at Saturday’s talk.
Warzel recently published a story about ElevenLabs, a small British software startup that’s leading the charge on AI audio technology. Long-term, the company wants to enable seamless, real-time language translation. In the shorter term, it wants to improve voice dubbing for television and movies.
Right now, for $22/month, anyone can upload audio samples into the program and spit out clips that sound convincingly like a person. Some authors are using it to record audiobooks. The Atlantic uses it to produce audio versions of its articles.
Journalist Hanna Rosin, host of Radio Atlantic, used ElevenLabs software for the Biden test as well as another in which she played two clips of herself saying the same phrase, one she recorded and one produced by ElevenLabs AI. Once again, the audience was divided on which was which.
The implications of this readily available technology are frightening. In the political sphere, bad actors could use it to make deepfakes of candidates making damaging statements, or for robocalls à la the New Hampshire primary. Elsewhere, scammers are already using it to extort ransom money in fake kidnappings and defraud banks.
Warzel said that some big tech companies have pledged to work on potential defenses, such as digital watermarking that would help people discern the origins of an audio clip, and that the founders of ElevenLabs have acknowledged the potential pitfalls of their technology. But to date there isn’t any real regulation of AI audio that might mitigate its harm.
“The only bulwark against this stuff is I do think people are generally pretty dubious of most things. … We have a little more of a defense now than we did in 2016. A lot of people are just kind of beaten down by the misinformation in the world. They’re less willing to pick up the robocall,” said Warzel.
But, he acknowledged, skepticism and tech savvy go only so far.
“It only takes one: some person in late October or early November that puts out something that’s just good enough and it’s the last thing someone sees before they go to the polls,” Warzel explained. “Those are the things that make you nervous. I don’t think we’re at a godlike ability to change reality. But it’s somewhere in between.”
If you want to watch this whole session, it will air on Cascade PBS on May 18 at 7 p.m. and stream on cascadepbs.org and crosscut.com the next day. Listen to all sessions on the Cascade PBS Ideas Festival podcast.