Silberberg started sending videos like it to a colleague, who discovered the app being used to make them: a tool from a startup called ElevenLabs. Silberberg’s colleague bought the two of them a basic ElevenLabs account, good for a month at the low price of $1. “We just spent the day generating funny things that Joe Biden was saying, and sending them back and forth to each other,” Silberberg said. Within the day, they’d used up the month’s credits.
In the end, the pair decided to split the cost of moving up to the next subscription level, which set them back $10 each. The first thing Silberberg used it for was a video of Joe Biden addressing the nation while trapped in the house from the 2022 indie horror film Skinamarink. (Silberberg used other software for the video that the audio is overlaid onto.)
Silberberg has since made dozens of videos using ElevenLabs tech — a process that isn’t always easy, since the app is often, as he put it, “pretty janky.” Many of the clips involve Biden getting into improbable situations, while others poke fun at the vapidity of Joe Rogan and Ben Shapiro, imagining conversations between them that reference the movies Ratatouille and Old.
It takes around two hours for Pade to make each of the videos, which range from 30 seconds to one minute in length. He records gameplay and then generates the voices using AI. “I actually start with the TikTok or vertical video then I make the YouTube video after, and maybe add a couple more jokes that go on that video, since YouTube likes longer content,” he said.
He has some theories as to why the videos work. “I think the absurdity of seeing famous figures in any random gaming session is genuinely hilarious,” he said. “Surprisingly I see a lot of love for the ones where figures that might not get along in real life — like Trump and Biden — are having wholesome moments together. I think part of it is a relief from seeing figures that are always embroiled in controversy in a different light, even if it’s fake.”
It’s not just the leaders of the free world and the leading light in podcasting who are being mimicked using the power of AI. Joe Marotta has been using the tech to give new life to his personal interest: professional wrestling. Marotta, a 37-year-old podcaster from New Jersey, came across AI-generated audio being used on Twitter in early February. He thought the tech would be a fun way to promote his retro pop culture podcast, Acid Washed Memories. The idea was to get 1980s pro wrestling commentators Gorilla Monsoon and Bobby “The Brain” Heenan to hawk it.
“I signed up for [ElevenLabs] and put Gorilla and Bobby’s voices in there to do a promo for Acid Washed Memories, and was happy with how it came out,” he wrote via Twitter DM. The success of the podcast promo skit pushed Marotta to test the tech further. “I figured, ‘Okay, well, what if Gorilla Monsoon had a podcast? What would he say?’”
The resulting foul-mouthed AI parody, first posted on Twitter on Feb. 6, has since been viewed nearly 320,000 times. It works because it pokes fun at the gap between Monsoon’s friendly, laid-back onscreen demeanor and his short-tempered tendencies when the cameras were off, and because it plays on a recent trend of faded names from wrestling history launching their own podcasts.
Marotta is now on Part 25 of the Monsoon podcast clip series, with each two-minute video taking around an hour to produce. “I try to take real-life situations and play upon them with fiction, or just make up stories that sound plausible,” he said. “I think the fact that so many wrestling fans know the character and mannerisms of Gorilla, Bobby, and Gene [Okerlund], for example, makes it easy to imagine it’s them really saying these things.”
But both Marotta and Silberberg draw the line at using these AI audio tools for nefarious ends. “While ElevenLabs has really driven an explosion of this kind of memeable content, it’s not particularly new,” said Henry Ajder, a UK-based deepfake expert. What the software has done, though, is put high-quality AI audio generation into the hands of ordinary users. “It’s led to this trend of certain people being targeted,” Ajder said.
Ajder believes that as long as the conceits of the audio snippets remain obviously outlandish and target well-known people, the slippery slope toward extreme disinformation, such as the kind of deepfake that could spark a war, can largely be avoided.
“This is targeting Joe Biden, one of the most famous people in the world,” he said. “This kind of content is quite clearly fake, based on the context and how well-known the individual is. What interests me is when we think about slightly less well-known politicians or private individuals.”
Silberberg said he hopes that these audio deepfakes stay “in the realm of harmless bullshit, and it doesn’t deviate from that.” But he’s also a realist: “I know that’s not going to happen and already isn’t happening.”