Google DeepMind’s new AI tool uses video pixels and text prompts to generate soundtracks

Google DeepMind has taken the wraps off a new AI tool for generating video soundtracks. In addition to using a text prompt to generate audio, DeepMind’s tool also takes into account the contents of the video.

By combining the two, DeepMind says users can use the tool to create scenes with “a drama score, realistic sound effects or dialogue that matches the characters and tone of a video.” You can see some of the examples posted on DeepMind’s website, and they sound pretty good.

For a video of a car driving through a cyberpunk-esque cityscape, Google used the prompt “cars skidding, car engine throttling, angelic electronic music” to generate audio. You can see how the sounds of skidding match up with the car’s movement. Another example creates an underwater soundscape using the prompt, “jellyfish pulsating under water, marine life, ocean.”

Even though users can include a text prompt, DeepMind says it’s optional. Users also don’t need to meticulously match up the generated audio with the appropriate scenes. According to DeepMind, the tool can also generate an “unlimited” number of soundtracks for videos, allowing users to come up with an endless stream of audio options.

That could help it stand out from other AI tools, like the sound effects generator from ElevenLabs, which uses text prompts to generate audio. It could also make it easier to pair audio with AI-generated video from tools like DeepMind’s Veo and OpenAI’s Sora (the latter of which plans to eventually incorporate audio).

DeepMind says it trained its AI tool on video, audio, and annotations containing “detailed descriptions of sound and transcripts of spoken dialogue.” This allows the video-to-audio generator to match audio events with visual scenes.

The tool still has some limitations. For example, DeepMind is trying to improve its ability to synchronize lip movement with dialogue, as you can see in this video of a claymation family. DeepMind also notes that its video-to-audio system is dependent on video quality, so anything that’s grainy or distorted “can lead to a noticeable drop in audio quality.”

DeepMind’s tool isn’t generally available just yet, as it will still have to undergo “rigorous safety assessments and testing.” When it does become available, its audio output will include Google’s SynthID watermark to flag that it’s AI-generated.
