Google has introduced an update to its video-to-audio technology that lets users add AI-generated scores, sound effects, and dialogue to their video clips. As Google explains in a blog post, the system combines video pixels with natural language text prompts to create immersive soundscapes that match the on-screen action. The development is a notable step toward making generated movies more realistic and engaging, and a significant advance in the capabilities of video editing tools.
DeepMind, Google's AI research laboratory, notes that the technology can work from raw pixels alone, without text prompts, though adding prompts improves the software's precision. The system, known as V2A, can generate a wide range of soundtracks for any given video input. Users can also supply a 'positive prompt' to steer the output toward desired sounds, or a 'negative prompt' to steer it away from undesirable ones, according to a Google spokesperson.

The update has not yet been released publicly; Google says work is ongoing to address various limitations and to continue research and development. One important caveat: audio quality depends directly on the quality of the video input. Artifacts or distortions that fall outside the model's training distribution can noticeably degrade the generated audio.
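Since V2A has no public API, the sketch below is purely illustrative: the function name `build_v2a_request` and its payload fields are assumptions, showing only how the positive/negative prompt controls described above might be expressed in a client-side request.

```python
def build_v2a_request(video_path, positive_prompt=None, negative_prompt=None):
    """Assemble a hypothetical request payload pairing a video clip
    with optional guidance prompts, mirroring the controls Google
    describes. All field names here are illustrative, not a real API."""
    request = {"video": video_path}
    if positive_prompt:
        # Steer generation toward desired sounds.
        request["positive_prompt"] = positive_prompt
    if negative_prompt:
        # Steer generation away from undesirable sounds.
        request["negative_prompt"] = negative_prompt
    return request

# Example: score a clip with tense music while suppressing speech.
payload = build_v2a_request(
    "clip.mp4",
    positive_prompt="tense cinematic score, distant thunder",
    negative_prompt="dialogue, narration",
)
```

The optional-field pattern keeps prompt-free use (raw pixels only) as the default, matching the article's point that prompts refine rather than enable generation.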
Explore these example clips showcasing the technology in action below.