Nvidia has unveiled Fugatto 1 (Foundational Generative Audio Transformer Opus 1), a generative AI model that creates novel sounds from simple text prompts and optional contextual audio inputs. The company sees Fugatto "as a tool for creatives, enabling them to quickly realize their sonic dreams and unheard sounds—an instrument for imagination, not just a substitute for creativity."
Research Insights
In their research paper, the Nvidia team explains that large language models (LLMs) trained on text can infer how to carry out instructions because text data embeds rich context about how it was produced. Models trained solely on audio lack this capability, since recorded audio carries no comparable information about how it was generated.
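The paper's remedy, broadly, is to have text models synthesize instructions and captions for existing audio, creating the instruction-to-sound pairings that raw recordings lack. The Python sketch below illustrates that general idea only; the helpers describe_audio and llm_rewrite_as_instruction are hypothetical stand-ins, not Nvidia's actual pipeline.

```python
# Hypothetical sketch: pairing raw audio with synthesized instructions so a
# model can learn text-conditioned audio generation. The helper names below
# are illustrative placeholders, not Nvidia's API.

from dataclasses import dataclass

@dataclass
class TrainingExample:
    audio_path: str
    instruction: str

def describe_audio(audio_path: str) -> str:
    """Placeholder: an audio-captioning model would return a description here."""
    return "a dog barking over an electronic dance beat"

def llm_rewrite_as_instruction(caption: str) -> str:
    """Placeholder: a text LLM would turn the caption into a generation instruction."""
    return f"Generate {caption}."

def build_example(audio_path: str) -> TrainingExample:
    caption = describe_audio(audio_path)
    return TrainingExample(audio_path, llm_rewrite_as_instruction(caption))

print(build_example("clips/edm_dog.wav").instruction)
```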
Technical Details
Nvidia's Fugatto employs a purpose-built dataset that spans a broad range of sounds, along with a technique for interpreting and combining instructions known as ComposableART. This gives the model emergent abilities to blend and transform sounds, including combinations of instructions it was never specifically trained on.
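Public descriptions of ComposableART frame it as weighting and mixing instructions at inference time so users can blend them continuously. The numpy sketch below illustrates that weighted-composition idea, assuming the model exposes a per-instruction prediction step; model_output is a placeholder, not Nvidia's implementation.

```python
# A minimal numpy sketch of weighted instruction composition, in the spirit of
# ComposableART as described publicly (not Nvidia's actual code). model_output()
# stands in for one generation step conditioned on a single instruction.

import numpy as np

rng = np.random.default_rng(0)

def model_output(latent: np.ndarray, instruction: str) -> np.ndarray:
    """Placeholder for the model's prediction under one instruction."""
    seed = abs(hash(instruction)) % (2**32)
    return latent + np.random.default_rng(seed).normal(size=latent.shape)

def composed_output(latent, instructions, weights):
    """Blend per-instruction predictions with user-chosen weights."""
    weights = np.asarray(weights, dtype=float) / np.sum(weights)  # normalize to sum to 1
    preds = [model_output(latent, ins) for ins in instructions]
    return sum(w * p for w, p in zip(weights, preds))

latent = rng.normal(size=(8,))
mix = composed_output(latent, ["saxophone playing", "cat meowing"], [0.6, 0.4])
print(mix.shape)
```

In a real system the per-instruction predictions would come from the trained model, and the weights act as a dial letting an artist slide between, say, "saxophone" and "meowing."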
Demonstration Examples
Nvidia has published several demonstrations of the model's capabilities on Fugatto's GitHub page. Notable examples include a dog barking in time with electronic dance music, a typewriter that softly whispers each letter as it is typed, and a saxophone that barks or meows.
For now, Nvidia has no plans to make the model publicly available.