Dia by Nari Labs
Dia is a 1.6-billion-parameter open-source text-to-speech (TTS) model developed by Nari Labs. It generates realistic multi-speaker dialogue directly from text, including emotional nuance and non-verbal cues such as laughter and sighs. The model also supports audio conditioning: a short reference audio sample guides the tone and emotion of the output. Released under the Apache 2.0 license, Dia is intended for virtual assistants, gaming, audiobooks, and accessibility tools.
Key Attributes
Multi-Speaker Dialogue Generation: Produces realistic conversations between multiple distinct voices from a single text script.
Emotional and Non-Verbal Expression: Integrates non-verbal sounds like laughter and coughing to enhance expressiveness.
Audio Conditioning: Allows tone and emotion control through short reference audio samples.
Open Source: Available under the Apache 2.0 license, promoting community involvement and innovation.
Real-Time Performance: Runs efficiently on consumer-grade GPUs; CPU support and quantized models are planned.
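
As a rough illustration of what a single text script for several speakers might look like, the Python sketch below joins dialogue turns into one string. The [S1]/[S2] speaker tags and the parenthesized cue (laughs) are assumptions about the input convention, and build_script is a hypothetical helper that is not part of Dia. Check the project's documentation for the exact format the model expects.

```python
from typing import Iterable, Tuple


def build_script(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_number, line) pairs into one tagged script string.

    Assumption: speakers are marked with tags such as [S1] and [S2], and
    non-verbal cues are written inline in parentheses, e.g. (laughs).
    """
    parts = []
    for speaker, line in turns:
        if speaker < 1:
            raise ValueError("speaker numbers start at 1")
        # Prefix each turn with its speaker tag and trim stray whitespace.
        parts.append(f"[S{speaker}] {line.strip()}")
    return " ".join(parts)


script = build_script([
    (1, "Did you hear the demo?"),
    (2, "I did. (laughs) It sounded like two real people."),
])
print(script)
```

The resulting string would then be passed to the model as its text input. Keeping the turns as structured pairs until the last step makes it easy to reorder lines or reassign speakers before synthesis.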
Example Use Cases
Creating dynamic dialogues for virtual assistants and chatbots.
Generating character voices in video games and interactive media.
Producing audiobooks with expressive, multi-character narration.
Developing assistive technologies for individuals with speech impairments.


