FlashLabs Releases Chroma 1.0: Open-Source Real-Time Speech-to-Speech Model with Voice Cloning

FlashLabs has launched Chroma 1.0, which it presents as the first open-source speech-to-speech model capable of real-time voice conversation with voice cloning. Unlike traditional systems that chain separate speech recognition and text-to-speech stages, Chroma is an end-to-end system that enables direct voice conversation.

Key features include:

End-to-end latency under 150 ms, comparable to a live call
High-quality voice cloning from just a few seconds of audio
Voice similarity score (SIM) of 0.817, with cloned voices described as nearly identical to the reference speaker
Efficient reasoning with only 4 billion parameters
Fully open weights and code
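
For readers unfamiliar with the SIM figure above: speaker-similarity scores of this kind are commonly computed as the cosine similarity between speaker embeddings of the reference audio and the generated audio. The announcement does not say how Chroma's 0.817 was measured, so the following is only a minimal sketch of that common approach. The function name and the toy three-dimensional vectors are hypothetical; real embeddings would come from a speaker-verification model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, for illustration only (not Chroma outputs).
reference_voice = [0.9, 0.1, 0.4]
cloned_voice = [0.8, 0.2, 0.5]

sim = cosine_similarity(reference_voice, cloned_voice)
print(f"SIM = {sim:.3f}")
```

Under this definition a score of 1.0 means the two embeddings point in exactly the same direction, and values closer to 0 indicate unrelated voices.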

The model is also optimized for SGLang (LMSYS) for faster and cheaper inference, which could make it a practical open-source alternative to proprietary voice systems.

Paper

Model

Code