FlashLabs Releases Chroma 1.0: Open-Source Real-Time Speech-to-Speech Model with Voice Cloning
FlashLabs has launched Chroma 1.0, the first open-source speech-to-speech model capable of real-time spoken conversation with voice cloning. Unlike traditional cascaded systems that chain speech recognition and text-to-speech, Chroma is end-to-end: it takes speech as input and generates speech directly.
Key features include:
- End-to-end latency under 150 ms, comparable to a live call
- High-quality voice cloning from just a few seconds of audio
- Speaker similarity (SIM) score of 0.817, meaning cloned voices closely match the reference speaker
- Efficient reasoning with only 4 billion parameters
- Fully open weights and code
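The announcement does not say how SIM is computed. In speech-synthesis evaluations it is commonly the cosine similarity between speaker embeddings of the reference audio and the generated audio, so a value near 1.0 means the voices are close. The snippet below is a minimal sketch under that assumption. The function name `speaker_similarity` and the toy vectors are hypothetical; real embeddings would come from a speaker-verification encoder, which is not shown here.

```python
import math


def speaker_similarity(ref_embedding, gen_embedding):
    """Cosine similarity between two speaker embeddings (assumed SIM definition)."""
    if len(ref_embedding) != len(gen_embedding):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(r * g for r, g in zip(ref_embedding, gen_embedding))
    ref_norm = math.sqrt(sum(r * r for r in ref_embedding))
    gen_norm = math.sqrt(sum(g * g for g in gen_embedding))
    if ref_norm == 0.0 or gen_norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (ref_norm * gen_norm)


# Toy 4-dimensional vectors standing in for real speaker embeddings (hypothetical values).
reference = [0.9, 0.1, 0.3, 0.2]
cloned = [0.8, 0.2, 0.4, 0.1]

score = speaker_similarity(reference, cloned)
print(f"SIM = {score:.3f}")
```

Under this definition, identical embeddings score 1.0 and orthogonal ones score 0.0, which is why a reported 0.817 is read as a close but not exact match.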
The model is also optimized for SGLang (LMSYS) for faster and cheaper inference, which could make it a practical open-source alternative to proprietary voice systems.