FlashLabs has launched Chroma 1.0, the first open-source speech-to-speech model capable of real-time voice conversation with voice cloning. Unlike traditional systems that chain speech recognition and text-to-speech, Chroma is an end-to-end system that takes speech in and produces speech out directly.

Key features include:

  • End-to-end latency under 150 ms, comparable to a live call
  • High-quality voice cloning from just a few seconds of audio
  • Voice similarity score (SIM) of 0.817 between cloned and reference voices
  • Efficient reasoning with only 4 billion parameters
  • Fully open weights and code
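
For readers unfamiliar with similarity scores like the SIM figure above: such metrics are commonly computed as the cosine similarity between speaker embeddings of the reference audio and the generated audio. The announcement does not say how Chroma's SIM is measured, so the following is only a minimal sketch under that assumption. The function name and the toy embedding values are hypothetical; in practice the vectors would come from a speaker-verification model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy 4-dimensional embeddings (made up for illustration only).
# Real speaker embeddings typically have hundreds of dimensions.
reference_embedding = [0.2, 0.7, 0.1, 0.4]
cloned_embedding = [0.25, 0.6, 0.2, 0.35]

sim = cosine_similarity(reference_embedding, cloned_embedding)
print(f"SIM = {sim:.3f}")
```

Under this reading, a score of 1.0 would mean the two embeddings point in exactly the same direction, and lower scores mean the voices are less alike.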

The model is also optimized for SGLang, the LMSYS serving framework, for faster and cheaper inference. That could make it a practical open-source alternative to proprietary voice systems.
