Qwen has officially launched Qwen3-TTS, an open-source text-to-speech system comprising five models in the 0.6B and 1.8B parameter classes. The release includes free-form voice design (generating and editing voices from a text description), voice cloning, support for 10 languages, and a 12Hz tokenizer that compresses audio strongly with minimal quality loss.

Because the stack is open source, users can fine-tune the models, create custom voices, and train domain-specific TTS systems without depending on external providers. Qwen reports state-of-the-art quality on several metrics.

More information and resources are available on GitHub, Hugging Face, and Qwen’s blog, along with a demo and a paper.