Sakana AI Launches KAME Voice AI That Thinks While Speaking

Japanese company Sakana AI has introduced KAME, a voice AI system that keeps reasoning while it speaks. KAME uses a dual-stream approach: a lightweight speech model starts replying immediately to avoid awkward pauses, while a more powerful language model refines the response in real time. Backend models such as Claude, Gemini Flash, or GPT can be swapped to trade off speed, reasoning, or style without rebuilding the voice layer.
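The dual-stream idea can be illustrated with a toy sketch. This is not KAME's code or API: the names `respond`, `slow_backend`, and `Backend` are hypothetical, the sleeps only simulate latency, and the real system may combine the two streams quite differently. The sketch shows only the general pattern of speaking at once while a slower, swappable backend works in parallel.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical backend type: any async function mapping a prompt to text.
Backend = Callable[[str], Awaitable[str]]


async def slow_backend(prompt: str) -> str:
    """Stand-in for a large language model; the sleep simulates its latency."""
    await asyncio.sleep(0.05)
    return f"considered answer to '{prompt}'"


async def respond(prompt: str, backend: Backend, filler: List[str]) -> List[str]:
    """Speak filler words right away, then add the backend's text once it is ready."""
    spoken: List[str] = []
    # Start the slow stream immediately so it runs while the fast stream talks.
    task = asyncio.create_task(backend(prompt))
    for word in filler:
        if task.done():
            break  # the refined content has arrived; stop filling
        spoken.append(word)        # fast stream: speak without waiting
        await asyncio.sleep(0.01)  # simulated time needed to say one word
    spoken.append(await task)      # slow stream: fold in the refined content
    return spoken


if __name__ == "__main__":
    words = asyncio.run(respond("hi", slow_backend, ["Well,", "let", "me", "see"]))
    print(" ".join(words))
```

Because `respond` takes the backend as a plain argument, a different model wrapper could be passed in without touching the speaking loop, which mirrors the swappable-backend claim above.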

This approach addresses the trade-off between fast but shallow responses and slower, more thoughtful ones, which could make voice assistants sound more natural and conversational. KAME is already available on Hugging Face, and the details and research paper have been published online.

Source: @ai_machinelearning_big_data (http://t.me/ai_machinelearning_big_data)