Nvidia has unveiled Nemotron 3 Ultra, an open Mixture-of-Experts model with 550 billion total parameters, of which 55 billion are active per token during inference. According to Artificial Analysis, it outperforms Gemma 4 and gpt-oss-120b on quality benchmarks and ranks just below China's Kimi K2.6 and closed flagship models such as Opus 4.8.

Served through the provider DeepInfra, the model generates over 300 tokens per second, roughly 3–6× faster than comparably sized open models from DeepSeek and Moonshot, which run at 50–100 t/s. It is scheduled for release on Hugging Face, OpenRouter, and other platforms on June 4.

Nvidia newsroom · Artificial Analysis