Microsoft Open Sources Harrier Embeddings Family
Microsoft has open-sourced the Harrier-OSS-v1 family of multilingual embedding models designed for text vectorization. These models excel in information retrieval, clustering, text classification, and semantic similarity tasks, achieving state-of-the-art results on the Multilingual MTEB v2 benchmark.
The family comes in three sizes. The flagship model has 27 billion parameters and the compact version has 270 million; both are based on the Gemma 3 architecture. A mid-sized model with 600 million parameters is built on Qwen 3. The models support a context window of up to 32,000 tokens and cover more than 94 languages, including Russian.
Harrier models are released under the MIT license and integrate with sentence-transformers, LangChain, and LlamaIndex.
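For readers new to embedding models, the retrieval and semantic similarity use cases mentioned above usually come down to comparing vectors by cosine similarity. The following is a minimal sketch in plain Python, not Harrier-specific code: the three-dimensional vectors are made-up stand-ins for real embeddings, and the helper names `cosine_similarity` and `rank_documents` are illustrative. In practice the vectors would come from an embedding model, for example through the `encode` method of a sentence-transformers `SentenceTransformer` object.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for model output (assumption: real embeddings
# have hundreds or thousands of dimensions, not three).
query = [1.0, 0.0, 1.0]
docs = [
    [0.9, 0.1, 0.8],  # points in nearly the same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.5, 0.5, 0.5],  # partially aligned with the query
]

ranking = rank_documents(query, docs)
print([index for index, _ in ranking])  # → [0, 2, 1]
```

The same ranking step applies regardless of which model produced the vectors; only the embedding call changes.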