Anthropic has introduced BioMysteryBench, a new benchmark that tests large language models (LLMs) on real bioinformatics tasks. Anthropic's LLM, Claude, matches PhD-level experts on 76 tasks and outperforms panels of scientists on 23 particularly challenging problems.

Claude draws on canonical bioinformatics tools and databases such as NCBI and Ensembl, combining broad scientific knowledge with live data analysis. Anthropic attributes the results to a dual strategy: integrating comprehensive domain knowledge and verifying conclusions with multiple independent methods.

Meanwhile, Genentech and Roche released CompBioBench, a separate benchmark on which Claude Opus 4.6 scores 81% overall and 69% on the hardest questions.

Source: Anthropic