Google Tests LLMs on Real Scientific Questions in Superconductivity

Google researchers evaluated six large language models (LLMs) on 67 complex questions about high-temperature superconductivity, a challenging field of physics. The models tested were GPT-4o, Claude 3.5, Gemini Advanced 1.5, Perplexity, NotebookLM, and a specialized retrieval-augmented generation (RAG) system. Expert assessments showed that the models drawing on a closed, carefully curated scientific database (15 key review articles, about 3,300 scientific citations, and around 1,700 selected experimental and theoretical sources) gave more accurate answers than those with open internet access.

The results suggest that LLMs can serve as effective virtual scientific assistants, helping researchers quickly get up to speed on complex topics and compare differing scientific perspectives, but that answer quality depends heavily on controlled knowledge sources rather than on internet access alone.

Source: Google Research Blog