ARC Prize Foundation

AI Benchmark Organization

Appears in 1 story

Stories

Google Gemini's push toward scientific reasoning

New Capabilities

Verified Gemini 3 Deep Think's 84.6% ARC-AGI-2 score

OpenAI launched the first commercial reasoning model in September 2024. Seventeen months later, Google claims its upgraded Gemini 3 Deep Think has pulled ahead of rivals on the benchmarks it considers most important for science. The February 2026 update scored 84.6% on ARC-AGI-2, a test designed to measure how well AI systems generalize to novel problems, and 48.4% on Humanity's Last Exam, a set of 2,500 expert-level questions crowdsourced from nearly 1,000 specialists worldwide.

Updated Feb 13