AI models learn to read, predict, and write the genetic code of life

Overview

It took thirteen years and $2.7 billion to read the first human genome. Now a single AI model, trained on 9.3 trillion DNA base pairs from more than 128,000 species, can predict with 90 percent accuracy whether an uncharacterized mutation in a breast cancer gene is dangerous, without any task-specific training.

On March 4, the Arc Institute and NVIDIA published Evo 2 in Nature. It is the largest biological foundation model built to date, with 40 billion parameters, a context window of one million nucleotides, and the ability to design synthetic genomes the size of a simple bacterium's. Evo 2 sits at the leading edge of a wave that began in 2020, when DeepMind's AlphaFold cracked protein structure prediction, an achievement that earned its creators a share of the 2024 Nobel Prize in Chemistry.

The next frontier is harder: moving from reading biology to writing it. Evo 2 can generate functional genomes for viruses that infect bacteria, a capability with direct applications in fighting antibiotic-resistant infections and obvious biosafety risks if misused.

The model's code, training data, and weights are publicly available, while sequences from human pathogens were deliberately excluded from training as a safety measure. What comes next and who governs it remain open questions that researchers, regulators, and biosecurity experts are racing to answer.

Key Indicators

9.3T

DNA base pairs in training data

Evo 2 was trained on 9.3 trillion nucleotides from 128,000+ species spanning all three domains of life.

40B

Model parameters

The largest version of Evo 2 has 40 billion parameters, making it the biggest AI model built for biology.

90%

BRCA1 variant classification accuracy

Evo 2 predicted whether previously uncharacterized BRCA1 mutations affect gene function with 90 percent accuracy, without any task-specific training.

1M

Nucleotide context window

The model can process up to one million nucleotides at once—eight times more than Evo 1—enabling it to capture long-range dependencies across genomes.

16

Viable AI-designed bacteriophages

Out of roughly 300 AI-generated phage genome designs tested, 16 proved functional, with some outperforming natural phages.

People Involved

Patrick Hsu

Co-senior author on Evo 2

Brian Hie

Co-senior author on Evo 2

Demis Hassabis

2024 Nobel Laureate in Chemistry for AlphaFold

Organizations Involved

Arc Institute

Independent Research Institute

Developer of Evo 1 and Evo 2

A non-profit research institute in Palo Alto that gives scientists long-term, unrestricted funding to pursue high-risk biological research in partnership with Stanford, UC Berkeley, and UC San Francisco.

NVIDIA Corporation

Chip Designer

Infrastructure partner and co-developer of Evo 2

The dominant maker of graphics processing units (GPUs) used in AI training, NVIDIA provided the computing infrastructure and engineering collaboration for Evo 2.

Google DeepMind

AI research division

Developer of AlphaFold, the foundational precedent for biological AI

Google's AI research lab created AlphaFold, the protein structure prediction system that proved deep learning could transform biology and earned its creators a share of the 2024 Nobel Prize in Chemistry.

Timeline

November 2020 to March 2026

9 events. Latest: March 4th, 2026

March 2026
Evo 2 published in Nature
Latest Publication

The peer-reviewed Evo 2 paper appeared in Nature, describing the 40-billion-parameter model's ability to predict pathogenic mutations and design synthetic genomes across all domains of life.
March 4th, 2026
September 2025
AI-generated bacteriophages shown to be functional
Research

Researchers used Evo models to generate synthetic bacteriophage genomes. Of roughly 300 designs tested, 16 proved viable, with some outperforming natural phages and a cocktail overcoming bacterial resistance in three E. coli strains.
September 12th, 2025
February 2025
Evo 2 preprint released with open-source code and data
Research

Arc Institute and NVIDIA posted the Evo 2 preprint on bioRxiv, alongside publicly releasing model weights, training code, and the OpenGenome2 dataset of 9.3 trillion nucleotides.
February 18th, 2025
November 2024
Evo 1 published in Science
Research

Arc Institute published Evo 1 in Science: a 7-billion-parameter model trained on prokaryotic genomes that could generate functional CRISPR systems and transposons, marking the first protein-RNA codesign with a language model.
November 15th, 2024
October 2024
AlphaFold creators win Nobel Prize in Chemistry
Recognition

Demis Hassabis and John Jumper of Google DeepMind received the Nobel Prize in Chemistry for AlphaFold's protein structure predictions. David Baker shared the prize for computational protein design.
October 9th, 2024
January 2023
ProGen demonstrates AI-designed functional proteins
Research

Salesforce Research published results in Nature Biotechnology showing that its ProGen model could generate novel protein sequences: 73 percent of the AI-designed proteins proved functional in lab tests, compared with 59 percent of the natural proteins tested.
January 26th, 2023
November 2022
Meta releases ESM-2 and ESMFold protein language models
Research

Meta AI released ESM-2, a 15-billion-parameter protein language model, alongside ESMFold for structure prediction. The accompanying Metagenomic Atlas predicted structures for over 617 million proteins.
November 1st, 2022
December 2021
Arc Institute launches with $650 million in funding
Institutional

The Arc Institute launched in Palo Alto with a novel funding model: eight-year unrestricted grants for scientists, in partnership with Stanford, UC Berkeley, and UC San Francisco.
December 15th, 2021
November 2020
AlphaFold 2 cracks protein folding at CASP14
Breakthrough

Google DeepMind's AlphaFold 2 predicted protein structures with accuracy matching laboratory experiments at the CASP14 competition, effectively solving a fifty-year-old problem in biology.
November 30th, 2020

Historical Context

3 moments from history that rhyme with this story — and how they unfolded.

1990–April 2003

Human Genome Project (1990–2003)

An international consortium of researchers spent thirteen years and approximately $2.7 billion to sequence the first human genome's 3 billion base pairs. When completed in April 2003, it covered about 92 percent of the genome and was hailed as biology's equivalent of the Moon landing.

Then

Sequencing costs began a dramatic decline—from $50 million for a second genome in 2003 to under $200 by 2024—as next-generation sequencing technology emerged.

Now

The project created the reference genome that underpins all modern genomics, from cancer diagnostics to ancestry testing. But reading the genome turned out to be far easier than understanding it—the function of most genetic variation remains unknown.

Why this matters now

Evo 2 was trained on the genomic data that the Human Genome Project and its successors generated. Its ability to predict mutational effects without task-specific training directly addresses the interpretation gap that has persisted since 2003: we can read genomes cheaply, but understanding what the variations mean has remained the bottleneck.
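One common way to score a mutation without task-specific training is to ask how much less likely a sequence model finds the mutated sequence than the reference. The sketch below illustrates that general idea only. It uses a toy Markov model as a stand-in for a genomic language model, it is not the Evo 2 team's implementation, and every class and function name in it is hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class ToyMarkovModel:
    """Order-k Markov model over DNA (a toy stand-in, NOT Evo 2)."""

    def __init__(self, order=2):
        self.order = order
        # counts[context][next_base] = times next_base followed context
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for i in range(self.order, len(seq)):
                context = seq[i - self.order:i]
                self.counts[context][seq[i]] += 1
        return self

    def log_likelihood(self, seq):
        """Sum of log P(base | previous k bases), with add-one smoothing."""
        total = 0.0
        for i in range(self.order, len(seq)):
            context = seq[i - self.order:i]
            ctx_counts = self.counts.get(context, {})
            num = ctx_counts.get(seq[i], 0) + 1
            den = sum(ctx_counts.values()) + len(ALPHABET)
            total += math.log(num / den)
        return total


def apply_snv(reference, position, alt_base):
    """Return the reference with one base substituted (0-based position)."""
    return reference[:position] + alt_base + reference[position + 1:]


def zero_shot_score(model, reference, position, alt_base):
    """Log-likelihood of the mutant minus that of the reference.

    More negative scores mean the model finds the variant less plausible.
    """
    mutant = apply_snv(reference, position, alt_base)
    return model.log_likelihood(mutant) - model.log_likelihood(reference)


if __name__ == "__main__":
    # Made-up training data and reference, for illustration only.
    model = ToyMarkovModel(order=2).fit(["ACGT" * 8])
    reference = "ACGTACGTACGT"
    for alt in ALPHABET:
        print(alt, zero_shot_score(model, reference, 5, alt))
```

In this toy setup, a substitution that breaks the pattern the model has seen scores below zero, while leaving the base unchanged scores exactly zero. A real model would be trained on genomes rather than a repeated motif, and turning scores into a "benign" or "harmful" call would need a threshold chosen against labeled variants.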

November 2020

AlphaFold 2 solves protein structure prediction (2020)

Google DeepMind entered AlphaFold 2 in the CASP14 protein structure prediction competition and achieved accuracy comparable to experimental methods, solving a problem that had stymied biologists for fifty years. The team later predicted structures for virtually all 200 million known proteins and made the database freely available.

Then

The structural biology community gained instant access to predicted structures that would have taken years to determine experimentally. More than two million researchers used the database within two years.

Now

Demis Hassabis and John Jumper won the 2024 Nobel Prize in Chemistry. AlphaFold demonstrated that AI could transform biology, catalyzing a wave of biological foundation models—including ESM-2, ProGen, and Evo—that expanded from protein structure to protein design to whole-genome modeling.

Why this matters now

AlphaFold proved the core premise that Evo 2 extends: biological sequence data contains enough information for AI to learn deep functional relationships. AlphaFold worked on proteins; Evo 2 operates on raw DNA across all of life, a larger and more fundamental challenge.

February 1975

Asilomar Conference on Recombinant DNA (1975)

About 140 biologists, lawyers, and journalists gathered at Asilomar, California, to address safety concerns about recombinant DNA technology, the ability to splice genes from one organism into another. Scientists had voluntarily paused certain experiments and convened the conference to establish safety guidelines before proceeding.

Then

The conference produced a set of safety guidelines that informed the National Institutes of Health's regulations on recombinant DNA research, allowing the work to continue under oversight.

Now

Asilomar became the defining example of scientific self-regulation. Recombinant DNA technology went on to produce insulin, gene therapy, and genetically modified crops. The guidelines evolved, but the framework of voluntary caution followed by formal regulation persisted.

Why this matters now

The Evo 2 team's decision to exclude human pathogen sequences from training data echoes Asilomar's approach: scientists voluntarily limiting their own work before regulators act. But the parallel has limits—Asilomar governed a handful of labs, while Evo 2's open-source release means the model is available to anyone with sufficient computing power.