It took thirteen years and $2.7 billion to read the first human genome. Now a single AI model, trained on 9 trillion DNA base pairs from more than 128,000 species, can predict with 90 percent accuracy whether an uncharacterized mutation in a breast cancer gene is dangerous, without ever being shown that gene. On March 4, the Arc Institute and NVIDIA published Evo 2 in Nature. It is the largest biological foundation model ever built: 40 billion parameters, a context window of one million nucleotides, and the ability to design synthetic genomes as long as that of a simple bacterium.
Evo 2 sits at the leading edge of a wave that began when DeepMind's AlphaFold cracked protein structure prediction in 2020, an achievement that won a Nobel Prize. The next frontier is harder: moving from reading biology to writing it. Evo 2 can already generate functional virus genomes that infect bacteria, a capability with direct applications in fighting antibiotic-resistant infections and obvious potential for misuse. The model's code, training data, and weights are all publicly available, although sequences from human pathogens were deliberately excluded from training as a safety measure. What comes next, and who governs it, is an open question that researchers, regulators, and biosecurity experts are racing to answer.