AlphaGenome: AI for better understanding the genome
Captured source
source ↗AlphaGenome: AI for better understanding the genome — Google DeepMind Skip to main content
June 25, 2025 Science AlphaGenome: AI for better understanding the genome Ziga Avsec and Natasha Latysheva
Share
Introducing a new, unifying DNA sequence model that advances regulatory variant-effect prediction and promises to shed new light on genome function — now available via API. Update January 2026: This research has been published in Nature. You can read the full paper here and access the model here . The genome is our cellular instruction manual. It’s the complete set of DNA which guides nearly every part of a living organism, from appearance and function to growth and reproduction. Small variations in a genome’s DNA sequence can alter an organism’s response to its environment or its susceptibility to disease. But deciphering how the genome’s instructions are read at the molecular level — and what happens when a small DNA variation occurs — is still one of biology’s greatest mysteries. Today, we introduce AlphaGenome , a new artificial intelligence (AI) tool that more comprehensively and accurately predicts how single variants or mutations in human DNA sequences impact a wide range of biological processes regulating genes. This was enabled, among other factors, by technical advances allowing the model to process long DNA sequences and output high-resolution predictions. To advance scientific research, we’re making AlphaGenome available in preview via our AlphaGenome API for non-commercial research, and planning to release the model in the future. We believe AlphaGenome can be a valuable resource for the scientific community, helping scientists better understand genome function, disease biology, and ultimately, drive new biological discoveries and the development of new treatments. How AlphaGenome works Our AlphaGenome model takes a long DNA sequence as input — up to 1 million letters, also known as base-pairs — and predicts thousands of molecular properties characterising its regulatory activity. It can also score the effects of genetic variants or mutations by comparing predictions of mutated sequences with unmutated ones. Predicted properties include where genes start and where they end in different cell types and tissues, where they get spliced, the amount of RNA being produced, and also which DNA bases are accessible, close to one another, or bound by certain proteins. Training data was sourced from large public consortia including ENCODE , GTEx , 4D Nucleome and FANTOM5, which experimentally measured these properties covering important modalities of gene regulation across hundreds of human and mouse cell types and tissues.
Your browser does not support the video tag. Your browser does not support the video tag.
Animation showing AlphaGenome taking one million DNA letters as input and predicting diverse molecular properties across different tissues and cell types.
The AlphaGenome architecture uses convolutional layers to initially detect short patterns in the genome sequence, transformers to communicate information across all positions in the sequence, and a final series of layers to turn the detected patterns into predictions for different modalities. During training, this computation is distributed across multiple interconnected Tensor Processing Units (TPUs) for a single sequence. This model builds on our previous genomics model, Enformer and is complementary to AlphaMissense , which specializes in categorizing the effects of variants within protein-coding regions. These regions cover 2% of the genome. The remaining 98%, called non-coding regions, are crucial for orchestrating gene activity and contain many variants linked to diseases. AlphaGenome offers a new perspective for interpreting these expansive sequences and the variants within them. AlphaGenome’s distinctive features AlphaGenome offers several distinctive features compared to existing DNA sequence models: Long sequence-context at high resolution Our model analyzes up to 1 million DNA letters and makes predictions at the resolution of individual letters. Long sequence context is important for covering regions regulating genes from far away and base-resolution is important for capturing fine-grained biological details. Previous models had to trade off sequence length and resolution, which limited the range of modalities they could jointly model and accurately predict. Our technical advances address this limitation without significantly increasing the training resources — training a single AlphaGenome model (without distillation) took four hours and required half of the compute budget used to train our original Enformer model. Comprehensive multimodal prediction By unlocking high resolution prediction for long input sequences, AlphaGenome can predict the most diverse range of modalities. In doing so, AlphaGenome provides scientists with more comprehensive information about the complex steps of gene regulation. Efficient variant scoring In addition to predicting a diverse range of molecular properties, AlphaGenome can efficiently score the impact of a genetic variant on all of these properties in a second. It does this by contrasting predictions of mutated sequences with unmutated ones, and efficiently summarising that contrast using different approaches for different modalities. Novel splice-junction modeling Many rare genetic diseases, such as spinal muscular atrophy and some forms of cystic fibrosis, can be caused by errors in RNA splicing — a process where parts of the RNA molecule are removed, or “spliced out”, and the remaining ends rejoined. For the first time, AlphaGenome can explicitly model the location and expression level of these junctions directly from sequence, offering deeper insights about the consequences of genetic variants on RNA splicing. State-of-the-art performance across benchmarks AlphaGenome achieves state-of-the-art performance across a wide range of genomic prediction benchmarks, such as predicting which parts of the DNA molecule will be in close proximity, whether a genetic variant will increase or decrease expression of a gene, or whether it will change the gene’s splicing pattern.
Bar graph showing AlphaGenome’s relative improvements on selected DNA sequence and variant effect tasks, compared against results for the current best methods in each category.
When producing predictions for single DNA sequences, AlphaGenome outperformed the best external models on 22 out of 24…
Excerpt shown — open the source for the full document.
Notability
notability 7.0/10Notable research post by DeepMind