Using AI to help physicians diagnose rare genetic diseases affecting children
Source: OpenAI
June 18, 2026
In an NEJM AI study, experts used an OpenAI reasoning model to reanalyze 376 previously unsolved cases and surface leads for 18 diagnoses.
Even with genomic sequencing, many people with rare diseases never receive a clear genetic diagnosis. Roughly half remain undiagnosed after extensive testing and specialist review. Their medical data may contain clues, but finding them can require sifting through thousands to millions of possible genetic variants, fragmented clinical records, and rapidly changing scientific literature.
As new gene-disease relationships, case reports, and classification evidence accumulate, unsolved cases can become newly interpretable.
Researchers from Boston Children’s Hospital’s Manton Center for Orphan Disease Research, Harvard University, and OpenAI used the OpenAI o3 Deep Research reasoning model to analyze de-identified clinical and genomic information from 376 previously analyzed cases that remained unsolved. The model surfaced evidence-linked candidate explanations for researchers and clinicians to review. Following expert review, additional testing, and clinical confirmation, physicians established diagnoses in 18 cases—an additional diagnostic yield of 4.8% after earlier analysis by specialists. This study was published on June 18, 2026, in NEJM AI and shows how an AI-assisted research workflow can help experts generate leads when revisiting some of the most difficult cases.
Many of these cases had evaded years of expert analysis. In this study, OpenAI o3 Deep Research helped researchers identify leads that were later assessed through established clinical processes, suggesting that expert-led periodic reanalysis could become more scalable as knowledge evolves. The model did not diagnose any patient or make any clinical decision. It produced evidence-linked hypotheses for specialists to review and, where appropriate, investigate through additional testing and confirm in a clinical laboratory.
Why an old case can contain a new answer
An inconclusive genetic test is not always a permanent finding. A patient’s phenotype descriptions, test results, and family history can be split across databases that use different identifiers, formats, and vocabularies. Linking those records is difficult, so even specialists can miss a diagnosis. Experts may also sequence a child’s genome before a relevant gene or its variants have been linked to disease. As scientific knowledge advances, the same data can reveal answers that were previously impossible to uncover.
Rare-disease reanalysis is both a scientific and a maintenance problem. The patient’s genome may stay the same, but the evidence around it keeps changing: researchers link new genes and variants to disease, labs reclassify old variants, and case databases and papers accumulate new observations. Each update can make an old inconclusive case worth revisiting, so many institutions inherit a growing backlog of genomes to keep in sync with a moving knowledge base.
In this study, researchers designed the workflow so that the model acted as an explanation-first reasoning layer on top of existing genomic pipelines. Instead of returning only a ranked gene, it was asked to connect the clinical features, inheritance pattern, variant evidence, and scientific literature into a justification that a human reviewer could interrogate.
How the reanalysis worked
For each case, the team assembled a de-identified packet containing standardized Human Phenotype Ontology (HPO) terms describing the patient’s clinical presentation; occasional clinician notes and any descriptive clinical diagnosis; metadata such as age and gender; and a filtered variant table. The table captured each variant’s rarity, its predicted effect on the encoded protein, its ClinVar classification, and its signal quality across available family members. Most cases included data from the child and both biological parents.
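The packet described above could be represented roughly as follows. This is a minimal sketch, not the study's actual schema: the class names, field names, and family-member labels (`CasePacket`, `Variant`, `allele_frequency`, `"proband"`, and so on) are hypothetical and chosen only to mirror the items listed in the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variant:
    """One row of the filtered variant table (all field names are hypothetical)."""
    gene: str
    allele_frequency: float                 # rarity in a reference population
    predicted_effect: str                   # predicted effect on the encoded protein
    clinvar_classification: Optional[str]   # None when ClinVar has no entry
    # Signal quality per available family member, keyed by an assumed role label.
    genotype_quality: dict[str, int] = field(default_factory=dict)


@dataclass
class CasePacket:
    """De-identified inputs assembled for one case."""
    hpo_terms: list[str]
    variants: list[Variant]
    age_years: Optional[float] = None
    gender: Optional[str] = None
    clinician_notes: Optional[str] = None         # only occasionally available
    descriptive_diagnosis: Optional[str] = None

    def is_trio(self) -> bool:
        """True when every variant has quality data for the child and both parents."""
        needed = {"proband", "mother", "father"}
        return bool(self.variants) and all(
            needed.issubset(v.genotype_quality) for v in self.variants
        )
```

Keeping the optional fields explicit reflects the article's point that clinician notes and descriptive diagnoses were not present for every case, while the variant table and phenotype terms were the core inputs.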
The team asked the model to propose the most plausible molecular explanation and to show its work. Researchers then reviewed the outputs using the same ACMG/AMP framework that clinical labs use to classify genetic variants. At least two team members reviewed each candidate, disagreements were resolved by consensus, and a model output was never treated as a diagnosis. A finding counted as a diagnosis only after qualified experts reviewed the evidence, the variant was classified as pathogenic or likely pathogenic, a CLIA-certified laboratory confirmed it, and the clinical team returned the result to the family.
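The criteria in the paragraph above amount to a strict conjunction: a candidate becomes a diagnosis only when every condition holds. The sketch below illustrates that gate under stated assumptions; `CandidateReview` and `counts_as_diagnosis` are hypothetical names, not part of the study's tooling.

```python
from dataclasses import dataclass

# The two classification tiers the article says qualify as diagnostic.
DIAGNOSTIC_TIERS = {"pathogenic", "likely pathogenic"}


@dataclass(frozen=True)
class CandidateReview:
    """Review state for one model-proposed candidate (hypothetical fields)."""
    reviewer_count: int        # team members who reviewed the candidate
    consensus_reached: bool    # disagreements resolved by consensus
    acmg_classification: str   # e.g. "likely pathogenic"
    clia_confirmed: bool       # confirmed by a CLIA-certified laboratory
    returned_to_family: bool   # clinical team returned the result


def counts_as_diagnosis(review: CandidateReview) -> bool:
    """Return True only when every criterion is met; a model output alone never is."""
    return (
        review.reviewer_count >= 2
        and review.consensus_reached
        and review.acmg_classification.strip().lower() in DIAGNOSTIC_TIERS
        and review.clia_confirmed
        and review.returned_to_family
    )
```

Note that the model's own output does not appear as an input to the gate at all, which matches the article's statement that a model output was never treated as a diagnosis.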
Before analyzing unsolved cases, the team refined the workflow on cases with established diagnoses. The workflow recovered the correct gene and variant in duplicate runs for 48 of 51 cases spanning a variety of rare conditions. In a set of 57 neuromuscular cases, it returned the correct diagnosis in duplicate runs for 45. In a 15-case long-read genome set, it named the correct gene in every case and both disease-causing alleles in 12. These evaluations informed prompt development and showed where expert review remained essential.
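The benchmark figures above can be turned into rates with a few lines of arithmetic. The sketch below also shows one plausible reading of "in duplicate runs", namely that a case counts as recovered only when both runs are correct; that reading is an assumption, and the function names are hypothetical.

```python
def recovered_in_duplicate(runs_by_case: dict[str, tuple[bool, bool]]) -> int:
    """Count cases where both runs were correct (assumed meaning of 'duplicate runs')."""
    return sum(1 for first, second in runs_by_case.values() if first and second)


def percent(hits: int, total: int) -> float:
    """Hits as a percentage of total, rounded to one decimal place."""
    return round(100 * hits / total, 1)


# (hits, total) pairs exactly as reported in the article.
REPORTED = {
    "varied rare conditions, gene and variant": (48, 51),
    "neuromuscular, diagnosis": (45, 57),
    "long-read, gene": (15, 15),
    "long-read, both alleles": (12, 15),
}

for name, (hits, total) in REPORTED.items():
    print(f"{name}: {hits}/{total} = {percent(hits, total)}%")
```

Under this arithmetic the four benchmarks work out to 94.1%, 78.9%, 100.0%, and 80.0% respectively.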
The model’s self-reported confidence scores tracked with correct diagnoses in these previously solved cases: the mean minimum score was 85.6 for consistently correct calls and 42.1 for incorrect or unknown calls. The scores were not calibrated probabilities, and the team did not use them as a substitute for evidence or clinical adjudication. They did, however, help expert reviewers focus on the most promising candidate diagnoses.
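One way to read "mean minimum score" is: take the lowest self-reported score across a case's repeated runs, then average those minima over a group of cases. The article does not spell out the computation, so the sketch below is an assumption, and the function names and the scores in the usage note are invented for illustration only.

```python
from statistics import mean


def minimum_score(run_scores: list[float]) -> float:
    """Lowest self-reported score across one case's runs (a conservative summary)."""
    if not run_scores:
        raise ValueError("a case needs at least one run score")
    return min(run_scores)


def mean_minimum(scores_by_case: dict[str, list[float]]) -> float:
    """Average of the per-case minimum scores (assumed definition)."""
    return mean(minimum_score(scores) for scores in scores_by_case.values())


def triage_order(scores_by_case: dict[str, list[float]]) -> list[str]:
    """Case IDs sorted so the highest minimum score is reviewed first."""
    return sorted(
        scores_by_case,
        key=lambda case_id: minimum_score(scores_by_case[case_id]),
        reverse=True,
    )
```

For example, with invented scores `{"A": [90, 80], "B": [40, 60], "C": [70, 75]}`, the per-case minima are 80, 40, and 70, so `triage_order` returns `["A", "C", "B"]`. The ordering only prioritizes reviewer attention; it does not replace the evidence review or clinical adjudication described above.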
What the researchers found
The team then applied the workflow to four groups of previously unsolved cases: children with neurodevelopmental conditions, people with rare neuromuscular disease, children and adolescents with early psychosis, and cases of sudden unexpected death in pediatrics. These were not fresh cases awaiting a first review. Many had already been examined by multiple commercial or institutional pipelines and discussed by multidisciplinary teams.
Results by cohort

Cohort | Cases | Diagnoses surfaced | Yield
Neurodevelopmental | 100 | 10 | 10.0%
Neuromuscular disease | 61 | 4 | 6.6%
Sudden unexpected death in pediatrics | 200 | 2 | 1.0%
Early psychosis | 15 | 2 | 13.3%
Total | 376 | 18...
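The yields in the cohort results are diagnoses divided by cases. A minimal sketch of that arithmetic, using the case and diagnosis counts reported in the article (the variable and function names are hypothetical):

```python
# cohort -> (cases, diagnoses), as reported in the article
COHORTS = {
    "Neurodevelopmental": (100, 10),
    "Neuromuscular disease": (61, 4),
    "Sudden unexpected death in pediatrics": (200, 2),
    "Early psychosis": (15, 2),
}


def yield_percent(cases: int, diagnoses: int) -> float:
    """Diagnostic yield as a percentage, rounded to one decimal place."""
    return round(100 * diagnoses / cases, 1)


total_cases = sum(cases for cases, _ in COHORTS.values())
total_diagnoses = sum(diagnoses for _, diagnoses in COHORTS.values())

for cohort, (cases, diagnoses) in COHORTS.items():
    print(f"{cohort}: {yield_percent(cases, diagnoses)}%")
print(f"Total: {total_diagnoses}/{total_cases} = "
      f"{yield_percent(total_cases, total_diagnoses)}%")
```

The cohort counts sum to 376 cases and 18 diagnoses, and 18/376 rounds to the 4.8% additional yield stated earlier in the article.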