Some diagnoses take 14 years

May 28

Nine years of clinical practice. And the cases that stay with me aren't the ones with clean answers. They're the families still waiting.

Not "we'll know in a few weeks" waiting. The deep kind: years of specialist appointments, tests that come back inconclusive, a child who doesn't fit any of the established categories. We call it the diagnostic odyssey. Genetic counselors (GCs) who work in rare disease and pediatric genetics know it from the inside. Families live it for years.

That background is why a paper published last week in NEJM AI stopped me mid-scroll.

Researchers at Boston Children's Hospital and OpenAI ran an AI-assisted reanalysis of 376 previously unsolved rare disease cases: neurodevelopmental disorders, neuromuscular disease, sudden unexpected death in pediatrics, and early psychosis. Every single one had already been sequenced. Already reviewed by trained human analysts. Already returned to families without a diagnosis.

The AI-assisted reanalysis went back through them and produced diagnoses in 18 of the 376 cases: a 4.8% overall yield.

Eighteen families who had been waiting, some for more than a decade, got answers they didn't have before.

What the tool actually did

"AI diagnosed 18 rare diseases" misreads what happened, and the specifics matter for how you think about it.

The researchers used OpenAI's o3-deep-research model. They fed it annotated variant call files, HPO codes, and clinician notes. The model synthesized that data against published literature, generated structured candidate hypotheses, and flagged variants worth a second look. Then trained human analysts took every one of those candidates through full ACMG/AMP adjudication. Results went back to families only after expert review confirmed them.

The AI generated hypotheses. The clinicians made the diagnostic calls.

That's the workflow. It's a meaningful one, and the division of labor is the whole point.
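For readers who think in code, here is a minimal sketch of that division of labor. It is not the study's code: `reanalyze_case`, `propose`, `adjudicate`, and the `Candidate` fields are my own hypothetical stand-ins, and both the model call and the ACMG/AMP review are stubbed out with toy functions. The one property it is meant to show is that nothing the model proposes reaches the result list without passing human review. It also checks the yield arithmetic: 18 of 376 is about 4.8%, leaving 358 cases unsolved.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Candidate:
    gene: str       # hypothetical: gene the model flagged
    variant: str    # hypothetical: variant description
    rationale: str  # hypothetical: the model's literature-based argument


def reanalyze_case(
    case_id: str,
    propose: Callable[[str], Iterable[Candidate]],
    adjudicate: Callable[[Candidate], bool],
) -> list[Candidate]:
    """Return only the candidates that pass expert review.

    `propose` stands in for the model step (hypothesis generation).
    `adjudicate` stands in for human ACMG/AMP review (the diagnostic call).
    """
    return [c for c in propose(case_id) if adjudicate(c)]


def diagnostic_yield(solved: int, total: int) -> float:
    """Percentage of cases solved."""
    return 100.0 * solved / total


# Toy stand-ins: a fake "model" proposes two candidates,
# and a fake "reviewer" confirms only one of them.
def fake_model(case_id: str) -> list[Candidate]:
    return [
        Candidate("GENE_A", "c.123A>G", "recent gene-disease report"),
        Candidate("GENE_B", "c.456del", "weak phenotype match"),
    ]


def fake_reviewer(c: Candidate) -> bool:
    return c.gene == "GENE_A"


confirmed = reanalyze_case("case-001", fake_model, fake_reviewer)
print([c.gene for c in confirmed])          # ['GENE_A']
print(round(diagnostic_yield(18, 376), 1))  # 4.8
print(376 - 18)                             # 358
```

Passing the reviewer in as a function is a deliberate choice in this sketch: the model step cannot return anything on its own, which mirrors the point that the AI generates hypotheses and clinicians make the calls.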

4.8% is real. And modest.

18 families is real. Each one is years of not knowing that finally ended.

358 families are still without answers. This didn't resolve rare disease. The diagnostic odyssey continues for the vast majority of these cases. The authors say so plainly.

What I keep thinking about is why the diagnoses were missed the first time. Some cases were originally analyzed before the relevant gene-disease association even existed. One was missed because the analyst was focused on neuromuscular disease genes and didn't catch a variant outside that frame. In another case, the model noticed a pattern in low-quality genotype calls that pointed to a 22q11.2 deletion, without ever having access to CNV data.

Each of these has a structural explanation: the volume of literature no one person can fully track, cognitive anchoring on expected presentations, old cases that don't get revisited as the knowledge base evolves. The original analysts were doing their jobs well with the tools they had. AI-assisted workflows happen to address exactly those structural gaps.

What this means in practice

The inputs that drove this workflow (HPO codes, variant tables, clinician notes) are things we already produce. The model did an exhaustive literature sweep, cross-referenced phenotype against gene-disease associations, built a structured argument for each candidate, and handed the results to human experts for review. The model's sweep took 6–10 minutes per case. A standard ChatGPT Plus subscription was the only access it required.

In prenatal practice, I've sat with families who had done everything. Karyotype. Microarray. Methylation studies. Whole exome sequencing. Every available test, and still no answer for why their baby had the anomalies it had. Those families are making decisions about a pregnancy, sometimes irreversible ones, with nothing on paper that explains why this is happening. The uncertainty doesn't close when the pregnancy ends. It follows them. Even in acute grief, the first question that surfaces is almost always: what does this mean if I get pregnant again?

I have families I counseled six or seven years ago who still don't have answers. They may be done having children. But those pregnancies, and everything that came with them, don't just close. Being able to go back to one of those families and say: we looked again, we found something, we know why that happened. For a lot of people, that's a piece of the story they've been carrying for years finally landing somewhere. That matters well beyond the chart.

That's what this capability looks like to me in practice. What becomes possible, clinically and for the families still waiting, when a tool can run a rigorous literature sweep on cases that have been sitting unsolved for years?

Beyond rare disease

This paper lives in clinical genetics and rare disease. But the logic extends.

The core capability (AI synthesizing complex data and surfacing structured hypotheses for expert review) maps onto hereditary cancer counseling and prenatal genetics. In hereditary cancer, the variant evidence base updates constantly, and a missed reclassification has direct management implications. In prenatal genetics, the literature on low-frequency findings is often thin and fast-moving. Any subspecialty where new variant evidence is published faster than any one clinician can absorb has the same structural gap this workflow addresses.

The limiting factor in all of these settings is the volume of literature synthesis required to support clinical judgment. That's the specific thing this kind of workflow does well. The judgment itself stays with us.

We're early. Prospective studies, institution-specific implementation, governance frameworks for how these tools get reviewed and deployed: that work is ahead. The authors are explicit about it. I'm not skipping past it.

But I've spent nine years watching the knowledge base in this field grow faster than anyone can keep up with. So much of our interpretive work depends on having read the right paper at the right time. These tools are good at reading a lot of papers very quickly and presenting what they found for an expert to evaluate.

For 18 families this time, that mattered.

The question for us as a profession is whether we want to be the ones shaping how that capacity gets used in practice (which cases it gets applied to, how results are reviewed, what role GCs play in the workflow), or whether those decisions get made without us at the table.

Reference: Jaech A, Cheatham M, Shringarpure SS, et al. LLM-Assisted Reanalysis of Unsolved Rare Disease Genomes Increases Diagnostic Yield. NEJM AI. Published June 18, 2026. DOI: 10.1056/AIcs2501343

Tri-founders Admin