LLMs are Extremists. But for healthcare AI, the middle is equally important.
April 30, 2025
Transformer-based LLMs are impressive, but they weren’t designed to reason across multi-year medical histories. And that creates a hidden failure mode in clinical summaries: the middle gets lost.
Models like GPT-4, Claude, and LLaMA operate within a fixed context window: the maximum amount of text they can process at once. Even a generous 128K-token window (~200 pages) holds only a fraction of what a real patient’s record contains.
Take my cofounder’s chart: 23 years, 6 specialties, 1,000+ pages of unstructured data. Key details, like a failed drug in year 7 or a colonoscopy in year 10, are buried in the middle. But transformers prioritize the beginning and end, often truncating or devaluing the middle. If the model never sees it, it can’t reason about it - yet it still generates a confident summary. And those omissions? They can derail care.
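For a rough sense of the gap, here’s a back-of-envelope estimate using the ~200 pages per 128K tokens figure above (an approximation, not an exact tokenizer count):

```python
# Back-of-envelope: how much of a 1,000-page chart fits in a 128K-token window?
# Assumes ~640 tokens per page, derived from the ~200 pages ≈ 128K tokens estimate above.
TOKENS_PER_PAGE = 128_000 / 200          # ≈ 640 tokens per page
record_pages = 1_000
context_window = 128_000

record_tokens = record_pages * TOKENS_PER_PAGE
print(f"Record size:   ~{record_tokens:,.0f} tokens")
print(f"Window covers: {context_window / record_tokens:.0%} of the record")
# Record size:   ~640,000 tokens
# Window covers: 20% of the record
```

In other words, even a long-context model sees at best a fifth of a chart like that, and it gets to choose which fifth.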
Why does this happen?
Transformers use self-attention to compare every token with every other token, so compute scales quadratically with input length. To keep that tractable, models cap the context window. When the window is full, anything beyond it gets truncated, and even within it, attention skews toward the beginning and end of the input. The model can’t recall what it never processed, and it tends to underweight the middle of what it did.
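To make the quadratic term concrete, here is a minimal single-head attention sketch in NumPy. It’s a toy with no learned projections, purely illustrative; the n × n score matrix is where the cost blows up.

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention (toy version, no learned projections)."""
    n, d = x.shape                      # n tokens, each a d-dimensional embedding
    q, k, v = x, x, x                   # real models use learned Q/K/V projections
    scores = q @ k.T / np.sqrt(d)       # (n, n) matrix: every token vs. every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ v                  # each output is a weighted mix of all token values

# The (n, n) score matrix is the quadratic term: doubling the input
# quadruples both the memory and the compute for this step.
tokens = np.random.randn(2_048, 64)
out = self_attention(tokens)            # scores alone: 2,048 x 2,048 ≈ 4.2M entries
print(out.shape)                        # (2048, 64)
```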
LLMs in healthcare don’t need more tokens; they need more intelligence.
Throwing ever more compute at longitudinal data isn’t sustainable. What we need are smarter systems: ones that structure the input before summarizing, retrieve clinically relevant (not just recent) facts, and generate insights grounded in specialty knowledge and timelines. Systems that can seamlessly blend decades of clinical knowledge and evolve as medicine does.
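As one sketch of what “structure, then retrieve, then summarize” could look like, here is a minimal Python outline. The helper names, the relevance scorer, and the token estimate are all hypothetical placeholders, not a real implementation:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    year: int
    specialty: str

def clinical_relevance(chunk: Chunk, question: str) -> float:
    """Placeholder scorer: rank by clinical relevance to the question, not by recency
    or position in the chart. In practice this might combine embedding similarity
    with structured signals (problem list, medication changes, abnormal results)."""
    return float(any(word in chunk.text.lower() for word in question.lower().split()))

def build_context(chunks: list[Chunk], question: str, budget_tokens: int = 8_000) -> str:
    """Retrieve the most relevant chunks from the whole timeline, then reorder them
    chronologically so the eventual summary stays grounded in time."""
    ranked = sorted(chunks, key=lambda c: clinical_relevance(c, question), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        cost = len(chunk.text) // 4               # rough tokens-per-character estimate
        if used + cost > budget_tokens:
            continue
        selected.append(chunk)
        used += cost
    selected.sort(key=lambda c: c.year)           # restore the timeline
    return "\n\n".join(f"[{c.year}, {c.specialty}] {c.text}" for c in selected)

# Usage (hypothetical data and generation step):
# chunks = load_chart_chunks(...)       # however you parse and structure the chart
# context = build_context(chunks, "prior medication failures and colonoscopies")
# summary = your_llm(context)           # the model now sees year 7 and year 10, too
```

The point isn’t this particular code; it’s that relevance, not position in the chart, decides what the model gets to see.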