LLMs are stochastic machines
April 20, 2025
Randomness is for chatbots, not healthcare.
Large Language Models (LLMs) are, at their core, sophisticated prediction machines. They generate text by estimating a conditional probability distribution over possible next tokens given the preceding ones, then sampling from that distribution. With enough training data, they get remarkably good at this. But no matter how impressive their performance, one fact remains: these systems are inherently probabilistic.
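To make that concrete, here is a minimal Python sketch of next-token sampling. The vocabulary and logits are made up for illustration, not taken from any real model; the point is that the softmax turns the model's scores into probabilities, and the next token is *drawn* from that distribution, which is where the randomness enters.

```python
import numpy as np

rng = np.random.default_rng()

# Toy three-word vocabulary and hypothetical model scores (logits).
vocab = ["acetaminophen", "ibuprofen", "aspirin"]
logits = np.array([2.1, 1.9, 0.4])

def sample_next_token(logits, temperature=1.0):
    # Temperature rescales the logits; softmax converts them to probabilities.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # The next token is sampled, not chosen deterministically.
    return rng.choice(len(logits), p=probs)

# Run the same "prompt" several times: the sampled token varies run to run.
for _ in range(5):
    print(vocab[sample_next_token(logits, temperature=0.8)])
```

Lowering the temperature concentrates probability on the top-scoring token and reduces this variability, but as long as the output is sampled, it never fully disappears.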
In casual settings like a chatbot, this randomness can be a feature, not a bug. It adds variety and makes interactions feel more human. If your chatbot gave you the same answer every time, it would feel robotic. But in healthcare, where accuracy and consistency are non-negotiable, randomness becomes a liability.
At Acucare, we’ve seen this firsthand in clinical use cases like chart summarization and provider-facing Q&A. You might get a great answer once, but the next run subtly drops a key detail.
There are broadly two types of randomness we see in LLMs:
Format Randomness: This concerns how the key facts are presented. Bullets or paragraphs? Chronological order or SOAP format? Prompt engineering can eliminate much of this randomness, but not all of it.
Content Randomness: This is where things get riskier. The model might omit or include different clinical details across runs, even when the input stays the same. That inconsistency isn’t just inconvenient - it can be dangerous. Even with advanced prompting strategies like few-shot or chain-of-thought reasoning, LLMs still guess. And crucially, they don’t know when they’re guessing. They just sample a likely next token and move on. The toy sketch below illustrates this kind of run-to-run drift.
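To see why content randomness is so insidious, consider a toy simulation. The inclusion probabilities here are invented for illustration, not measured from any real system: if a summarizer effectively includes each source fact with some learned probability, a detail it is only 90% "confident" about will silently vanish on roughly one run in ten, and nothing in the output flags the omission.

```python
import random

# Hypothetical per-fact inclusion probabilities, standing in for how
# consistently a model surfaces each detail across runs.
facts = {
    "chief complaint: chest pain": 0.99,
    "on anticoagulants": 0.90,   # the key detail that sometimes vanishes
    "penicillin allergy": 0.95,
}

def summarize_once():
    # Each fact makes it into the summary only with its own probability.
    return [fact for fact, p in facts.items() if random.random() < p]

for run in range(1, 6):
    print(f"run {run}: {summarize_once()}")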
The underlying issue? LLMs don’t know clinical facts. They approximate them.
Without careful tuning for the nuances of each specialty - the symptoms, diagnoses, and workflows that define how care is delivered - the model's output is little more than a well-educated guess. This is why specialty context matters. It’s how we reduce content-level randomness and bring outputs closer to what clinicians actually need.
Without it, we’re just playing Russian roulette with patient safety.