
OpenAI’s Hallucination Paper: Insightful Research, But the Blame Game Misses the Mark

Misalignment of consumer expectations and model incentives in AI hallucinations

Kudos to OpenAI for publishing a dense and thoughtful academic paper that tackles one of the most widely discussed — and misunderstood — challenges in generative AI: hallucinations.


The paper makes two core arguments:

  1. Hallucinations are not mysterious. OpenAI suggests that hallucinations don’t stem from some arcane flaw in the models but instead emerge naturally from how these systems are trained. They describe a connection between supervised and unsupervised learning that “demystifies their origin,” even when training data includes the phrase “I don’t know.” The key insight? Hallucination-like behavior is rewarded by most evaluation systems.


  2. Hallucinations are structurally incentivized. Because LLMs are trained to perform well on benchmarks and leaderboards that value having some answer over no answer, the models learn to guess. In other words, these are "test taker" models, and on most tests it’s better to guess than to leave a blank (see the sketch right after this list). As the paper puts it: "We observe that existing primary evaluations overwhelmingly penalize uncertainty" (page 3).
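
To make that incentive arithmetic concrete, here is a small back-of-the-envelope sketch (my own illustration, not code or numbers from the paper): under binary grading, a guess with any nonzero chance of being right has a higher expected score than an honest "I don't know."

```python
# Toy arithmetic (my illustration, not from the paper): expected score on a
# binary-graded benchmark where a correct answer earns 1 point and everything
# else (wrong answers and "I don't know" alike) earns 0.

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected score on one question, given the model's chance of being right."""
    if abstain:
        return 0.0              # "I don't know" is graded exactly like a wrong answer
    return p_correct * 1.0      # a guess pays off whenever it happens to be right

for p in (0.90, 0.50, 0.10, 0.01):
    print(f"p(correct)={p:.2f}  guess={expected_score(p, False):.2f}  "
          f"abstain={expected_score(p, True):.2f}")

# Even a 1%-confident guess strictly beats abstaining, so a model optimized
# for this kind of leaderboard learns never to leave a blank.
```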


In the graph above, I tried to lay out the misalignment between our expectations as users (almost no hallucinations) and the incentives of the model, which come down to this: the model would rather hallucinate than say it doesn't know the answer.


As the authors put it:

“Further reduction of hallucinations is an uphill battle, since existing benchmarks and leaderboards reinforce certain types of hallucination.” (Page 12, Section 4)


This is a strong statement. But while the framing is elegant, I think the conclusions risk being too convenient. It's basically blaming the test.


Here's where I feel the article falls short:


Hallucinations are not mysterious, yet we can't define them

OpenAI admits that hallucinations remain poorly defined:

“It is difficult for the field to agree upon how to define, evaluate, and reduce hallucinations due to their multifaceted nature.” (Page 15, Section 5)


That ambiguity weakens the clarity of their argument. If we can’t fully agree on what a hallucination is, it’s a stretch to claim the phenomenon has been fully explained.


Society is to blame

More importantly, pointing the finger at evaluation frameworks or “society” feels too convenient and a bit of a cop-out. It's also the kind of conclusion that implies OpenAI can't really do much about the problem, because it's societal.



A single reason for the phenomenon

Yes, test-oriented incentives matter. But that doesn’t fully explain what we see in practice. I’ve personally seen “in-context” hallucinations, where models invent details even when the right information is in the prompt. I’ve also seen behavior that suggests hallucinating may be more efficient than retrieving or reasoning, perhaps because guessing uses less computation or avoids latency penalties.


AI getting the date/day wrong. July 1, 2025 was a Tuesday

What I Wish They Explored

If evaluation incentives are truly the root cause, it raises a natural question: Has anyone built a non-“test taker” model? One that’s designed to reward caution over confidence? If so, how did it perform?
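
Purely as an illustration of what “rewarding caution over confidence” could mean in a grader (my own sketch, not a proposal from the paper or any benchmark I know of): give wrong answers a negative score and abstentions a zero, so that below some confidence level declining to answer becomes the rational move.

```python
# Hypothetical "reward caution" grading (illustrative only):
# correct = +1, wrong = -penalty, "I don't know" = 0.

def expected_score(p_correct: float, penalty: float) -> float:
    """Expected score if the model answers instead of abstaining."""
    return p_correct * 1.0 - (1.0 - p_correct) * penalty

penalty = 1.0                            # a wrong answer costs as much as a right one earns
break_even = penalty / (1.0 + penalty)   # answering pays off only above this confidence

print(f"break-even confidence: {break_even:.2f}")
for p in (0.9, 0.6, 0.5, 0.3):
    better = "answer" if expected_score(p, penalty) > 0 else "say IDK"
    print(f"p(correct)={p:.1f}  expected score if answering={expected_score(p, penalty):+.2f}  -> {better}")
```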


OpenAI suggests that “IDK” models, ones that openly admit uncertainty, might struggle in practice because users simply won’t accept them. And there’s some truth to that.


If a model has the ability to say “I don’t know” (IDK), it could easily become overcautious, flooding responses with IDKs and disclaimers to avoid risk. Models may also struggle to set their own confidence level: if the confidence threshold is too high, we get a flood of IDKs; too low, and we’re back to unchecked hallucinations.
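
Here is a toy simulation of that balance (the numbers and the answer-or-abstain policy are made up for illustration, and it assumes the model's self-reported confidence is roughly calibrated, which real models often are not):

```python
import random

# Toy answer-or-abstain policy (illustrative only): answer when the model's
# self-reported confidence clears a threshold, otherwise say "I don't know".

random.seed(0)

def make_question():
    confidence = random.random()                     # hypothetical self-reported confidence
    answer_is_right = random.random() < confidence   # assume confidence is roughly calibrated
    return confidence, answer_is_right

questions = [make_question() for _ in range(10_000)]

def summarize(threshold: float) -> str:
    answered = [right for conf, right in questions if conf >= threshold]
    idk_rate = 1 - len(answered) / len(questions)
    wrong_rate = 0.0 if not answered else 1 - sum(answered) / len(answered)
    return f"threshold={threshold:.2f}  IDK rate={idk_rate:.0%}  wrong when answering={wrong_rate:.0%}"

for t in (0.95, 0.70, 0.30, 0.00):
    print(summarize(t))

# Too high a threshold floods the user with IDKs; too low and we're back
# to frequent confident errors. Neither extreme feels trustworthy.
```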


It’s a tricky balance.


As a user, if I ask a basic question and get an “I don’t know,” I lose trust. But when I pose a complex, nuanced question, I’d welcome a response like: “I’m not totally sure — but here’s my best guess.”


The real challenge: if hallucinations can't be eliminated, can AI companies design a sustainable, trustworthy tradeoff between confidence and hallucination?

