
When AI “Hallucinates”: Why CFOs Need to Stay Vigilant


If you’ve spent any time experimenting with generative AI — whether it’s ChatGPT, Gemini, or a specialized finance agent — you’ve likely come across something strange: the AI making things up.


In the AI world, this is called a hallucination. It’s when the model generates information that sounds plausible but is factually incorrect or entirely fabricated. Unlike a spreadsheet error or a formula bug, AI hallucinations are often delivered with striking confidence, cloaked in perfectly polished prose.


At one point, ChatGPT even got a simple date wrong, insisting that July 1, 2025 fell on a Monday when it was actually a Tuesday.
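
What makes the date example useful is how trivially it can be checked. As a quick illustration (the variable names here are mine, not part of any AI tool), a few lines of Python settle the question instantly:

```python
from datetime import date

# The model asserted that July 1, 2025 fell on a Monday.
claimed_day = "Monday"
actual_day = date(2025, 7, 1).strftime("%A")  # returns "Tuesday"

if actual_day != claimed_day:
    print(f"Hallucination caught: July 1, 2025 was a {actual_day}, not a {claimed_day}.")
```

Not every hallucination is this easy to verify — but the habit of checking the verifiable ones is the same.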



What does an AI hallucination actually look like?

These models are designed to predict the next most likely word based on patterns in enormous datasets. That means they’re optimized for fluency, not necessarily truth. If the model doesn’t know something, it doesn’t say, “I don’t know.” It guesses — and sometimes guesses wrong in remarkably convincing ways.
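
To make that concrete, here is a deliberately toy sketch of next-word prediction. The probabilities are invented for illustration only — a real model scores tens of thousands of tokens with a neural network — but the selection principle it shows is the same: the choice is driven by likelihood, not by whether the resulting statement is true.

```python
import random

# Invented toy probabilities, purely for illustration.
next_token_probs = {
    "Tuesday": 0.48,   # the correct continuation
    "Monday": 0.41,    # plausible, fluent, and wrong
    "Friday": 0.11,
}

tokens, weights = zip(*next_token_probs.items())
choice = random.choices(tokens, weights=weights, k=1)[0]
print(f"July 1, 2025 was a {choice}.")  # a confident error roughly 4 times out of 10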


In my earlier piece about gathering financial data sets, we saw multiple cases of hallucinations: some were based on internet searches, and others occurred on data sets I had provided to the AI directly.


Perplexity AI hallucinations


We also saw some more nuanced hallucinations when I asked ChatGPT to explain a technical accounting scenario involving bill-and-hold revenue recognition. It misrepresented sections of the ASC in ways that could have led to problems in applying the correct revenue recognition guidance.


Why this matters for CFOs

The allure of these tools is obvious. They can generate analysis, draft memos, summarize earnings calls, and pull apparent insights from financial statements in seconds. But hallucinations pose a unique risk because they can be hard to detect unless you’re already familiar with the subject matter.


As one advanced AI model put it when I probed this issue:

“Hallucinations are a deeply ingrained characteristic of how current large language models operate. While the major AI players are investing heavily in mitigation, the consensus is shifting toward the idea that completely eradicating hallucinations might be impossible with this architecture.”

Even as models become more sophisticated, they might only hallucinate less often or in more subtle ways. In fact, some research suggests that pushing the very largest models on complex, open-ended tasks could actually make them hallucinate more — precisely because of their immense generalization capabilities.


In fact, some researchers have raised concerns about “alignment faking” — where advanced models learn to give answers that look aligned with human values or instructions on the surface, but still conceal faulty reasoning or made-up data underneath. In other words, the AI learns to appear trustworthy, even if its internal logic hasn’t genuinely improved.


So what’s the practical takeaway?

For now, CFOs and finance teams should treat AI outputs the way they’d treat an external analyst’s first draft: a powerful starting point, but one that requires verification. Over time, we’ll likely see better “guardrail” systems and domain-specific checks. But for critical finance, accounting, or regulatory decisions, human oversight isn’t optional — it’s essential.
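
What might a basic “guardrail” look like in practice? Here is a minimal sketch of one verification step — the field names and figures are hypothetical, and this is only one illustration of the idea, not a prescribed tool — in which every figure an AI draft quotes is reconciled against the source data before the memo goes anywhere:

```python
# Hypothetical figures and field names, purely to illustrate the workflow.
source_financials = {"Q2_revenue": 1_250_000, "Q2_gross_margin": 0.42}   # from the actual ledger/ERP export
ai_quoted_figures = {"Q2_revenue": 1_250_000, "Q2_gross_margin": 0.47}   # figures quoted in the AI draft

def reconcile(quoted: dict, source: dict, tolerance: float = 0.005) -> list:
    """Flag any AI-quoted figure that is missing from, or drifts from, the source."""
    exceptions = []
    for name, claimed in quoted.items():
        actual = source.get(name)
        if actual is None or abs(claimed - actual) > tolerance * abs(actual):
            exceptions.append(f"{name}: AI draft says {claimed}, source says {actual}")
    return exceptions

issues = reconcile(ai_quoted_figures, source_financials)
print(issues or "All quoted figures tie back to the source.")
```

A check like this catches numeric drift; it does nothing for a misread accounting standard or a fabricated citation, which is exactly why the human review layer still matters.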
