The AI Trial Balance Trial: low effort, refusals and hallucinations
- Niv Nissenson
- Nov 13
- 3 min read

The three AI chatbots and accounting: the low-effort, the won't-even-try, and the try-hard.
As part of my ongoing exploration into how AI can assist finance executives, I’ve tested models on consolidations, data entry, and Excel AI add-ons. This time, I wanted to see whether AI can handle another basic accounting task: turning a trial balance into a properly formatted P&L and balance sheet.
On the surface, this should be simple for a computer, since it's almost pure logic. A trial balance summarizes debits and credits; all that’s left is classifying accounts into the right sections of the financial statements (the classifications were also given, so no interpretation was needed). Yet for three of the major AIs it wasn't just hard, it was impossible.
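To show just how mechanical this task is, here is a minimal sketch of the logic described above. The account names, balances, and classifications are hypothetical, but the steps are the standard ones: verify debits equal credits, total each classified section, and roll net income into equity so the balance sheet balances.

```python
# Each row: (account, classification, debit, credit).
# Classifications are given up front, so no interpretation is needed.
trial_balance = [
    ("Cash",                "Assets",      120_000,       0),
    ("Accounts Receivable", "Assets",       80_000,       0),
    ("Accounts Payable",    "Liabilities",       0,  50_000),
    ("Common Stock",        "Equity",            0, 100_000),
    ("Revenue",             "Revenue",           0, 200_000),
    ("Salaries Expense",    "Expenses",    150_000,       0),
]

# Step 1: debits must equal credits, or the trial balance itself is broken.
total_debits = sum(row[2] for row in trial_balance)
total_credits = sum(row[3] for row in trial_balance)
assert total_debits == total_credits, "Trial balance out of balance"

# Step 2: net balance per section (debit-positive convention).
def section_total(name):
    return sum(row[2] - row[3] for row in trial_balance if row[1] == name)

# P&L: revenue carries a credit balance, expenses a debit balance.
revenue = -section_total("Revenue")
expenses = section_total("Expenses")
net_income = revenue - expenses

# Balance sheet: net income flows into equity -- the step the chatbots missed.
assets = section_total("Assets")
liabilities = -section_total("Liabilities")
equity = -section_total("Equity") + net_income

assert assets == liabilities + equity, "Balance sheet does not balance"
print(f"Net income: {net_income:,}")
print(f"Assets {assets:,} = Liabilities + Equity {liabilities + equity:,}")
```

Roughly forty lines of code covers the whole exercise, which is exactly why a failure here is so telling.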
The Test Setup
I started with a clean, short, real-life trial balance: 52 classified accounts in debit/credit format, nothing fancy. I asked each major AI model to produce a full P&L and balance sheet.
ChatGPT: Confident, Wrong, and Low Effort
ChatGPT began promisingly, asking smart questions about my account numbering system. Then it produced its very low effort “reports”:
A P&L with three lines — revenue, expenses, and net income.
A balance sheet with three lines — assets, liabilities, and equity.
It also didn’t flow net income into equity, and instead offered this reassuring note:
“The small rounding difference matches the P&L net income, which means your trial balance ties out correctly.”
The “rounding difference” was $1,000. That may not sound big, but it's no rounding difference: the gap matched net income precisely because net income was never rolled into equity.
When I asked for a properly formatted report, ChatGPT re-generated the same data — this time splitting it into a “P&L trial balance” and a “balance-sheet trial balance.” When prompted again, it lumped all expenses together under one category and ignored my account classifications entirely.


Verdict: Looked confident, delivered little. Failed the task.
Gemini: Won't even try
Gemini was more transparent. It immediately explained that it wouldn’t attempt to build full reports but could walk me through how to do it. Technically honest, but not helpful if you’re testing automation.

Verdict: Next!
Claude: Try-hard, but failed harder
Claude, on the other hand, tried to actually do the job — and at first glance, it seemed to understand accounting flow better than the others. It even recognized that net income should flow to equity.

Then things went off the rails with major hallucinations and errors.
Claude invented new accounts and changed existing ones:
Added a mysterious COGS account with $644,000 that never existed.
Re-labeled my AR account (11050) as 21050 — converting it from an asset to a liability.
Fabricated new account numbers like 21150 for inventory, out of thin air.
Departed from GAAP structure entirely.
Not surprisingly, the resulting balance sheet didn’t balance.
To its credit, Claude noted this itself:
“The balance sheet doesn’t balance — you should investigate further.”


When I called out the wrong accounts, Claude apologized and produced a new report… which was still wrong. It even “verified” its own corrections while leaving the same fake inventory account untouched.

In the end, Claude needed me to tell it whether a specific account was actually in the trial balance. It wasn't, but this underscores how poorly AI performs on these detail-oriented tasks. It doesn't even know whether it fabricated something or not!
Verdict: Earnest effort, catastrophic execution.
The Results: A Pattern Emerges
| Model | Approach | Result |
| --- | --- | --- |
| ChatGPT (the low effort) | Confidently simplified everything | Failed: incomplete and inaccurate |
| Gemini (won't try) | Refused to execute, explained theory | Not useful for automation |
| Claude (try-hard failure) | Tried to perform end-to-end accounting | Failed with fabricated data and logic errors |
What This Means for Finance Teams
This test aligns with what we’ve seen in earlier experiments on consolidations and data integrity: general-purpose AI is not ready for real accounting.
Even when AI “knows” the logic of a task, it often applies it inconsistently, misreads numeric context, or confidently fills gaps with fabricated data.
In accounting, that’s not a small problem; it’s an existential one. Numbers must tie. Trial balances must balance. You can’t “hallucinate” your way to GAAP.
AI models today are powerful text generators, not accountants. They can help explain, summarize, or assist with logic, but when it comes to building accurate financial statements, they’re unreliable.
If you want to automate financial reporting, you need specialized finance tools built for accounting, based on structured data, preferably your ERP.
I plan to keep testing; maybe Elkar or domain-specific models will fare better. But for now, it’s clear: general AI chatbots may seem to speak accounting, but they don't yet understand it.