US GAAP test: ChatGPT vs. Accountant GPT

Niv Nissenson
Jun 27, 2025

Updated: Jun 29, 2025

I wanted to see how well ChatGPT handles basic accounting questions, especially as they become slightly more complex. I also wanted to compare the responses of the standard ChatGPT experience (the GPT-4o model) with those of a specialized GPT. For this experiment, I used the “Accountant GPT.”

Baseline Test: Straightforward Revenue Recognition

I began with a simple scenario:

On 6/15/2025, I delivered services to a customer and issued an invoice for $1,000. As of 6/30/2025, the invoice remains unpaid. Question: What would be the correct accounting for this transaction in both Profit and Loss statement and Balance sheet for 6.30.2025?

Both the standard ChatGPT and the Accountant GPT answered accurately. They clearly explained that the revenue should be recognized when earned (on 6/15/2025) and not when cash is received, in accordance with accrual accounting. Each also provided journal entries to illustrate the treatment.
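For readers who want to see the treatment spelled out, here is a minimal Python sketch of the entry. It is an illustration of the accrual logic described above, not output from either model; the `JournalLine` type and the account names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class JournalLine:
    """One side of a journal entry (hypothetical type, for illustration only)."""
    entry_date: date
    account: str
    debit: Decimal = Decimal("0")
    credit: Decimal = Decimal("0")


# Services delivered and invoiced on 6/15/2025: revenue is earned on delivery.
entry = [
    JournalLine(date(2025, 6, 15), "Accounts Receivable", debit=Decimal("1000")),
    JournalLine(date(2025, 6, 15), "Service Revenue", credit=Decimal("1000")),
]

# A valid entry balances: total debits equal total credits.
assert sum(l.debit for l in entry) == sum(l.credit for l in entry)


def balance(lines, account):
    """Net debit balance of an account (debits minus credits)."""
    return sum((l.debit - l.credit for l in lines if l.account == account), Decimal("0"))


# As of 6/30/2025 the invoice is unpaid, so no cash entry has been booked.
revenue_in_pnl = -balance(entry, "Service Revenue")       # P&L: revenue for June
receivable_on_bs = balance(entry, "Accounts Receivable")  # Balance sheet: current asset
print(revenue_in_pnl, receivable_on_bs)
```

The P&L shows the $1,000 as revenue for June, and the balance sheet carries the same $1,000 as a receivable until the customer pays.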

Added Complexity: Services Delivered, Not Yet Invoiced

Next, I removed the invoicing component from the scenario. Both models correctly maintained that under US GAAP, revenue is recognized when earned, not when invoiced. Again, solid answers from both.
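The point that invoicing does not drive recognition can be sketched in Python as follows. This is a deliberate simplification that assumes a service delivered at a single point in time and an unconditional right to payment; the function names and the “Unbilled Receivable” account label are hypothetical and would depend on the company's chart of accounts.

```python
from datetime import date
from typing import Optional


def recognition_date(delivered_on: date, invoiced_on: Optional[date]) -> date:
    """Simplified rule: revenue is recognized when the service is delivered (earned).

    The invoice date is accepted as an argument but deliberately ignored.
    """
    return delivered_on


def debit_account(invoiced_on: Optional[date]) -> str:
    """Hypothetical account naming: billed vs. not-yet-billed receivable."""
    return "Accounts Receivable" if invoiced_on else "Unbilled Receivable"


# Same service, delivered 6/15/2025, with and without an invoice.
print(recognition_date(date(2025, 6, 15), date(2025, 6, 15)), debit_account(date(2025, 6, 15)))
print(recognition_date(date(2025, 6, 15), None), debit_account(None))
```

Only the balance sheet label changes between the two cases; the revenue lands in June either way.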

Escalation: Bill-and-Hold Scenario

Then, I introduced a bill-and-hold arrangement. This is where the differences became more apparent.

Both models correctly based their answers and justifications on ASC 606, but their interpretations of it diverged in nuance and structure:

Standard ChatGPT identified four conditions for recognizing revenue under a bill-and-hold arrangement.
Accountant GPT listed five conditions, aligning more closely with expanded interpretations.

[Figure: ChatGPT and Accountant GPT bill-and-hold recognition steps]

And here's the original ASC 606-10-55-83

Interestingly, ChatGPT’s version appears more aligned with the core language of ASC 606-10-55-83, which outlines the specific criteria for bill-and-hold revenue recognition. As any accountant knows, details and phrasing can matter a lot in applying US GAAP:

ChatGPT stated that “a substantive reason” is needed—like a customer requesting the delay.
Accountant GPT treated “customer request” and “substantive reason” as two separate requirements. It also indicated that the customer must have requested the delay, and that the request is not, in and of itself, a substantive reason.
ChatGPT said the goods must be “separately identified,” while Accountant GPT only said they need to be “identified.”
As noted, ChatGPT stayed very close to the ASC, but its wording shifted slightly from the original standard, which could affect interpretation. The original ASC 606 phrase is: “The entity cannot have the ability to use the product or to direct it to another customer.” ChatGPT paraphrased this as: “The entity cannot use the goods…” That subtle difference (“cannot have the ability to use” vs. “cannot use”) could matter in legal or audit contexts.
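To make the four criteria of ASC 606-10-55-83 concrete, here is a minimal Python sketch of a checklist. It is my own illustration, not code produced by either model, and the class and field names are hypothetical. It covers only the bill-and-hold criteria, not the rest of the ASC 606 analysis. Note that the last field captures the seller's ability to use or redirect the product, not whether the seller actually did so.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BillAndHold:
    """Facts about one arrangement (field names are hypothetical)."""
    substantive_reason: bool          # e.g., the customer requested the arrangement
    separately_identified: bool       # product identified separately as the customer's
    ready_for_transfer: bool          # product currently ready for physical transfer
    seller_can_use_or_redirect: bool  # seller has the ABILITY to use or redirect it


def unmet_criteria(a: BillAndHold) -> List[str]:
    """Return the criteria that are NOT met; an empty list means all four pass."""
    checks = [
        (a.substantive_reason, "reason for the arrangement must be substantive"),
        (a.separately_identified, "product must be identified separately as the customer's"),
        (a.ready_for_transfer, "product must currently be ready for physical transfer"),
        (not a.seller_can_use_or_redirect, "seller must lack the ability to use or redirect the product"),
    ]
    return [label for passed, label in checks if not passed]


# The seller has not touched the goods but COULD redirect them: the last criterion fails.
example = BillAndHold(True, True, True, seller_can_use_or_redirect=True)
print(unmet_criteria(example))
```

Under ChatGPT's looser paraphrase (“cannot use the goods”), the example above might look acceptable because the seller never used the product. Under the standard's wording it fails, because the ability alone is disqualifying.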

Next Steps: More Complex Challenges Ahead

Overall, I was impressed. Both models handled the fundamentals well, and ChatGPT in particular appears to anchor its answers more directly in source material like ASC 606. I’ll continue challenging ChatGPT, specialized GPTs, and other models with more complex accounting and financial reporting questions to see where the cracks begin to show.

As always with new technology: trust, but verify.

— Niv Nissenson
