The First Mortgage AI Accuracy Test: Tidalwave’s SOLO VS. Anthropic’s Claude 4.5

Mar 17, 2026
Associate Editor

Mortgage AI showed stronger performance than a general-purpose model in a new benchmark, particularly in underwriting accuracy and compliance checks

Tidalwave and Columbia University’s DAPLab released results of the first public benchmark measuring AI accuracy on real mortgage origination tasks. The two contenders, Tidalwave’s mortgage-trained SOLO and Anthropic’s Claude 4.5, were prompted with realistic questions loan officers typically ask during loan origination.

The joint study found that SOLO gave more accurate answers to loan officers’ underwriting questions than Anthropic’s Claude 4.5, the general-purpose LLM it was tested against.

Both systems were evaluated on 90 questions across 10 synthetic borrower scenarios designed to reflect issues loan officers face during origination, including payroll mismatches, undisclosed liabilities, and suspicious deposits.

According to the benchmark, SOLO scored 84% overall accuracy, compared with 71% for Claude 4.5.

The widest gap appeared on yes-or-no compliance checks, where SOLO scored 95% and Claude 4.5 scored 42%, according to the study. On transaction identification questions, SOLO scored 83% versus 80% for Claude 4.5. However, Claude 4.5 scored higher on account verification, 86% to SOLO’s 67%.
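The category gaps can be checked with simple arithmetic. The short sketch below uses only the percentages reported above; the dictionary layout and variable names are illustrative choices, not part of the study.

```python
# Category scores as reported in the article (percent accuracy).
scores = {
    "overall": {"SOLO": 84, "Claude 4.5": 71},
    "yes-or-no compliance checks": {"SOLO": 95, "Claude 4.5": 42},
    "transaction identification": {"SOLO": 83, "Claude 4.5": 80},
    "account verification": {"SOLO": 67, "Claude 4.5": 86},
}

# Gap in percentage points; a positive value means SOLO scored higher.
gaps = {category: s["SOLO"] - s["Claude 4.5"] for category, s in scores.items()}

# The category with the largest absolute gap, in either direction.
widest = max(gaps, key=lambda category: abs(gaps[category]))

for category, gap in gaps.items():
    print(f"{category}: {gap:+d} points")
print("widest gap:", widest)
```

On these figures, the compliance gap is 53 percentage points in SOLO's favor, while account verification is the one category where the sign flips, with Claude 4.5 ahead by 19 points.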

Tidalwave Benchmark

Why The Compliance Gap Matters

The study said yes-or-no compliance checks are central to loan quality review because they are used to flag issues such as payroll mismatches, undisclosed debts, suspicious deposit patterns, and structurally inconsistent bank statements. At a 42% accuracy rate, the benchmark said, a general-purpose model produced the wrong answer more often than the right one on those questions.

According to the release, the gap reflects differences in how the systems process mortgage data. The study said general-purpose models treat a loan file as raw text, while Tidalwave’s SOLO is integrated with Fannie Mae and Freddie Mac underwriting systems and trained on structured mortgage data, including Uniform Loan Application Dataset files and bank statement transaction records.

The study's findings arrive just a few weeks after Fannie Mae, Freddie Mac and the Federal Housing Finance Agency severed ties with Anthropic’s AI tools, including Claude, amid a broader federal phaseout of the company’s technology.

Tidalwave said SOLO’s lower score on account verification was tied to its practice of stripping personally identifiable information from AI interactions. The company said a next-generation capability is intended to improve performance in that category while maintaining data protections.

“Forty-two percent on compliance questions should worry every lender relying on off-the-shelf AI right now,” said Diane Yu, co-founder and CEO of Tidalwave. “When I was building technology at Better.com, I watched general-purpose tools fail on mortgage data over and over. They’d miss a payroll mismatch or fail to recognize a deposit, and a human had to catch it every time. That’s why we built Tidalwave’s SOLO differently, and that’s why we tested it with Columbia University, not internally. If you’re going to tell lenders your AI is accurate, you should be willing to prove it publicly.”

The First Benchmark Measuring AI Accuracy

The benchmark was conducted in fall and winter 2025 by Tidalwave engineers and Columbia researchers. Researchers built the test using complete synthetic loan application files and up to two months of synthetic bank statement transaction data. Performance was measured using F1 score, which the study said gives partial credit for partially correct answers on list-type questions. Yes-or-no questions were scored on a binary basis, as either right or wrong.
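To make the scoring idea concrete, here is a minimal sketch of how an F1 score gives partial credit on a list-type answer while a yes-or-no answer is scored as all or nothing. It uses the standard set-based definition of F1; the study's exact scoring code is not described in the release, so the function names, the handling of empty answers, and the deposit identifiers are all hypothetical.

```python
def f1_list(predicted, expected):
    """Set-based F1 for a list-type answer: partial overlap earns partial credit."""
    pred, gold = set(predicted), set(expected)
    if not pred and not gold:
        return 1.0  # assumption: two empty answers count as a perfect match
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)  # share of returned items that are correct
    recall = true_positives / len(gold)     # share of correct items that were returned
    return 2 * precision * recall / (precision + recall)


def score_yes_no(predicted, expected):
    """Binary scoring: 1.0 for a matching answer, otherwise 0.0."""
    return 1.0 if predicted.strip().lower() == expected.strip().lower() else 0.0


# Hypothetical case: the model finds two of three flagged deposits and adds one wrong item.
expected = ["deposit_0412", "deposit_0419", "deposit_0503"]
predicted = ["deposit_0412", "deposit_0419", "deposit_0388"]

partial = f1_list(predicted, expected)  # precision 2/3, recall 2/3
binary = score_yes_no("Yes", "no")      # a wrong yes-or-no answer earns nothing

print(f"list-type F1: {partial:.3f}")
print(f"yes-or-no score: {binary:.1f}")
```

In this example the list-type answer earns an F1 of about 0.667 rather than zero, while the incorrect yes-or-no answer scores 0.0.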

According to the release, the benchmark was designed around actual usage patterns and included edge cases such as foreign transactions, mismatches between bank statements and applications, and deposits from lesser-known vendors. 

“We partnered with Tidalwave on this benchmark to reflect the actual decision points loan officers face during origination, not abstract NLP tasks,” said Zhou Yu, associate professor at Columbia University.

Loan officers across the U.S. are already using general-purpose AI tools to work through loan files on 43-day closing timelines, the joint press release noted, while the average lender loses more than $600 per loan originated. Lenders are under increasing pressure to adopt AI, but until now, no public benchmark has measured whether the AI tools they are adopting actually produce accurate answers on the questions that matter for loan quality.

About the author
Associate Editor
Katie Jensen is a mortgage news reporter at NMP.