The First Mortgage AI Accuracy Test: Tidalwave’s SOLO vs. Anthropic’s Claude 4.5

Mar 17, 2026

Associate Editor

Mortgage AI showed stronger performance than a general-purpose model in a new benchmark, particularly in underwriting accuracy and compliance checks

Tidalwave and Columbia University’s DAPLab released results of the first public benchmark measuring AI accuracy on real mortgage origination tasks. The two contenders, Tidalwave’s mortgage-trained SOLO and Anthropic’s Claude 4.5, were prompted with realistic questions loan officers typically ask during loan origination.

The joint study found that loan officers using SOLO received more accurate answers to underwriting questions than those using general-purpose LLMs such as Anthropic’s Claude 4.5.

Both systems were evaluated on 90 questions across 10 synthetic borrower scenarios designed to reflect issues loan officers face during origination, including payroll mismatches, undisclosed liabilities, and suspicious deposits.

According to the benchmark, SOLO scored 84% overall accuracy, compared with 71% for Claude 4.5.

The widest gap appeared on yes-or-no compliance checks, where SOLO scored 95% and Claude 4.5 scored 42%, according to the study. On transaction identification questions, SOLO scored 83% versus 80% for Claude 4.5. However, Claude 4.5 scored higher on account verification, 86% to SOLO’s 67%.

Why The Compliance Gap Matters

The study said yes-or-no compliance checks are central to loan quality review because they are used to flag issues such as payroll mismatches, undisclosed debts, suspicious deposit patterns, and structurally inconsistent bank statements. At a 42% accuracy rate, the benchmark said, a general-purpose model produced the wrong answer more often than the right one on those questions.

According to the release, the gap reflects differences in how the systems process mortgage data. The study said general-purpose models treat a loan file as raw text, while Tidalwave’s SOLO is integrated with Fannie Mae and Freddie Mac underwriting systems and trained on structured mortgage data, including Uniform Loan Application Dataset files and bank statement transaction records.

The study’s findings arrive just a few weeks after Fannie Mae, Freddie Mac, and the Federal Housing Finance Agency severed ties with Anthropic’s AI tools, including Claude, amid a broader federal phaseout of the company’s technology.

Tidalwave said SOLO’s lower score on account verification was tied to its practice of stripping personally identifiable information from AI interactions. The company said a next-generation capability is intended to improve performance in that category while maintaining data protections.

“Forty-two percent on compliance questions should worry every lender relying on off-the-shelf AI right now,” said Diane Yu, co-founder and CEO of Tidalwave. “When I was building technology at Better.com, I watched general-purpose tools fail on mortgage data over and over. They’d miss a payroll mismatch or fail to recognize a deposit, and a human had to catch it every time. That’s why we built Tidalwave’s SOLO differently, and that’s why we tested it with Columbia University, not internally. If you’re going to tell lenders your AI is accurate, you should be willing to prove it publicly.”

The First Benchmark Measuring AI Accuracy

The benchmark was conducted in fall and winter 2025 by Tidalwave engineers and Columbia researchers. Researchers built the test using complete synthetic loan application files and up to two months of synthetic bank statement transaction data. Performance was measured using F1 score. The study said list-type questions earn partial credit for partially correct answers, while yes-or-no questions are scored as simply right or wrong.
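For readers unfamiliar with the metric, here is a minimal sketch of how F1 can give partial credit on a list-type answer while a yes-or-no answer is scored all-or-nothing. The release does not publish the study's scoring code, so the set-based matching, the function names, and the sample transactions below are assumptions made purely for illustration.

```python
def list_f1(predicted, expected):
    """Set-based F1 between a predicted list and the expected list.

    Assumption: items are matched by exact equality; the study's
    actual matching rules are not described in the release.
    """
    pred, gold = set(predicted), set(expected)
    if not pred and not gold:
        return 1.0  # nothing expected, nothing returned
    true_pos = len(pred & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)  # share of returned items that are right
    recall = true_pos / len(gold)     # share of expected items that were found
    return 2 * precision * recall / (precision + recall)


def binary_score(predicted, expected):
    """Yes-or-no questions: full credit or none."""
    return 1.0 if predicted == expected else 0.0


# Hypothetical example: three deposits should be flagged, the model finds two.
expected_deposits = ["2025-10-03 wire", "2025-10-17 cash", "2025-11-02 transfer"]
model_deposits = ["2025-10-03 wire", "2025-10-17 cash"]

partial_credit = list_f1(model_deposits, expected_deposits)
compliance_credit = binary_score("no", "yes")
```

In this made-up case the model returns no wrong items (precision 1.0) but misses one of three (recall 2/3), so the list answer earns an F1 of 0.8, while the incorrect yes-or-no answer earns nothing.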

According to the release, the benchmark was designed around actual usage patterns and included edge cases such as foreign transactions, mismatches between bank statements and applications, and deposits from lesser-known vendors.

“We partnered with Tidalwave on this benchmark to reflect the actual decision points loan officers face during origination, not abstract NLP tasks,” said Zhou Yu, associate professor at Columbia University.

Loan officers across the U.S. are already using general-purpose AI tools to work through loan files on 43-day closing timelines, the joint press release noted, while the average lender loses more than $600 per loan originated. Lenders are under increasing pressure to adopt AI, but until now, no public benchmark has measured whether the tools they are adopting actually produce accurate answers on the questions that matter for loan quality.


About the author

Associate Editor

Katie Jensen is a mortgage news reporter at NMP.

