Tidalwave and Columbia University’s DAPLab Release First Public Benchmark for AI Accuracy in Mortgage Origination
Joint study finds Tidalwave’s mortgage-trained SOLO scored 95% on underwriting compliance checks where a generic LLM scored 42%
The study tested Tidalwave’s SOLO against Anthropic’s Claude 4.5 on 90 questions that loan officers routinely ask during the origination process: Does the payroll match the stated employer? Are there buy-now-pay-later payments? Could any deposits be from a foreign source? Is there undisclosed income?
On boolean verification questions, the type that flags payroll mismatches, undisclosed liabilities, and suspicious transactions, Tidalwave’s SOLO scored 95% versus 42% for the baseline model.
Tidalwave’s SOLO scored 84% overall accuracy compared to 71% for Claude 4.5. The widest gap was on yes/no compliance checks, the questions that determine whether a loan gets flagged or approved.
Benchmark results

| Question Type | Tidalwave’s SOLO | Anthropic’s Claude 4.5 |
| --- | --- | --- |
| Yes/no compliance checks | 95% | 42% |
| Transaction identification | 83% | 80% |
| Account verification | 67% | 86% |
| Overall accuracy | 84% | 71% |

Source: Tidalwave / Columbia University Benchmark Technical Report, 2025–2026. Measured by F1 score across 90 questions and 10 borrower scenarios.
Why the compliance gap matters
Yes/no compliance checks are the backbone of loan quality review. They’re the questions that catch payroll mismatches, undisclosed debts, suspicious deposit patterns, and structurally inconsistent bank statements. A 42% accuracy rate means a general-purpose model produces the wrong answer more often than the right one on exactly the questions where errors lead to bad loans, compliance violations, or missed fraud.
The gap exists because general-purpose models process a loan file as raw text. Tidalwave’s SOLO is integrated with
Tidalwave’s SOLO scored lower than Claude 4.5 in one category, account verification (67% vs. 86%). The company attributes this gap to its practice of stripping personally identifiable information (PII) from SOLO-enabled AI interactions and says its next-generation capability is designed to close that performance gap while safeguarding sensitive data.
Why this benchmark matters now
Loan officers across the
“42% on compliance questions should worry every lender relying on off-the-shelf AI right now,” said
Yu previously co-founded FreeWheel, a video ad-tech company acquired by Comcast for
Study methodology
The benchmark was conducted in fall and winter 2025 as a collaboration between Tidalwave’s engineering team and researchers at
A mortgage industry subject matter expert designed all questions from actual usage patterns of Tidalwave’s SOLO. The benchmark intentionally included edge cases (foreign transactions, mismatches between bank statements and applications, and deposits from lesser-known vendors) to test agents under realistic conditions. Performance was measured using F1 score, a standard accuracy metric that gives partial credit for partially correct answers on list-type questions and binary scoring on yes/no questions.
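As an illustration of the scoring described above, the following is a minimal sketch of how partial credit on a list-type question and binary scoring on a yes/no question might be computed. It is not the scoring code from the technical report: the function names `list_f1` and `yes_no_score`, the vendor labels, and the handling of empty answers are all assumptions made for this example.

```python
def list_f1(predicted, expected):
    """Set-based F1 for a list-type answer (hypothetical helper).

    Precision is the share of predicted items that are correct,
    recall is the share of expected items that were found, and
    F1 is their harmonic mean.
    """
    pred, gold = set(predicted), set(expected)
    if not pred and not gold:
        # Assumption: two empty answers count as a perfect match.
        return 1.0
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def yes_no_score(predicted, expected):
    """Binary scoring for a yes/no question: full credit or none."""
    return 1.0 if predicted == expected else 0.0


# Placeholder vendor labels, not data from the benchmark.
expected_vendors = ["vendor_a", "vendor_b", "vendor_c"]
predicted_vendors = ["vendor_a", "vendor_b", "vendor_x"]

# Two of three predictions are correct and two of three expected
# items were found, so precision = recall = 2/3 and F1 = 2/3.
partial_credit = list_f1(predicted_vendors, expected_vendors)
binary_credit = yes_no_score(True, False)
```

In this sketch, an answer that finds two of three expected vendors and adds one wrong vendor earns two-thirds credit, while a wrong yes/no answer earns none.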
“We partnered with Tidalwave on this benchmark to reflect the actual decision points loan officers face during origination, not abstract NLP tasks,” said
The full technical report, including dataset statistics and failure mode analysis, is available here.
About Tidalwave:
Tidalwave is an agentic AI platform that automates the full mortgage lifecycle, from application through closing. The company integrates directly with Fannie Mae DU and Freddie Mac LPA, along with verification partners Plaid, Argyle, and Truv. Lenders on the platform have automated up to 70% of manual tasks, cut processing from 45 days to under 15, and saved up to
About Columbia University’s DAPLab: The Data, Agents, and
View source version on businesswire.com: https://www.businesswire.com/news/home/20260317118247/en/
tgillogley@tidalhq.com
Source: Tidalwave