Flatiron Health Publishes First Peer-Reviewed Validation Framework for AI-Extracted Real-World Oncology Data in Journal of Clinical Oncology
Three-pillar framework sets methodological benchmark for the quality and transparency of oncology data
As large language models emerge as a tool for extracting clinical data from sources such as electronic health records, the industry faces a tradeoff: AI can unlock speed and scale, but it requires rigorous validation. Flatiron’s VALID Framework makes real-world data quality transparent and measurable, enabling evidence that meets the bar for high-stakes clinical decisions. Specifically, the framework applies a three-pillar approach: variable-level performance metrics that benchmark LLM extraction against expert human abstraction; automated verification checks that systematically identify logical inconsistencies and implausibilities in the data; and replication and benchmark analyses that confirm LLM-extracted data reproduce established clinical findings.
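For readers who want a concrete picture of the first two pillars, the following is a minimal illustrative sketch, not Flatiron's implementation. It assumes that variable-level performance can be summarized with precision, recall, and F1 against human-abstracted reference values, and that a verification check can be as simple as flagging a death date that precedes the diagnosis date. The function names, field names, and the choice of metrics are all hypothetical; the published paper defines the actual methods.

```python
from datetime import date


def variable_level_metrics(llm_values, human_values, positive):
    """Score LLM-extracted values for one variable against human abstraction.

    Hypothetical helper: treats `positive` as the class of interest and
    the human-abstracted values as the reference standard.
    """
    if len(llm_values) != len(human_values):
        raise ValueError("Both inputs must cover the same patients, in order")

    tp = fp = fn = 0
    for llm, ref in zip(llm_values, human_values):
        if llm == positive and ref == positive:
            tp += 1  # LLM and human abstractor agree on a positive
        elif llm == positive:
            fp += 1  # LLM reported a positive the abstractor did not
        elif ref == positive:
            fn += 1  # LLM missed a positive the abstractor recorded

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def find_implausible_records(records):
    """Return patient IDs whose death date precedes their diagnosis date.

    Hypothetical verification check; the record fields are assumed names.
    """
    flagged = []
    for record in records:
        diagnosis = record.get("diagnosis_date")
        death = record.get("death_date")
        if diagnosis is not None and death is not None and death < diagnosis:
            flagged.append(record["patient_id"])
    return flagged


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    llm = ["positive", "positive", "negative", "negative"]
    human = ["positive", "negative", "positive", "negative"]
    print(variable_level_metrics(llm, human, positive="positive"))

    records = [
        {"patient_id": "A", "diagnosis_date": date(2020, 1, 5),
         "death_date": date(2021, 3, 1)},
        {"patient_id": "B", "diagnosis_date": date(2020, 6, 1),
         "death_date": date(2019, 6, 1)},
        {"patient_id": "C", "diagnosis_date": date(2022, 2, 2),
         "death_date": None},
    ]
    print(find_implausible_records(records))
```

In this sketch the two checks are deliberately independent: agreement metrics need a human-abstracted reference sample, while plausibility checks can run over every extracted record without one.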
"By publishing this framework transparently, we hope to contribute to raising the bar across the industry," said
Flatiron's LLM-extracted data builds on the highest-quality, human-abstracted real-world oncology data. By combining AI with expert human abstraction, Flatiron delivers gold-standard data quality at scale without trading off the clinical rigor that makes it fit for use in the highest-stakes decisions in cancer care and drug development. Every LLM-enabled dataset is subject to the VALID Framework, alongside long-term clinical and scientific oversight, to ensure the data captures complete patient journeys and validated outcomes.
"The VALID Framework, combined with our robust clinical and methodological expertise, gives us—and our customers—a clear basis for evaluating whether efficiency and accuracy go hand in hand, as well as confidence in clinical and strategic decisions made using real-world data," said
Read the full publication: Estevez M, Singh N, Dyson L, et al. Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework. JCO Clin Cancer Inform. 2026. https://ascopubs.org/doi/10.1200/CCI-25-00215
About Flatiron
View source version on businesswire.com: https://www.businesswire.com/news/home/20260420057302/en/
press@flatiron.com