We have heard the beating of the AI drum for years.

Nodule detection. Fracture flagging. Hemorrhage triage. One finding, one organ, one algorithm. Useful tools, genuinely. But the pitch has been the same since 2016, and most of us have learned to tune it out.

It is worth accepting that AI has changed in a significant way.

The AI most of us have encountered in clinical practice is a segmentation or detection model. One task, defined inputs, tractable validation. That narrow scope is also why it could get FDA clearance. Nearly every cleared radiology AI product today fits this description.

A foundation model is a different kind of system entirely. Trained on vast, diverse data. Capable of attempting a wide range of tasks from a single model. In radiology, the clearest current example is what Northwestern is doing: not flagging a single finding, but writing the entire report.

If you have used Google Maps, you know what a navigation algorithm looks like. It finds a route. It does that one thing well.

ChatGPT is different.

You describe a problem, any problem, and it attempts an answer across whatever territory the question covers. That is the architectural shift. One model, broad capability, natural language output.

That is what is arriving in radiology.

The failure profile is also different, and this is the part worth slowing down for. A segmentation model fails in one place. You know where to look. A foundation model can fail across the entire study, expressed in fluent, structured, confident prose. The output looks like expertise. Verifying it requires working across the same territory the model worked across, which is the whole scan.

How to incorporate these models into clinical workflow is a question radiology has not answered yet. Nobody has.

I wrote a book to share what I have learned about this. It is on Amazon now, in Kindle and paperback.

Link to book on Amazon: https://a.co/d/03QZA4WG

Most clinical AI systems are evaluated before deployment and assumed to perform the same in production. In reality, performance shifts across sites, scanners, populations, and workflows, and those shifts are rarely measured systematically.

If you are building or deploying AI in radiology, including VLM-based reporting or multi-model orchestration systems, you need a way to monitor real-world behavior continuously. This includes tracking disagreement with clinicians, identifying drift, and understanding failure modes over time.
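Tracking disagreement with clinicians can be made concrete with very little machinery. The sketch below is a hypothetical illustration, not Veriloop's implementation: it assumes each production case yields one binary AI call and one binary clinician final read, keeps a rolling window of agreement per site, and flags a site when its recent disagreement rate exceeds an assumed baseline by a fixed tolerance. The names `CaseResult`, `DisagreementMonitor`, and `SiteMonitors`, and the default window and tolerance values, are illustrative assumptions.

```python
# Hypothetical sketch: names, baseline, window, and tolerance are illustrative assumptions.
from collections import deque
from dataclasses import dataclass


@dataclass
class CaseResult:
    """One production case: the AI's call and the clinician's final read."""
    case_id: str
    site: str
    ai_positive: bool
    clinician_positive: bool


class DisagreementMonitor:
    """Rolling AI-vs-clinician disagreement rate, compared against a fixed baseline."""

    def __init__(self, baseline_rate: float, window: int = 200, tolerance: float = 0.05):
        self.baseline_rate = baseline_rate
        self.tolerance = tolerance
        # Oldest cases drop off automatically once the window is full.
        self.recent = deque(maxlen=window)

    def record(self, result: CaseResult) -> None:
        # Store True when the AI and the clinician disagree on this case.
        self.recent.append(result.ai_positive != result.clinician_positive)

    def disagreement_rate(self) -> float:
        if not self.recent:
            return 0.0
        return sum(self.recent) / len(self.recent)

    def drifted(self) -> bool:
        # Only judge a full window, to avoid noisy alarms on the first few cases.
        if len(self.recent) < self.recent.maxlen:
            return False
        return self.disagreement_rate() - self.baseline_rate > self.tolerance


class SiteMonitors:
    """One monitor per site, since performance can shift from site to site."""

    def __init__(self, baseline_rate: float, window: int = 200, tolerance: float = 0.05):
        self._args = (baseline_rate, window, tolerance)
        self._by_site: dict[str, DisagreementMonitor] = {}

    def record(self, result: CaseResult) -> None:
        monitor = self._by_site.get(result.site)
        if monitor is None:
            monitor = DisagreementMonitor(*self._args)
            self._by_site[result.site] = monitor
        monitor.record(result)

    def drifted_sites(self) -> list[str]:
        return sorted(site for site, m in self._by_site.items() if m.drifted())
```

A fixed-threshold comparison like this is the crudest form of drift detection; a production system would more likely use a statistical test or control chart and track more than one binary finding per study.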

Veriloop provides a vendor-agnostic observability layer for clinical AI. We sit downstream of your model and workflow, measuring performance where it matters: in production, across real cases, with real users.

This is not model evaluation. This is system monitoring.

Contact: ty@orainformatics.com
