The End of Diagnostic Silos: What Foundation Models Mean for Radiology’s Scope

RADIOLOGY IN THE AGE OF AI & VLMS  |  ARTICLE 11 OF 14

For fifty years, radiology organized itself around organ systems and modalities. We trained in chest, or neuro, or MSK. We built expertise in a defined territory, and the clinical world around us organized referral patterns, call schedules, and practice economics to match that structure.

The AI being deployed right now performs opportunistic screening whether we asked it to or not.

This is already happening, and understanding it clearly matters for every radiologist trying to figure out where this technology is going and what it means for the work.

One Model, Two Cancers, One Scan

Opportunistic screening is not a new concept in radiology. The idea that a scan ordered for one indication might reveal actionable findings relevant to a different clinical question has been part of imaging practice for years, most visibly in the literature around coronary calcium scoring on non-gated chest CT, vertebral bone density on abdominal CT, and aortic aneurysm detection on scans obtained for other purposes. The radiology community has debated the evidence base, the reporting obligations, and the downstream cost implications of these findings for more than a decade.

What is new is the architecture performing that screening.

OMAFound is a foundation model trained to perform breast and lung cancer screening simultaneously on a single non-contrast chest CT.¹ One scan. One model. Two separate oncologic screening tasks running in parallel, across two organ systems, from imaging that was not ordered for either purpose. The silo assumption fails at the model level. That is not an incremental improvement on a single-task algorithm. It is a different design philosophy: broad scope as a first principle rather than narrow precision as a constraint.

BrainIAC extends the same logic to neuroimaging: a generalizable foundation model for brain MRI built to handle the breadth of what brain imaging actually involves, across neurological conditions, scanner parameters, and clinical settings, rather than a series of modular tools each optimized for a single finding.² Trained on nearly 49,000 brain MRIs across ten neurological conditions, it outperforms task-specific models, particularly in low-data and high-difficulty settings.

The single-organ assumption, once treated as a technical necessity, is now being retired by design.

Merlin takes this architecture further still.³ A 3D vision-language model (VLM) for abdominal CT, trained on more than six million images paired with EHR diagnosis codes and radiology reports, evaluated across 752 individual tasks spanning diagnostic, prognostic, and quality-related work. One model. One scan type. Hundreds of clinical questions addressed simultaneously. Chest, brain, abdomen: foundation models built for broad scope are arriving across the full body in parallel, and the subspecialty structure that organized radiology training and practice for fifty years does not map onto that architecture in any straightforward way.

Note: Foundation model is used in this series as the broader term, of which vision-language models (VLMs) and large language models (LLMs) are the most clinically relevant examples in radiology.

The Accountability Gap Opportunistic Screening Creates

Here is the problem that the foundation model architecture creates before most practices are ready for it.

A patient presents with chest pain. The ordering physician requests a chest CT for pulmonary evaluation. A foundation model reads that scan and, in the course of performing its primary task, identifies imaging characteristics in the breast parenchyma that meet criteria warranting follow-up. It flags them. The radiologist receives the output.

Who is responsible for that finding?

The question sounds simple, but we know it is not. The CT was not ordered for breast evaluation. The radiologist’s attestation of the study may or may not have included review of incidental breast findings, depending on institutional protocol, prior training, and the practice’s expectation framework at the time the study was ordered. The AI found something the scan was not looking for. The radiologist signed a report on an indication the finding does not belong to.

This is the accountability gap that opportunistic screening by foundation models creates, and the liability system has not caught up with this architecture.

The payer coverage rules have not been written around it. The ACR guidance on incidentalomas was developed in a world where most incidental findings came from radiologists trained to recognize them, not from models trained to find them across organ systems simultaneously.

Foundation models are generating a category of finding that exists in a governance gap: detected by the technology, surfaced to the radiologist, but outside the established accountability framework the practice was operating under when the study was ordered. That gap is going to require deliberate institutional policy decisions. Practices that wait for a formal ruling before addressing it will find themselves managing the problem reactively when a specific case forces the issue.

The Scope Opportunity, If Radiologists Choose It

There is a version of this development that works out well for radiology. It requires radiologists to lead the conversation rather than wait for it to arrive.

The debate around opportunistic screening has historically centered on whether individual findings are worth reporting, whether the evidence base for intervention is strong enough to justify the downstream workup burden, and whether the economic incentives are aligned with patient benefit. Those are legitimate questions, and they will need to be answered at the institutional level for each new finding category that foundation models surface.

But the deeper question is about who is positioned to answer them.

If a foundation model can screen for findings across organ systems, the radiologist who understands what that model is doing, what it is likely to find, where its confidence degrades, and what the evidence base for its recommendations looks like across each of those tasks, is a radiologist who can function as a genuine clinical partner for any referring physician managing a complex patient. That is not a narrow interpretive service; it is a consultative one.

The emerging view among radiologists working at the frontier of AI evaluation is that detection is not where the real opportunity lies. Staging and prognosis, the clinical questions that sit downstream of a finding, represent the territory where AI can add genuine decision support and where radiologist expertise becomes most difficult to replace. Foundation models and opportunistic screening expand that territory. They also raise the bar for what it means to oversee the work responsibly.

Radiologists who understand foundation models can define a broader clinical role. Radiologists who do not will find the role defined for them, by vendors, by administrators, and eventually by regulators who fill the governance vacuum when the profession does not.

What Governance Looks Like for a Model That Crosses Organ Lines

Monitoring a single-task algorithm is tractable. You define the clinical task, establish performance baselines, track drift against those baselines, and build an escalation pathway when performance falls outside acceptable bounds. Articles 8 and 9 of this series covered that infrastructure in detail.
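
To make that loop concrete, here is a minimal sketch of what single-task drift monitoring can look like in code. It is illustrative only: the baseline value, tolerance, and the escalate() hook are assumptions for this example, not any vendor's actual implementation or a value from the articles cited above.

```python
# Minimal sketch of single-task drift monitoring.
# Baseline, tolerance, and escalation behavior are hypothetical.

from statistics import mean

BASELINE_SENSITIVITY = 0.91   # established during local validation (assumed value)
DRIFT_TOLERANCE = 0.05        # allowable drop before escalation (assumed value)

def check_drift(recent_case_results: list[float]) -> None:
    """recent_case_results: per-case agreement scores from a recent review window."""
    observed = mean(recent_case_results)
    if observed < BASELINE_SENSITIVITY - DRIFT_TOLERANCE:
        escalate(observed)  # escalation pathway: pause use, notify, re-validate

def escalate(observed: float) -> None:
    print(f"Observed performance {observed:.2f} fell below tolerance; escalate for review.")
```

One task, one baseline, one threshold, one escalation pathway: the problem is bounded.
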

Monitoring a foundation model performing opportunistic screening across multiple organ systems is a harder problem. The performance dimensions are not singular.

A model operating across two or three clinical domains simultaneously can be performing well on its primary task while drifting on a secondary one, and standard accuracy metrics will not surface the divergence unless the monitoring infrastructure was designed with that possibility in mind.

Each task domain requires its own baseline. Each clinical context the model is applied to requires its own validation lens. Each opportunistic finding category requires its own institutional policy for how it gets reported, communicated to the ordering physician, and tracked for follow-up.
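
A rough sketch of how the multi-task extension differs: per-task baselines and per-task drift checks, so that a drifting secondary task is surfaced even when the primary task, and any pooled metric, still looks fine. The task names, baseline values, and scores below are illustrative assumptions, not measured results.

```python
# Illustrative sketch: per-task baselines for a multi-task foundation model.
# Task names, baselines, tolerances, and scores are hypothetical.

from statistics import mean

BASELINES = {
    "lung_nodule_detection":   {"baseline": 0.92, "tolerance": 0.04},
    "breast_parenchyma_flags": {"baseline": 0.88, "tolerance": 0.04},
}

def review_window(results_by_task: dict[str, list[float]]) -> dict[str, str]:
    """results_by_task: per-case agreement scores for each task over a review window."""
    status = {}
    for task, scores in results_by_task.items():
        observed = mean(scores)
        floor = BASELINES[task]["baseline"] - BASELINES[task]["tolerance"]
        status[task] = "ok" if observed >= floor else "drift: escalate"
    return status

window = {
    "lung_nodule_detection":   [0.95, 0.93, 0.91, 0.94],  # primary task: holding steady
    "breast_parenchyma_flags": [0.81, 0.78, 0.80, 0.79],  # secondary task: quietly degrading
}
print(review_window(window))
# -> {'lung_nodule_detection': 'ok', 'breast_parenchyma_flags': 'drift: escalate'}
```

A single pooled accuracy number over that same window would sit comfortably above either floor, which is exactly how a secondary-task drift goes unnoticed.
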

This is not an argument against foundation model deployment. It is an argument for understanding what you are taking on when you deploy it. The observability requirements for a foundation model are not additive to the requirements for a single-task tool. They are structurally different, and most practices are not currently resourced to address them.

Where the Architecture Is Heading

OMAFound, BrainIAC, and Merlin are early instances of a model design philosophy that will expand. The subspecialty organization of radiology training and practice does not map onto that architecture in any obvious way, and that gap will only widen as the models get broader.

None of this means subspecialty expertise becomes irrelevant.

The opposite case is more likely: when a model is flagging opportunistic screening findings across organ systems, the radiologist’s job is to understand which of those flags are reliable, which require additional clinical context, which represent genuine incidental findings with management implications, and which are artifacts of a model operating outside the range of cases it was validated on.

That is sophisticated clinical judgment. It requires domain knowledge. It requires understanding the evidence base behind each task the model is attempting. It is not a job that gets easier as the models get broader.

It is, however, a job that looks very different from the one organized around reading within a subspecialty lane and signing out studies within a defined modality. Opportunistic screening performed by AI at scale, across organ systems, on studies ordered for entirely different indications, is not a niche use case. It is the direction the technology is pointing. The practices and radiologists who recognize that shift early enough to build the governance infrastructure around it are in a structurally different position than those who encounter it for the first time in a specific case that demands an answer.

What Is Coming Next – Article 12

Next week: A vendor just sent your group a validation study. AUC of 0.93, external cohort, peer reviewed. It is genuinely good work. But there is a question that study cannot answer, and that no vendor contract in radiology currently addresses: six months after go-live, if the model quietly degrades or starts producing reports that are internally consistent but clinically wrong, how would you know? Article 12 examines the gap between what vendors can measure and what clinical outcomes actually require, and what it means for every practice signing an AI contract right now.

Deploying clinical AI is not the hard part. Knowing whether it is actually working in your environment is.

Performance varies across sites, scanners, patient populations, and workflows. Over time, models drift, disagreement patterns shift, and small failures accumulate in ways that are rarely visible in standard dashboards or validation reports. This is where risk and missed opportunity both live.

Veriloop.health is a vendor-agnostic clinical AI observability layer that measures real-world performance after deployment. We quantify agreement, track error patterns, and monitor trust over time so teams can see where AI is reliable, where it is degrading, and how much workload it can safely support.

We do not replace pre-deployment validation or regulatory clearance. We make sure that what was approved continues to perform safely and effectively in practice.

If you are deploying or scaling clinical AI and need visibility into real-world performance, contact ty@orainformatics.com.

References

1 Liang, Niu, Wang, Han et al. (Xuejun Qian, corresponding author), “A Foundation Model for Breast and Lung Cancer Screening Using Non-Contrast Computed Tomography,” Nature Health, February 5, 2026. https://www.nature.com/articles/s44360-026-00055-8

2 Tak, Garomsa, Chaunzwa et al., “A Generalizable Foundation Model for Analysis of Human Brain MRI,” Nature Neuroscience, volume 29, pages 945–956, February 5, 2026. https://www.nature.com/articles/s41593-026-02202-6

3 “Merlin: A Computed Tomography Vision–Language Foundation Model and Dataset,” Nature, 2026. https://www.nature.com/articles/s41586-026-10181-8
