Agentic AI in Radiology: From Co-Pilot to Autopilot

RADIOLOGY IN THE AGE OF AI & VLMS  |  ARTICLE 7 OF 14

Article 6 established what it takes to monitor a single AI tool operating within a defined clinical task. A deployment produces outputs; those outputs can be measured against performance thresholds; drift can be detected; the governance infrastructure can trigger a review. That is a tractable problem, and the field is beginning to build the scaffolding to address it.

The problem changes when AI stops generating a report and starts taking action.


From Output to Action

A co-pilot and an autopilot are not simply points on a spectrum of confidence. They are categorically different machines. A co-pilot presents options. An autopilot executes them. The difference that matters in clinical AI is not how accurate the system is or how fast it runs. The difference is whether a human being authorizes each consequential action before it happens.

Most radiologists who have encountered AI in clinical practice have experienced the co-pilot end of the spectrum: a system that takes an image as input and returns an output, and a human who decides what to do with that output. The radiologist reads the AI flag and decides whether to act. That is the architecture this series has been analyzing since Article 2, and it is the architecture for which our current governance frameworks were designed.

Agents are different. An agent is an AI system that uses a language model or VLM as its reasoning core, then wraps that core with the capacity to use tools, retain memory across steps, and execute multi-step plans without explicit human authorization at each handoff. In radiology, the concrete version could look like this:

The system identifies a pulmonary nodule, cross-references the prior CT from three years ago, calculates nodule growth rate, determines the patient meets Lung-RADS 4B criteria, orders a PET-CT through the EHR, generates a patient notification, and pages the referring pulmonologist, all before a radiologist has opened the study. Each individual step involves a capability that already exists somewhere in clinical AI. The agent is what happens when those capabilities are connected into a single autonomous workflow.

What Agents Are Already Doing in Medicine

Before the imaging-specific agent tier arrives, it is worth understanding what autonomous multi-step AI is already doing in adjacent clinical workflows, because the timeline is shorter than most people imagine.

Prior authorization automation is one of the most widely deployed applications: an agent that reads a clinical note, identifies the proposed procedure, queries payer guidelines, determines if criteria are met, and submits the request without a human touching each step. Clinical documentation agents are in production at major health systems, pulling context from EHR notes, encounter records, and laboratory results to draft structured outputs. Scheduling optimization agents allocate imaging slots based on urgency flags, protocol requirements, and scanner capacity. Triage routing agents are directing patients through emergency workflows based on early vital sign patterns and chief complaint analysis.

These are not research systems. They are deployed, and in several cases they operate without any formal regulatory framework governing their actions, because the FDA’s existing clearance apparatus was built for software that produces outputs, not software that takes actions.¹

The Architecture Behind the Autonomy

Understanding what makes an agent different from a VLM is not academic. It is the prerequisite for knowing what governance actually needs to cover. As a reminder:

A standard VLM is a perception-to-language system: image in, text out. The model has been trained on image-report pairs, and at inference time it generates language conditioned on visual input. The radiologist reviews the output and makes all downstream decisions.

An agent built on top of that VLM adds three capabilities the base model does not have. The first is tool use: the ability to query external systems, write to databases, place orders, and send communications. The VLM that can only generate text becomes, with tool access, a system that can change the state of clinical operations. The second is memory: through retrieval-augmented generation, the agent can pull prior imaging studies, institutional protocols, patient history, and real-time laboratory values into its reasoning context before generating any output. The third is planning: the ability to decompose a multi-step objective, sequence its own actions toward that objective, and revise its plan based on intermediate results.

Together, these three additions transform a document-generating tool into a process-executing agent. The error modes are correspondingly different. A VLM that hallucinates a finding produces a report with an incorrect statement. An agent that acts on a hallucinated finding may place an order, notify a specialist, and trigger a billing event before anyone has reviewed the underlying error.¹
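The three additions can be made concrete with a short sketch. Everything here is illustrative: the class name, the tool names, and the hard-coded plan are hypothetical stand-ins for the nodule workflow described above, and the stub tools only record actions rather than touching PACS or an EHR. The point is structural, not functional:

```python
from dataclasses import dataclass, field

def make_stub_tools(log):
    # Stub tools that record actions instead of changing real clinical state.
    def make(name):
        def call(arg):
            log.append((name, arg))
            return f"{name}: done"
        return call
    names = ["query_priors", "compute_growth_rate", "place_order", "notify"]
    return {n: make(n) for n in names}

@dataclass
class NoduleWorkupAgent:
    # Tool use: callables that change the state of external systems.
    tools: dict
    # Memory: context retained across steps (priors, protocols, labs).
    memory: list = field(default_factory=list)

    def plan(self, objective):
        # Planning: decompose the objective into an ordered action list.
        # A real agent would generate this with its VLM/LLM reasoning core;
        # here the plan is hard-coded for illustration.
        return [
            ("query_priors", "chest CT, 3 years prior"),
            ("compute_growth_rate", "RUL nodule"),
            ("place_order", "PET-CT"),
            ("notify", "referring pulmonologist"),
        ]

    def run(self, objective):
        trace = []
        for tool_name, arg in self.plan(objective):
            self.memory.append((tool_name, arg))      # retain state across steps
            trace.append(self.tools[tool_name](arg))  # execute each step
        return trace
```

Notice that nothing in `run` pauses for authorization between steps. That absence, not any individual capability, is what makes this an agent rather than a tool.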

The TEFCA Connection

TEFCA, the Trusted Exchange Framework and Common Agreement, is the federal infrastructure that enables health records, including imaging studies, to flow between providers across the United States.

Article 10 of this series examines TEFCA and the national imaging graph in depth. But the relationship between agents and TEFCA is worth flagging here, because it is not a future concern. It is a present one.

An agent operating within a single institution’s data environment has a bounded scope. It can query PACS, access the local EHR, and route within the local clinical workflow. That is already a significant surface area for governance purposes. An agent operating with access to the QHIN infrastructure, the national imaging graph of X-rays, MRIs, and CT scans flowing across eleven-plus designated networks and tens of thousands of provider connections, is categorically more powerful and categorically more consequential if it acts on that data without appropriate human oversight at each decision point.²

TEFCA’s data-sharing framework was designed with the assumption that human clinicians make decisions based on the records being exchanged. It was not designed for AI agents that can query, synthesize, and act on that data autonomously at scale. The legal ambiguity around who can use QHIN-accessible imaging data to train AI models is a governance gap. An agent that acts on QHIN-accessible imaging data without explicit human authorization at each material decision point is a larger one.

The Governance Vacuum

The FDA’s regulatory framework for software as a medical device was built to evaluate tools that produce outputs: a system that flags a finding, recommends an action, generates a report. The underlying regulatory logic assumes a human clinician stands between the software output and the clinical consequence. The software assists. The clinician decides.

Agentic AI disrupts that logic structurally. If an agent can order a study, page a specialist, and update the medical record without direct physician authorization at each step, the human is no longer interposed between the AI output and the clinical consequence. In the current regulatory environment, that situation has no clear framework. There is no FDA pathway specifically designed for autonomous multi-step clinical workflows. There is no defined standard of care for the radiologist who is technically responsible for outcomes in a workflow they did not individually authorize at each stage. There is no vendor liability standard for actions taken by an agent operating downstream of the radiologist’s last explicit decision point.

The ACR/CAR/ESR/RANZCR/RSNA multi-society statement on AI evaluation provides a professional baseline for what governance of AI tools should look like.³ But that document was written primarily with single-task AI deployment in mind. The multi-step agent tier requires a governance framework the field has not yet built. And the gap is not theoretical: agents are already running in clinical-adjacent workflows while that framework remains absent.

At ECR 2026 in Vienna, Dr. Hugh Harvey raised a dimension of this problem that deserves direct attention: AI embedded within quality assurance processes is itself classified as high-risk under the EU AI Act and requires its own monitoring layer.⁴ The practical consequence is striking: the agentic future will require AI systems designed to oversee other AI systems. That is a second-order governance problem. We are still working on the first-order one.

Safety Design Principles for Agentic Radiology

The absence of a regulatory framework is not a reason to wait. It is a reason to build one locally, at the practice and department level, before the agents arrive and the rules are being written around existing deployments rather than sound clinical principles.

Four design principles offer a starting point for any practice beginning to think through agentic governance.

The first is human checkpoints at defined decision nodes. Not every step in a multi-step workflow requires a radiologist’s explicit authorization, but the steps that carry material clinical consequence do. Defining which those are, in advance, is the governance work. The second is comprehensive audit logs for every tool call. An agent that can query, write, communicate, and order creates an action trail that is simultaneously a liability record, a quality improvement dataset, and a monitoring substrate. That trail needs to exist and be reviewable. The third is scope limitation: a formal policy specifying what an agent can do without explicit approval, what it can do with approval, and what it cannot do regardless of the instruction it receives. Scope limitation is how you prevent an agent optimized for efficiency from taking actions the radiologist would not have authorized if asked. The fourth is graceful failure modes: what the agent does when it encounters ambiguity, contradiction, or a scenario outside its training distribution. A well-designed agentic system does not act when uncertain. It escalates.

None of these principles is technically complex. Each of them requires clinical leadership to define the parameters, because the parameters are clinical judgments, not engineering ones.¹
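To make that concrete, here is a minimal sketch of how the four principles might compose: a scope policy (principle three) whose allow/gate/forbid entries are clinical judgments, a gate function that logs every attempted action (principle two), holds consequential actions for authorization (principle one), and escalates rather than acts under uncertainty or outside its defined scope (principle four). All action names, the policy itself, and the confidence threshold are hypothetical:

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"              # proceed without explicit approval
    REQUIRE_APPROVAL = "gate"    # hold until a radiologist authorizes
    FORBID = "forbid"            # never executed, regardless of instruction
    ESCALATE = "escalate"        # graceful failure: route to a human

# Hypothetical scope policy. The entries are clinical judgments,
# which is why clinical leadership must define them.
SCOPE_POLICY = {
    "query_priors": Decision.ALLOW,
    "draft_report_section": Decision.ALLOW,
    "place_imaging_order": Decision.REQUIRE_APPROVAL,
    "notify_referrer": Decision.REQUIRE_APPROVAL,
    "modify_medication_list": Decision.FORBID,
}

AUDIT_LOG = []  # every attempted action is recorded, allowed or not

def gate(action, confidence, threshold=0.9):
    """Decide whether a proposed agent action may proceed."""
    # Actions outside the defined scope escalate rather than execute.
    decision = SCOPE_POLICY.get(action, Decision.ESCALATE)
    # A well-designed agent does not act when uncertain; it escalates.
    if decision is Decision.ALLOW and confidence < threshold:
        decision = Decision.ESCALATE
    AUDIT_LOG.append((action, round(confidence, 2), decision.value))
    return decision
```

The engineering is trivial by design: the hard part is the content of `SCOPE_POLICY` and the choice of which actions require approval, and no amount of code can supply those answers.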

The Radiologist’s Moment

There is a version of the agentic future in which radiologists are observers: systems are designed by engineers, deployed by health systems, governed by regulators, and radiologists adapt to the environment they inherit. That version is not inevitable. It is simply what happens if radiologists are not in the rooms where the design decisions are made.

The clinical expertise required to define what an agent should and should not do, which decisions require human authorization before an action is taken, which findings warrant immediate escalation versus scheduled follow-up, which workflow steps are appropriate to automate and which are not: that expertise resides with radiologists. It does not reside with engineers, vendors, or regulators working alone.

The radiologists who define the scope limitation policies and the checkpoint architectures for agentic radiology workflows will shape how this technology is deployed across the field.

The Findings Checklist Workflow introduced in Article 2 is the right design pattern for thinking about this. The accept/reject interface for individual findings is a human-gated agent interface, not just an editing tool. Each radiologist decision is a supervision signal. The human is in the loop at every finding, not every step, which is the correct balance of efficiency and oversight in any well-designed agentic system. Building those authorization points into the interface from the start is how you build an agentic system that remains clinically accountable as the autonomy increases.
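The finding-level gate described above can be sketched in a few lines. The function name and data shapes are hypothetical; the point is that a single review loop serves double duty, gating what flows downstream and logging each accept/reject decision as a supervision signal:

```python
def review_findings(findings, radiologist_decision):
    """Gate AI findings through per-finding human review.

    findings: list of dicts with a 'label' key (illustrative shape).
    radiologist_decision: callable returning True (accept) or False
    (reject) for one finding; in practice this is the checklist UI.
    """
    supervision_log = []
    accepted = []
    for f in findings:
        ok = radiologist_decision(f)
        # Each decision is recorded as a supervision signal.
        supervision_log.append({"finding": f["label"], "accepted": ok})
        if ok:
            accepted.append(f)  # only accepted findings flow downstream
    return accepted, supervision_log
```

Only what the radiologist accepts can trigger downstream action, so accountability is preserved at the finding level even as the surrounding workflow becomes more autonomous.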


Up Next in Article 8:

The governance challenges in this article center on agentic systems operating within defined institutional workflows. The next dimension of the problem is what happens when those systems carry biases that are invisible at deployment and compound over time, particularly in multimodal architectures integrating imaging, pathology, genomics, and laboratory data simultaneously. Article 8 examines how bias, drift, and collision risk behave in that environment, and why a better pre-deployment audit is not the answer.


The governance layer for agentic radiology does not exist yet at most institutions. If you are building it, evaluating vendors, or trying to understand what questions to ask before it arrives in your workflow, I am glad to hop on a call. ty@orainformatics.com


References

1. RAISE: Radiology AI Safety, an End-to-End Approach. arXiv Preprint, 2023. https://arxiv.org/pdf/2311.14570

2. TEFCA Final Rule: ASTP Codifies QHIN Requirements. McDermott Will & Emery, January 2025. https://www.mwe.com/insights/astp-final-rule-codifies-requirements-for-tefca-qualified-health-information-networks/

3. Brady, Allen, Chong, Kotter, Kottler, Mongan, Oakden-Rayner, Pinto dos Santos, Tang, Wald, Slavotinek. “Developing, Purchasing, Implementing and Monitoring AI Tools in Radiology: ACR/CAR/ESR/RANZCR/RSNA Multi-Society Statement.” Radiology: AI, 2024. https://pubs.rsna.org/doi/full/10.1148/ryai.230513

4. Tschabuschnig. “ECR: Ethical AI in Radiology — Why Safety Begins After Deployment.” AuntMinnie Europe, March 7, 2026. ECR 2026 session coverage: Harvey on EU AI Act high-risk classification; Van Leeuwen on postmarket surveillance; Shelmerdine on responsible withdrawal and moral injury. https://www.auntminnieeurope.com/resources/conferences/ecr/2026/article/15818943/ecr-ethical-ai-in-radiology-why-safety-begins-after-deployment
