Liability, Accountability, and the Absent Audit Trail
RADIOLOGY IN THE AGE OF AI & VLMS  |  ARTICLE 9 OF 14

When an AI-assisted read produces a missed finding, the first question in the room will not be whether AI was involved. Everyone will already know that. The question will be whether you can prove how you used it, what it flagged, what you did with that flag, and why. Most practices cannot answer any of those questions today. Our workflows were not built for it.

Between 2022 and 2024, malpractice claims involving AI tools increased 14% across medicine. Radiology accounted for the majority. The legal infrastructure to handle them barely exists, and the case law being written right now will define the standard of care for every AI-assisted read you sign for the next decade.¹

We Are Still Responsible

We know that FDA clearance does not transfer liability to the vendor. Neither does the presence of an AI flag in your worklist. Neither does a contract clause buried in your enterprise AI agreement.

The FDA clears AI devices through the 510(k) pathway or the De Novo process. The first demonstrates substantial equivalence to a predicate device; the second establishes a novel classification with acceptable risk. What neither pathway requires is proof that the tool performs well in your patient population, on your scanner, with your case mix, on the day you are using it.

Clearance is a regulatory threshold, not a clinical validation.

It does not establish what the device can do in your specific deployment context.²

Prof. Gleeson of Oxford University Hospitals NHS Foundation Trust cut through the indemnification debate plainly at ECR 2026: postmarketing surveillance by developers “doesn’t work well,” and placing legal accountability for failures onto AI companies is roughly analogous to suing social media platforms for what individual users post. It almost never holds up in court. His operating principle was unambiguous: “If you act on an algorithm and something goes wrong, you will be responsible, not the company that developed it.”³

He extended the accountability argument further with an analogy that deserves wider circulation. Consider the controversy around unregulated “physician associates” in the UK and the patient harm that followed from their unsupervised practice. Now substitute “other people using AI” for “physician associates.” Without a national training body, without a national training exam, without audit or quality control, you arrive at the same structural problem. The person using the tool is not necessarily the issue. How they are trained, assessed, and held accountable is.

That framework does not yet exist for AI-assisted reading at scale, and Gleeson’s point is that the absence of it is not a theoretical risk. It is a replicable one.³

Your vendor contract almost certainly confirms the liability reality. Most enterprise AI agreements in radiology include language that shifts liability for clinical use to the physician or health system. When you signed the contract, you accepted that position. When you sign the report, you own the outcome.

None of this changes the near-term reality of practice. Radiologists will continue signing reports. The regulatory and contractual architecture that currently places liability on the interpreting physician is not going to be redesigned in the next contract cycle, or the next legislative session. Meaningful redistribution of AI accountability, whether toward vendors, health systems, or some shared framework the field has not yet built, requires regulatory changes that move slowly and case law that moves even more slowly.

In the meantime, that system is doing something useful that rarely gets acknowledged: it is generating real-world performance data on these tools, in real clinical environments, at scale. Every signed report is a data point. The field is learning, whether it has instrumented that learning or not. The radiologists and practices that use this window deliberately, building the documentation habits, the monitoring infrastructure, and the workflow discipline now, will not be scrambling to retrofit governance when the regulatory environment eventually forces the issue. They will already have it.

The Automation Culpability Shift

Here is the part that surprises radiologists the most. Their exposure does not only increase when they miss something the AI also missed. It increases when they miss something the AI correctly identified.

The legal theory is straightforward: if you deployed a tool specifically to catch certain findings, and that tool flagged a finding, and you did not act on it, the plaintiff’s attorney does not need to argue that AI made you worse. They only need to show that you had a second set of eyes, those eyes flagged the finding, and you ignored it. The tool becomes evidence against you.

The mock-juror research from Bernstein, Sheppard, Bruno, Lay, and Baird at Brown University, Penn State, and Seton Hall, published in Nature Health in March 2026, puts numbers on this. In a randomized experiment with 282 mock jurors, jurors sided with the plaintiff in 74.7% of cases when the radiologist used a single-read workflow. When the radiologist used a double-read workflow, reading independently before reviewing the AI output, plaintiff-siding dropped to 52.9%. The odds ratio was 2.6. The p-value was 0.0002.⁴
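The odds ratio is worth a quick arithmetic check, using nothing but the two percentages reported:

$$\mathrm{OR} = \frac{0.747 / 0.253}{0.529 / 0.471} \approx \frac{2.95}{1.12} \approx 2.6$$

The single-read radiologist faced roughly three-to-one odds of a plaintiff verdict; the double-read radiologist faced nearly even odds.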

That is not a soft signal.

A single workflow change, reading before you look at what the AI flagged rather than using the AI flag to anchor your read, moves your liability exposure by more than twenty percentage points in a jury room.

That finding should be in your department’s protocol documentation. It is probably not yet.

The High-Volume Practice Problem

There is a liability dimension to AI-assisted productivity that vendor marketing never mentions.

When AI tools compress per-study interpretation time, a radiologist signing a large volume of reports in a compressed window generates a documented pattern that is independently discoverable in litigation, separate from any individual report error. Timestamp metadata, report volume per session, and time-per-study are all extractable from RIS and PACS logs. A radiologist who uses AI efficiency to read at three times their unassisted volume without commensurately rigorous review does not just increase miss-rate exposure. They construct an audit trail that makes the standard of care argument substantially easier for a plaintiff’s attorney to build.

The transcript of that workflow is sitting in your PACS right now.
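To see how easily that transcript reconstructs, here is a minimal sketch of the analysis a plaintiff's expert could run against exported sign-off timestamps. The log schema is hypothetical; real RIS/PACS exports vary by vendor, but the timestamps themselves are standard.

```python
from datetime import datetime
from statistics import median

# Hypothetical export: one row per signed report from a RIS/PACS audit log.
# Field names are illustrative.
sign_events = [
    {"study_id": "CT-1041", "signed_at": "2026-03-02T08:01:12"},
    {"study_id": "CT-1042", "signed_at": "2026-03-02T08:03:40"},
    {"study_id": "CR-1043", "signed_at": "2026-03-02T08:05:05"},
    # ...hundreds more per shift
]

times = sorted(datetime.fromisoformat(e["signed_at"]) for e in sign_events)

# Intervals between consecutive signatures approximate time-per-study.
gaps = [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]

print(f"reports signed this session: {len(times)}")
print(f"median minutes per study:    {median(gaps):.1f}")
print(f"fastest interval:            {min(gaps):.1f} min")
```

Running that analysis on your own logs first turns the same data into an internal monitoring tool rather than a deposition exhibit.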

What Congress Is Regulating, and What It Is Missing

Legislative AI oversight has arrived in Washington. In March 2026, written testimony from Giannikopoulos of Rad AI before the U.S. Senate Commerce Subcommittee addressed AI safety in radiology at the federal level.⁵ That is meaningful progress. There is now a congressional record that acknowledges radiology AI as a domain requiring governance attention.

The limitation is that the regulatory frame Congress is working with is the narrow, single-task triage AI paradigm.

The testimony addressed tools that triage pneumothorax on chest X-ray, flag intracranial hemorrhage on head CT, detect pulmonary embolism on CTA. These are the tools that have FDA clearance and several years of real-world deployment data.

Vision-language models that can draft a complete radiology report, reason across modalities, and integrate clinical context are not the tools being regulated. The gap between what lawmakers are regulating and what VLMs can already do is itself a policy risk, and it is a risk that radiologists, not Congress, will absorb first when something goes wrong.

Harvey, at ECR 2026, outlined where the regulatory trajectory is heading under the EU AI Act: mandatory governance requirements, mandatory audit trails, mandatory registration of which algorithms are in use and what they are trained to do. That is not a hypothetical future. It is the logical regulatory endpoint of the international framework being built right now.⁶

The Audit Trail as Liability Management

The Royal College of Radiologists published post-deployment monitoring guidance in March 2026 that codifies what the accountability infrastructure should actually look like.⁷ Under IR(ME)R 2024, effective October 2024, UK NHS providers are now legally required to maintain a software inventory for every AI tool assisting in interpretation of ionizing radiation imaging. That inventory must include vendor name, brand name, current software version, and installation dates. This is not a voluntary best practice. It has regulatory teeth.
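A minimal sketch of what that inventory could look like in practice. The field names here are ours; IR(ME)R specifies the required content, not the format.

```python
from datetime import date

# Hypothetical inventory entries. IR(ME)R 2024 requires vendor name,
# brand name, current software version, and installation dates.
ai_software_inventory = [
    {
        "vendor": "ExampleVendor Ltd",
        "brand_name": "ChestTriage-X",
        "software_version": "2.4.1",
        "installed": date(2025, 11, 3),
        "decommissioned": None,
    },
    # A version upgrade is a new entry, not an overwrite: the inventory
    # must answer "which version was live on the date of this study?"
    {
        "vendor": "ExampleVendor Ltd",
        "brand_name": "ChestTriage-X",
        "software_version": "2.3.0",
        "installed": date(2025, 2, 14),
        "decommissioned": date(2025, 11, 3),
    },
]
```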

The United States has no analogous mandatory registry obligation yet. But for the American radiologist, the practical message is this: the audit trail your UK counterpart now maintains by law is the same audit trail your plaintiff’s attorney will ask for when something goes wrong. The question is not whether that documentation standard is coming. The question is whether you are building it before or after it becomes mandatory.

The practical implications are not complicated; they are just not yet standard. Document that AI was used, which tool, which version, what it flagged, and what we did with that information. When we disagree with an AI flag, note that we reviewed it and document our clinical reasoning. When we act on an AI flag, the same. Our report is the only place in the clinical record where that narrative can live right now. We should use it.
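To make that documentation habit concrete, here is one hypothetical shape for a per-study audit entry capturing exactly those fields. The structure is illustrative, not a standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AIReadAudit:
    """One entry per AI-assisted read: which tool, what it flagged, what we did."""
    study_id: str
    tool: str                 # e.g., "ChestTriage-X" (hypothetical product)
    tool_version: str
    ai_findings: list[str]    # what the tool flagged
    action: str               # "concur", "disagree", or "radiologist addendum"
    reasoning: str            # clinical reasoning, carried verbatim into the report
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

entry = AIReadAudit(
    study_id="CT-1041",
    tool="ChestTriage-X",
    tool_version="2.4.1",
    ai_findings=["possible right apical pneumothorax"],
    action="disagree",
    reasoning="Apparent lucency is a skin fold; lung markings extend beyond it.",
)
```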

Reasoning Traces as Clinical Due Diligence

One development in the current VLM landscape is worth understanding specifically for the liability argument. Chain-of-thought reasoning models, such as NV-Reason-CXR-3B developed by Myronenko and colleagues at NVIDIA, NIH, and Children’s Hospital of Philadelphia, produce an auditable reasoning trace alongside their outputs.⁸

The model does not just generate a conclusion. It documents which features it weighted, which alternatives it considered, and why it arrived at the finding it reported.
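As an illustration, a trace might carry something like the following. This structure is hypothetical, a sketch of the content such traces document, not the model's actual output format.

```python
# Hypothetical reasoning trace; illustrative only, not the actual
# NV-Reason-CXR-3B output format.
trace = {
    "finding": "small left pleural effusion",
    "features_weighted": [
        "blunting of the left costophrenic angle",
        "meniscus along the lateral chest wall",
    ],
    "alternatives_considered": {
        "pleural thickening": "less likely: density shifts with position on comparison views",
        "consolidation": "less likely: no air bronchograms",
    },
    "conclusion": "effusion favored; recommend correlation with priors",
}
```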

That reasoning trace is the closest thing radiology AI currently has to “showing your work.” It is documented clinical due diligence in the only form AI is currently capable of producing. A radiologist who reviews that trace and incorporates it into their report documentation has a materially stronger position than one who accepted or rejected an opaque flag with no recorded reasoning on either side.

This is not a product pitch. It is an observation about where the liability argument goes when AI-generated reasoning becomes part of the discoverable record, and the radiologist either engaged with it or did not.

The Quality Score Is a Compliance Tool

Reframe how you think about AI report quality scoring. It is not primarily a quality improvement mechanism. It is a compliance and liability management framework.

A system that measures AI output quality over time, tracks consistency across patient populations, and flags performance deviation from baseline is not building a scorecard for its own sake. It is building the documentation infrastructure that closes the gap between a deployed AI tool and a governed one. That gap is where our liability lives.
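A minimal sketch of the kind of baseline-deviation check such a system runs continuously. The metric, weekly radiologist-AI agreement, and the threshold are both illustrative choices.

```python
from statistics import mean, stdev

def flag_drift(baseline: list[float], recent: list[float],
               z_threshold: float = 2.0) -> bool:
    """Flag when recent radiologist-AI agreement deviates from baseline.

    Weekly agreement rate (0.0 to 1.0) is one simple proxy; a production
    system would also stratify by scanner, site, and patient population.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    z = (mean(recent) - mu) / sigma
    return abs(z) > z_threshold

# Example: agreement slid from ~92% at deployment to ~84% this quarter.
baseline_weeks = [0.93, 0.91, 0.92, 0.94, 0.90, 0.92]
recent_weeks = [0.86, 0.84, 0.83, 0.85]
print(flag_drift(baseline_weeks, recent_weeks))  # True: investigate now, not in discovery
```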

The practices building that infrastructure now are doing it as liability management, not administrative overhead. The practices waiting for mandatory requirements will be building it under pressure, probably after something goes wrong.

What Is Coming Next – Article 10

Article 10 turns to TEFCA, the Trusted Exchange Framework and Common Agreement, and the national imaging graph being assembled right now across more than 11 Qualified Health Information Networks. The data layer that makes AI work at scale is also the data layer that creates the next set of governance questions your practice has not yet asked.


Clinical AI does not fail at deployment. It fails quietly after deployment.

If you are building or deploying radiology AI, especially vision-language models or automated reporting systems, post-deployment monitoring is no longer optional. Regulators, including the FDA, increasingly expect continuous assessment of real-world performance, including drift, site variability, and human-AI disagreement.

Veriloop is a vendor-agnostic clinical AI observability layer designed to measure how models behave in real workflows. We track agreement, error patterns, and trust over time, helping teams understand where AI is reliable, where it breaks, and how much workload it can safely absorb.

We do not replace regulatory approval or pre-deployment validation. We provide the missing layer after deployment, where safety, performance, and trust actually evolve.

If you are building, deploying, or evaluating clinical AI systems and need real-world monitoring, reach out at ty@orainformatics.com


References

1. Malpractice Claims Involving AI in Radiology: Emerging Case Law. Background reference; to be verified and updated at publication.

2. FDA 510(k) and De Novo Pathway Overview. FDA.gov. Background reference.

3. Rylands-Monk F. ECR: Does AI toll the beginning of the end for chest x-ray reporting? AuntMinnie Europe. March 6, 2026. https://www.auntminnieeurope.com/resources/conferences/ecr/2026/article/15819005/ecr-does-ai-toll-the-beginning-of-the-end-for-chest-xray-reporting

4. Bernstein MH, Sheppard B, Bruno MA, Lay PS, Baird GL. The Radiologist-AI Workflow and the Risk of Medical Malpractice Claims. Nature Health. March 10, 2026. https://www.nature.com/articles/s44360-026-00085-2

5. Giannikopoulos J / Rad AI. Written Testimony Before the U.S. Senate Commerce Subcommittee. March 3, 2026.

6. Tschabuschnig S. ECR 2026 Ethical AI in Radiology: Why Safety Begins After Deployment. AuntMinnie Europe. March 7, 2026. [Harvey: EU AI Act regulatory framework.] https://www.auntminnieeurope.com/resources/conferences/ecr/2026/article/15818943/ecr-ethical-ai-in-radiology-why-safety-begins-after-deployment

7. Post-Deployment Monitoring and Safety Reporting of AI Medical Imaging Devices in Clinical Practice. Royal College of Radiologists. March 2026. https://www.rcr.ac.uk/our-services/all-our-publications/clinical-radiology-publications/post-deployment-monitoring-and-safety-reporting-of-ai-medical-imaging-devices-in-clinical-practice

8. Myronenko A, Yang D, Turkbey B, Aboian M, et al. Reasoning Visual Language Model for Chest X-Ray Analysis (NV-Reason-CXR-3B). NVIDIA / NIH / Children’s Hospital of Philadelphia. arXiv:2510.23968. 2025. https://arxiv.org/abs/2510.23968
