Ten Boxes to Check Before Investing in an Early-Stage AI Startup

If AI in medical imaging is going to make a significant, widespread improvement in clinical care, there will need to be 10 to 20 times the current number of FDA-approved algorithms. As of mid-2020, there are about 70 approved algorithms. Compare this with the 10,000 things I look for each day as a radiologist. Tech entrepreneurs worldwide are working on this from many angles and are hopefully assembling the right teams to address every component of a complex algorithm. Below is a reasonably simple 10-item checklist to use when evaluating an early-stage startup.

  1. How are their imaging sets defined? Do they have training, validation and testing sets? If they don’t have a crystal clear answer, you can stop here as they are likely a liability. 
  2. Where did their testing set come from? This should be from an external source. A common and serious problem is recycling training or validation images for testing.
  3. What image modality vendors were represented in the training, validation and testing sets? If this is an MRI algorithm, did they use images from GE, Siemens and Philips in all sets? The answer must be yes.
  4. Why did they choose the size of the training, validation and testing sets? Previous research? Image data source constraints? The size of the test set should be backed by traditional statistical benchmarks, so that it is large enough to support their findings.
  5. What was the reference standard to train the algorithm? For a chest x-ray algorithm finding pulmonary nodules, the best reference standard is a CT scan, next best would be an expert panel of readers, then a single reader and finally an NLP-mined report. A CT scan is much better than an x-ray at finding nodules, so using it as the reference standard is ideal in this scenario. Mining prior radiology reports has not been proven to be as accurate and is therefore placed lower in the order of preference.
  6. How did they prepare the images? Cropped to just include the area of interest? Changed from DICOM to JPEG? Entire CT or MRI sequence or truncated? The downstream applicability can be severely narrowed if significant preparation is required.
  7. What is the AI performance benchmarked to? The only acceptable answers are expert radiologists or pathology results. Non-radiologist clinicians, trainees or other algorithms are common benchmarks, but establishing credibility in the market requires the highest standard possible. Note: this may seem similar to #5, but training and testing are completely separate.
  8. How is the AI decision made? This is a hard one, as many algorithms are not explainable. Including a “bounding box” or some other demarcation on the image to show what the algorithm found, in addition to the likelihood prediction, helps to gain support in the market. For example, compare “There is a 90% likelihood that there is pneumonia on the chest x-ray” with the same statement accompanied by a circle on the image at the concerning location.
  9. Is some or all of the algorithm publicly available? Were any parts gathered from or posted to GitHub? There may be benefits from performance evaluation in a crowdsourced environment. There is also the usual business question: is the algorithm significantly more useful than a publicly available one?
  10. How will this make a clinical impact? Who are you helping? This could be interpreted in terms of addressable market, and the answer should clarify whether they are helping the radiologist or the referring provider, and a single patient or a population segment. Do they understand where this fits in the current clinical workflow?
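For the more technical reader, questions 1, 2 and 4 can be partly checked with a few lines of code. The following is only a minimal Python sketch under my own assumptions, not a method taken from any particular company or from the RSNA article: the function names `find_overlap` and `wilson_interval` are hypothetical, the IDs are made up, and checking overlap at the patient level (rather than the image level) is my assumption about what a careful team would do. The first function flags IDs that appear in more than one set, and the second uses the standard Wilson score interval to show how the uncertainty around a reported sensitivity depends on the size of the test set.

```python
import math


def find_overlap(splits):
    """Return {id: [split names]} for every ID found in more than one split.

    `splits` maps a split name (e.g. "train") to an iterable of IDs.
    Using patient IDs instead of image IDs is an assumption of this sketch.
    """
    first_seen = {}  # id -> name of the first split it appeared in
    overlap = {}     # id -> list of all splits it appears in
    for split_name, ids in splits.items():
        for item_id in set(ids):  # ignore duplicates within one split
            if item_id in first_seen:
                overlap.setdefault(item_id, [first_seen[item_id]]).append(split_name)
            else:
                first_seen[item_id] = split_name
    return overlap


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical patient IDs; P003 has leaked from training into testing.
    splits = {
        "train": ["P001", "P002", "P003"],
        "validation": ["P004", "P005"],
        "test": ["P006", "P003"],
    }
    print("IDs in more than one set:", find_overlap(splits))

    # Same 90% observed sensitivity, two different test set sizes.
    print("45 of 50 detected:", wilson_interval(45, 50))
    print("450 of 500 detected:", wilson_interval(450, 500))
```

With 45 of 50 positive cases detected, the 95% interval around the 90% sensitivity runs from roughly 0.79 to 0.96; with 450 of 500 it is much narrower. A team that can show this kind of reasoning for its test set size has a good answer to question 4.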

These 10 questions can help sort the strong teams and products from weak ones. For the past five years I have been helping companies build and deploy algorithms. You can see more of my thoughts on LinkedIn.

Feel free to email me to learn more about evaluating AI algorithms: ty@orainformatics.com

You may be interested in a similar article: AI in Health and Medical Imaging – 5 Problems to Solve

This list is adapted from an article from the Radiological Society of North America that is used to guide the academic pursuit of excellent manuscripts. You can read the Free Access article here.