A research paper comparing the performance of commercially available artificial intelligence (AI) algorithms has been published in Radiology. Although multiple commercial AI products for assessing radiographs are on the market, comparable performance data for these algorithms are limited.

The research team led by Kicky G. van Leeuwen aimed to perform an independent, stand-alone validation of commercially available AI products for bone age prediction based on hand radiographs and lung nodule detection on chest radiographs.

A methodology was created for conducting validation studies to assess the stand-alone performance of commercially available Conformité Européenne–marked AI-based software for radiology. The products are tested on representative data sets, and the results are compared with radiologists’ performance on the same sets. The data sets remain confidential, which allows the process to be repeated when new products and updated algorithms are brought to market. 

The Oxipit ChestEye algorithm was made available for the lung nodule detection comparison. In total, seven lung nodule detection algorithms were validated on chest radiographs. The vendors had no control over the data and information submitted for publication.

For lung nodule detection, the final set used for validation consisted of radiographs from 386 patients (mean age, 64 years ± 11; 223 male patients), of whom 144 had at least one nodule according to the reference standard and were therefore considered nodule cases, and 242 were considered controls. Lateral radiographs were available for 383 patients.

The algorithms and human readers showed a wide performance spread regarding the AUC. The mean AUC for the readers (n = 17) was 0.81 (95% CI: 0.77, 0.85). Compared with human readers, multireader multicase analysis demonstrated superior performance for Annalise.ai (AUC, 0.90 [95% CI: 0.87, 0.94]; P < .001), Lunit (AUC, 0.93 [95% CI: 0.91, 0.96]; P < .001), Milvue (AUC, 0.86 [95% CI: 0.82, 0.90]; P = .04), and Oxipit (AUC, 0.88 [95% CI: 0.85, 0.92]; P = .005). 
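For readers unfamiliar with the AUC figures above, the following minimal sketch shows one common way to compute an AUC: as the probability that a randomly chosen nodule case receives a higher score than a randomly chosen control, with ties counted as half. The function name and the scores are hypothetical illustrations, not data or code from the study, and the sketch does not reproduce the multireader multicase analysis the researchers used.

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: the fraction of (nodule case, control) pairs in which
    the nodule case scores higher; tied pairs count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical algorithm outputs (assumption: higher score = more suspicious).
nodule_scores = [0.9, 0.8, 0.7, 0.4]
control_scores = [0.6, 0.3, 0.2, 0.1, 0.4]

print(f"AUC = {auc(nodule_scores, control_scores):.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of nodule cases from controls.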

The researchers note that the Oxipit product is intended to autonomously report on normal chest radiographs in cases where it is highly certain of the results. In the vendor's most recent study, sensitivity was 99.8% at a specificity of 28%. Because other abnormalities were included in that study as well, the results are not directly comparable, but the receiver operating characteristic curve from the Radiology study confirms that this algorithm is optimized for high sensitivity.
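The trade-off behind an algorithm optimized for high sensitivity can be illustrated with a small sketch. It assumes, purely for illustration, that an algorithm outputs an abnormality score per radiograph and that a radiograph is flagged as abnormal when its score reaches a threshold. The function name, scores, and thresholds are hypothetical and are not taken from the study or from any vendor's product.

```python
def sensitivity_specificity(abnormal_scores, normal_scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are flagged abnormal."""
    true_positives = sum(s >= threshold for s in abnormal_scores)
    true_negatives = sum(s < threshold for s in normal_scores)
    return (true_positives / len(abnormal_scores),
            true_negatives / len(normal_scores))


# Hypothetical scores, not data from the study.
abnormal = [0.95, 0.80, 0.60, 0.35, 0.15]
normal = [0.70, 0.40, 0.30, 0.20, 0.05]

# A mid-range threshold misses some abnormal cases but clears most normal ones.
balanced = sensitivity_specificity(abnormal, normal, threshold=0.5)

# A very low threshold flags nearly everything: few abnormal cases are missed,
# but only the most clearly normal radiographs are reported as normal.
cautious = sensitivity_specificity(abnormal, normal, threshold=0.1)

print("balanced:", balanced)
print("cautious:", cautious)
```

Lowering the threshold moves the operating point along the receiver operating characteristic curve toward higher sensitivity and lower specificity, which is the pattern described above for a product that only reports radiographs as normal when it is highly certain.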

The full study is available in Radiology.

The lung nodule detection leaderboard is published here.