RSNA 2008 

Abstract Archives of the RSNA, 2008


SSJ22-03

Assessing Operating Characteristics of CAD Algorithms in the Absence of a Gold Standard

Scientific Papers

Presented on December 2, 2008
Presented as part of SSJ22: Physics (CAD: Methods/Observer Studies)

 Research and Education Foundation Support

Participants

Kingshuk Roy Choudhury PhD, Abstract Co-Author: Nothing to Disclose
Chin A Yi MD, PhD, Abstract Co-Author: Nothing to Disclose
Sandy Napel PhD, Abstract Co-Author: Medical Advisory Board, Fovia, Inc; Medical Advisory Board, Vital Images, Inc; Consultant, Carestream Health, Inc; Stockholder, Hologic, Inc; Stockholder, General Electric Company
David Seungwon Paik PhD, Presenter: Spouse is consultant, 23andMe, Inc
Justus E. Roos MD, Abstract Co-Author: Nothing to Disclose
Geoffrey D. Rubin MD, Abstract Co-Author: Consultant, TeraRecon, Inc; Medical Advisory Board, Fovia, Inc; Grant, Johnson & Johnson; Grant, Cook Group Incorporated; Speaker, Bracco Group

PURPOSE

Establishing the operating characteristics of CAD algorithms typically requires an independent assessment of the inventory of nodules (‘the truth’ or ‘gold standard’). Previous studies have shown significant variation amongst trained radiologists in determining ‘true’ nodules. We propose a method based on latent class analysis (LCA) that does not require a ‘gold standard’. LCA has been used in diagnostic testing settings with imperfect reference tests, e.g. colorectal cancer screening, determining bacterial infection from specimen samples and testing children for deafness.

METHOD AND MATERIALS

LCA was applied to the LIDC dataset, comprising 36 thoracic CT scans and the free search markings of four radiologists (without CAD). Four different radiologists then marked the scans with the CAD output and the previous markings available for assistance. A binomial model for detections was constructed, assuming conditional independence of readings given true nodule status. The true positive fraction (TPF) for each screening protocol was estimated by maximum likelihood. Variability of the TPF difference between protocols was estimated by repeated resampling. For validation, data for four readers each were simulated from two binomial populations with TPFs of 60% and 70%, respectively.
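The model described above can be sketched as a two-class latent class model fitted by expectation-maximization (EM), a standard way to obtain maximum likelihood estimates for such mixtures. This is an illustrative sketch, not the authors' implementation: the function names, the prevalence of true nodules, and the false positive fractions (`fpf`) are assumptions, since the abstract reports only TPFs. The simulated data mirror the validation setup of two protocols with four readers each and TPFs of 60% and 70%.

```python
import numpy as np

def simulate_counts(n, prevalence, tpf, fpf, readers=4, seed=0):
    """Simulate, per candidate, how many readers in each protocol mark it."""
    rng = np.random.default_rng(seed)
    is_nodule = rng.random(n) < prevalence          # latent true status
    columns = []
    for t, f in zip(tpf, fpf):
        p_detect = np.where(is_nodule, t, f)        # per-reader marking probability
        columns.append(rng.binomial(readers, p_detect))
    return np.column_stack(columns)                 # shape (n, number of protocols)

def fit_lca(counts, readers=4, iters=500):
    """EM for a two-class model with conditionally independent readers."""
    counts = np.asarray(counts, dtype=float)
    groups = counts.shape[1]
    # Starting values (assumed); tpf > fpf fixes which class is "nodule".
    prevalence, tpf, fpf = 0.5, np.full(groups, 0.7), np.full(groups, 0.2)
    eps = 1e-6
    for _ in range(iters):
        misses = readers - counts
        # E-step: posterior probability that each candidate is a true nodule.
        # Binomial coefficients are identical in both classes and cancel.
        log_pos = np.log(prevalence) + (
            counts * np.log(tpf) + misses * np.log1p(-tpf)).sum(axis=1)
        log_neg = np.log1p(-prevalence) + (
            counts * np.log(fpf) + misses * np.log1p(-fpf)).sum(axis=1)
        w = np.exp(log_pos - np.logaddexp(log_pos, log_neg))
        # M-step: weighted detection rates, clipped away from 0 and 1.
        prevalence = np.clip(w.mean(), eps, 1 - eps)
        tpf = np.clip((w[:, None] * counts).sum(axis=0) / (readers * w.sum()),
                      eps, 1 - eps)
        fpf = np.clip(((1 - w)[:, None] * counts).sum(axis=0)
                      / (readers * (1 - w).sum()), eps, 1 - eps)
    return prevalence, tpf, fpf

# Validation-style run: true TPFs of 60% and 70%; prevalence and FPFs are assumed.
counts = simulate_counts(5000, prevalence=0.5, tpf=(0.6, 0.7), fpf=(0.05, 0.05))
prevalence, tpf, fpf = fit_lca(counts)
print("estimated TPFs:", tpf)
```

The sketch treats every candidate as observable. In free search data, candidates marked by no reader never enter the inventory, which a fuller model would need to account for.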

RESULTS

A set of n = 1145 nodule candidates was considered. The estimated TPF of the no-CAD readers (TPFM) was 55% and that of the CAD-assisted readers (TPFC) was 68%. Under repeated resampling, TPFC was 3% to 41% greater than TPFM in 95% of cases, indicating that TPFC is significantly higher than TPFM. With simulated data, estimated TPFs were close to the true TPFs when n > 300. When three or more no-CAD readers were used as the gold standard, the estimated TPFC was 83%.
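The resampling step can be illustrated with a percentile bootstrap over nodule candidates. This is a minimal sketch under stated assumptions: the abstract does not specify the resampling scheme, `bootstrap_interval` and `naive_rates` are hypothetical names, and `naive_rates` is a simple stand-in estimator (mean marking rate per protocol) rather than the LCA estimator. The synthetic counts below are illustrative only and do not reproduce the LIDC data.

```python
import numpy as np

def bootstrap_interval(counts, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile interval for the difference TPF_C - TPF_M.

    counts    : array of shape (n, 2) with per-candidate marks for the
                no-CAD (column 0) and CAD-assisted (column 1) protocols
    estimator : callable returning (tpf_m, tpf_c) for a counts array
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    n = counts.shape[0]
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample candidates with replacement
        tpf_m, tpf_c = estimator(counts[idx])
        diffs[b] = tpf_c - tpf_m
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(diffs, [alpha, 1.0 - alpha])
    return lower, upper

def naive_rates(counts, readers=4):
    """Stand-in estimator: fraction of readers marking each candidate."""
    return tuple(counts.mean(axis=0) / readers)

# Synthetic example: 1145 candidates, four readers per protocol.
rng = np.random.default_rng(1)
demo = np.column_stack([rng.binomial(4, 0.55, 1145),
                        rng.binomial(4, 0.68, 1145)])
lower, upper = bootstrap_interval(demo, naive_rates, n_boot=200)
print(f"95% interval for the TPF difference: {lower:.3f} to {upper:.3f}")
```

An interval that excludes zero corresponds to the kind of statement made above, that one protocol's TPF exceeds the other's in 95% of resamples. Any estimator with the same return shape could be passed in place of `naive_rates`.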

CONCLUSION

CAD-assisted readers appear to have significantly higher sensitivity than non-assisted readers for the LIDC data. Results suggest that the actual sensitivity of CAD-based methods may be lower than that predicted by using no-CAD readings as the gold standard. The substantial overestimation of TPFC when using no-CAD readers as the gold standard, relative to the LCA estimate, suggests that such readings may not be a good gold standard, even with four readers.
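For comparison, the consensus-based estimate discussed above can be computed directly: candidates marked by at least three of the four no-CAD readers are treated as true nodules, and the CAD-assisted detection rate is measured on that subset only. This is a sketch with a hypothetical function name and toy data; it shows the calculation, not the reported 83% figure.

```python
import numpy as np

def consensus_tpf(ref_counts, test_counts, min_votes=3, readers=4):
    """TPF of the test protocol against a reader-consensus reference.

    ref_counts  : per-candidate number of reference (no-CAD) readers marking it
    test_counts : per-candidate number of test (CAD-assisted) readers marking it
    """
    ref_counts = np.asarray(ref_counts)
    test_counts = np.asarray(test_counts)
    is_reference_nodule = ref_counts >= min_votes
    if not is_reference_nodule.any():
        raise ValueError("no candidate reaches the consensus threshold")
    # Pooled detection rate of the test readers on reference nodules only.
    detections = test_counts[is_reference_nodule].sum()
    return detections / (readers * is_reference_nodule.sum())

# Toy data: four candidates; the first two reach the 3-of-4 consensus.
toy_tpf = consensus_tpf([4, 3, 1, 0], [4, 2, 0, 1])
print(toy_tpf)
```

Because the reference set keeps only candidates that several unaided readers already found, it may be weighted toward conspicuous nodules, which is one plausible reason such an estimate could exceed the LCA estimate.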

CLINICAL RELEVANCE/APPLICATION

A gold standard for nodules using multiple readers conducting free search can lead to inaccurate measures of performance. Latent class analysis may provide an alternative method of assessment.

Cite This Abstract

Roy Choudhury, K, Yi, C, Napel, S, Paik, D, Roos, J, Rubin, G, Assessing Operating Characteristics of CAD Algorithms in the Absence of a Gold Standard. Radiological Society of North America 2008 Scientific Assembly and Annual Meeting, November 30 - December 5, 2008, Chicago IL. http://archive.rsna.org/2008/6012880.html