Abstract Archives of the RSNA, 2005
SSQ01-03
Do Computer-aided Diagnosis Systems in Mammography Need to Be Trained to Individual Observers?
Scientific Papers
Presented on December 1, 2005
Presented as part of SSQ01: Breast (Computer-assisted Detection)
Charles Edward Kahn MD, Presenter: Nothing to Disclose
Katie A. McCarthy, Abstract Co-Author: Nothing to Disclose
Elizabeth Suzanne Burnside, Abstract Co-Author: Nothing to Disclose
To determine whether interobserver variability in the use of the Breast Imaging Reporting and Data System (BI-RADS) influences the performance of a computer-aided diagnosis (CADx) system in mammography, we analyzed the effect of training "personalized" decision models on diagnostic performance.
A Bayesian network model predicted the probability of malignancy based on six demographic risk factors (such as age, hormone replacement therapy, and family history of breast cancer) and 30 mammography findings encoded by radiologists using BI-RADS descriptors. The initial model contained probabilities from the literature and expert opinion. Eight experienced radiologists used BI-RADS descriptors to encode findings from all diagnostic and screening mammography examinations performed in a 58-month period at a single institution. We prospectively excluded any radiologist who read fewer than 1000 exams. To examine the effect of interobserver variability, we trained and tested the CADx system in two ways. The "combined" model was trained using exams read by all radiologists. The "personalized" models were trained and tested separately for each radiologist; results were aggregated for analysis. Ten-fold cross-validation was used; receiver operating characteristic (ROC) analysis was performed.
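The evaluation described above can be sketched in a few lines of code. The following is a toy illustration only — the actual Bayesian network, BI-RADS feature encoding, and case data are not reproduced here. It shows the two named components: a ten-fold partition of the cases and the Az (area under the ROC curve) estimated via the rank-sum (Mann-Whitney) statistic, which is the standard nonparametric estimator for AUC.

```python
import random

def auc(labels, scores):
    """Az (area under the ROC curve) via the Mann-Whitney rank statistic:
    the fraction of (malignant, benign) pairs ranked correctly, with ties
    counted as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ten_fold_indices(n_cases, seed=0):
    """Shuffle case indices once, then yield (train, test) index lists
    for each of the 10 folds; every case appears in exactly one test fold."""
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::10] for k in range(10)]
    for k in range(10):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

In a cross-validation run, the model would be retrained on each `train` split, its predicted probabilities of malignancy collected over the corresponding `test` splits, and `auc` computed on the pooled test predictions. Comparing the pooled-radiologist model against per-radiologist models, as in the study, would repeat this loop on the two training configurations.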
The six radiologists who met the volume criteria for inclusion interpreted 48,215 examinations (range, 1,157 to 21,917 exams per radiologist; median, 4,241 exams) on 17,994 patients. Before training, the system's Az (area under the ROC curve) was 0.923. The Az values were 0.975 for the "combined" model and 0.978 for the "personalized" models. Although both approaches were significantly better than the untrained model (p=.0002), the "personalized" models' performance was not significantly different from that of the "combined" model (p>.9).
Personalized training of a probabilistic CADx system for mammography yielded no significant difference in diagnostic performance compared with training on the pooled data from all radiologists. This finding suggests that interobserver variability in the use of BI-RADS terminology does not significantly affect the performance of a CADx system based on BI-RADS descriptors.
Kahn C, McCarthy K, Burnside E. Do Computer-aided Diagnosis Systems in Mammography Need to Be Trained to Individual Observers? Radiological Society of North America 2005 Scientific Assembly and Annual Meeting, November 27-December 2, 2005, Chicago, IL.
http://archive.rsna.org/2005/4409068.html