RSNA 2012 

Abstract Archives of the RSNA, 2012


SSG09-09

A Statistical Model for Predicting Sample Size for Radiologists' Perception of Similarity in Liver Lesions

Scientific Formal (Paper) Presentations

Presented on November 27, 2012
Presented as part of SSG09: ISP: Informatics (Advanced Visualization)

Participants

Jessica Faruque MS, Presenter: Nothing to Disclose
Daniel L. Rubin MD, Abstract Co-Author: Grant, General Electric Company
Christopher Frederick Beaulieu MD, PhD, Abstract Co-Author: Nothing to Disclose
Jarrett Rosenberg PhD, Abstract Co-Author: Nothing to Disclose
Sandy Napel PhD, Abstract Co-Author: Medical Advisory Board, Fovia, Inc; Consultant, Carestream Health, Inc

PURPOSE

A gold standard for perceptual similarity in medical images is vital to content-based image retrieval (CBIR) applications, but it is challenging to assess similarity due to inter-reader variability. Our objective was to develop a statistical model that predicts the number of readers necessary to achieve acceptable levels of variability. 

METHOD AND MATERIALS

We first collected empirical data: 3 radiologists’ ratings, on a 9-point scale, of the psychophysical similarity of all 171 pairwise combinations of 19 CT images of focal liver lesions. We estimated the parameters of the readers' score distributions from these data using an Expectation Maximization algorithm. Using these parameters, we then simulated readers’ scores as bimodal distributions with different levels of additive Gaussian noise. For each simulation, we calculated two quantities using a quadratically weighted Cohen’s kappa (K): the agreement between the ground truth and the mean of the simulated radiologists’ scores for each image pair, and the inter-reader agreement. We varied the noise standard deviation (S) and the number of readers (N), running 1000 iterations for each combination of N and S.
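The simulation described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bimodal mixture parameters in `draw_ground_truth` are placeholders (the abstract does not report the values estimated by Expectation Maximization), scores are assumed to be rounded and clipped to the 9-point scale, and all function names are hypothetical.

```python
from itertools import combinations

import numpy as np

N_CATEGORIES = 9  # 9-point similarity scale, as in the abstract


def quadratic_weighted_kappa(a, b, n_categories=N_CATEGORIES):
    """Quadratically weighted Cohen's kappa for integer ratings in 1..n_categories."""
    a = np.asarray(a, dtype=int) - 1
    b = np.asarray(b, dtype=int) - 1
    observed = np.zeros((n_categories, n_categories))
    np.add.at(observed, (a, b), 1)  # joint histogram of the two raters
    observed /= observed.sum()
    # Chance-expected joint distribution from the two marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    idx = np.arange(n_categories)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_categories - 1) ** 2
    denom = (weights * expected).sum()
    if denom == 0:  # both raters used the same single category throughout
        return 1.0
    return 1.0 - (weights * observed).sum() / denom


def draw_ground_truth(n_pairs, rng):
    """Bimodal 'true' similarity scores. Mixture parameters are PLACEHOLDERS,
    not the values estimated in the abstract."""
    low_mode = rng.random(n_pairs) < 0.6
    raw = np.where(low_mode,
                   rng.normal(2.5, 1.0, n_pairs),
                   rng.normal(7.0, 1.0, n_pairs))
    return np.clip(np.rint(raw), 1, N_CATEGORIES).astype(int)


def simulate_readers(truth, n_readers, noise_sd, rng):
    """Each reader's score = truth + Gaussian noise, rounded and clipped (assumed)."""
    noise = rng.normal(0.0, noise_sd, size=(n_readers, truth.size))
    return np.clip(np.rint(truth + noise), 1, N_CATEGORIES).astype(int)


def mean_pairwise_kappa(scores):
    """Inter-reader agreement: mean kappa over all pairs of readers."""
    pairs = combinations(range(scores.shape[0]), 2)
    return float(np.mean([quadratic_weighted_kappa(scores[i], scores[j])
                          for i, j in pairs]))


def agreement_with_truth(n_readers, noise_sd, n_pairs=171, n_iter=200, seed=0):
    """Mean and SD, over iterations, of kappa between truth and the readers' mean score."""
    rng = np.random.default_rng(seed)
    kappas = []
    for _ in range(n_iter):
        truth = draw_ground_truth(n_pairs, rng)
        scores = simulate_readers(truth, n_readers, noise_sd, rng)
        consensus = np.clip(np.rint(scores.mean(axis=0)), 1, N_CATEGORIES).astype(int)
        kappas.append(quadratic_weighted_kappa(truth, consensus))
    return float(np.mean(kappas)), float(np.std(kappas))


if __name__ == "__main__":
    for n in (2, 3, 10, 50):
        mean_k, sd_k = agreement_with_truth(n_readers=n, noise_sd=2.0)
        print(f"N={n:2d}  S=2.0  K={mean_k:.2f} +/- {sd_k:.2f}")
```

Because the mixture parameters and the rounding scheme are assumed, the values this sketch prints should not be expected to match those reported in the abstract; only the qualitative trends (agreement falling with S and rising with N) are meant to carry over.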

RESULTS

Inter-reader agreement for the empirical data ranged from K=0.41 to 0.66. Simulated agreement for 171 image pairs and 3 readers yielded this range for S=1.5 to 2.5 (9-point scale). For these values of S, agreement with the ground truth ranged from K=0.81±0.02 to 0.91±0.04. As expected, agreement with the ground truth increased with the number of readers, ranging from K=0.83±0.03 to 0.92±0.06 for N=2 to 50, respectively, for additive noise with S=2.
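The increase in agreement with N is consistent with simple averaging. As a rough illustration, assuming each reader's noise is independent with standard deviation S, and ignoring any rounding or clipping to the 9-point scale (both assumptions, not stated in the abstract), the noise remaining in the mean of N readers' scores has standard deviation:

```latex
\sigma_{\bar{x}} = \frac{S}{\sqrt{N}}, \qquad
S = 2:\quad
\sigma_{\bar{x}}\big|_{N=2} = \frac{2}{\sqrt{2}} \approx 1.41, \quad
\sigma_{\bar{x}}\big|_{N=50} = \frac{2}{\sqrt{50}} \approx 0.28
```

Under these assumptions the averaged score approaches the ground truth as N grows, with diminishing returns because the noise shrinks only as the square root of N.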

CONCLUSION

Our simulations demonstrated that, even when inter-reader agreement was only moderate to good, the mean of the readers' scores could still achieve excellent agreement with the ground truth. Thus, this statistical model for perceptual similarity may be used to predict the number of readers necessary to accurately evaluate similarity in datasets of arbitrary size.

CLINICAL RELEVANCE/APPLICATION

A statistical model of similarity may be useful in creating reference standards, which may enable a better understanding of radiologists’ approach to evaluating medical images.

Cite This Abstract

Faruque, J, Rubin, D, Beaulieu, C, Rosenberg, J, Napel, S, A Statistical Model for Predicting Sample Size for Radiologists' Perception of Similarity in Liver Lesions. Radiological Society of North America 2012 Scientific Assembly and Annual Meeting, November 25 - November 30, 2012, Chicago IL. http://archive.rsna.org/2012/12037848.html