Participants
Janette Sam, RT, Vancouver, BC (Presenter) Nothing to Disclose
PURPOSE
To evaluate the performance of a commercial artificial intelligence (AI) system for breast cancer detection using digital mammograms from the BC Cancer Breast Screening Program.
METHODS AND MATERIALS
Digital screening mammograms and associated outcomes, including mammographic findings (features), demographics, and risk factors, were extracted for 136,700 women who underwent breast screening in British Columbia, Canada, from February 1, 2019 to January 31, 2020. The outcome data were extracted on March 18, 2022. The images were de-identified and processed with the Lunit MMG AI algorithm, version 1.1.2.0, running on a GeForce RTX 2080 GPU with 11 GB of VRAM. AI model performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC).
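As an illustration of the evaluation metric only, the AUC can be read as the probability that a randomly chosen cancer case receives a higher AI score than a randomly chosen non-cancer case. The minimal sketch below computes it by direct pairwise comparison. The function name `roc_auc` and the toy scores and labels are hypothetical and are not taken from the study; the abstract does not state which software was used to compute the AUC or its confidence intervals.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT study data): AI scores on a 0-1 scale, 1 = cancer.
toy_scores = [0.02, 0.05, 0.08, 0.30, 0.12, 0.65, 0.90]
toy_labels = [0, 0, 0, 0, 1, 1, 1]

toy_auc = roc_auc(toy_scores, toy_labels)
print(f"toy AUC = {toy_auc:.3f}")
```

The pairwise form is quadratic in the number of cases, so for a cohort of this size a rank-based implementation from a statistics library would be the practical choice; the sketch is meant only to make the definition concrete.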
RESULTS
The overall performance of the AI algorithm, measured by AUC, was 0.938 (CI: 0.927-0.949). However, once binary classification was performed using the algorithm's 10% cut-off value, the AUC dropped to 0.846 (0.836-0.856), compared with the radiologists' performance of 0.937 (0.929-0.944). The AI AUCs by BI-RADS breast density category, as assigned by the radiologists, were: A: 0.964 (0.934-0.993); B: 0.946 (0.932-0.961); C: 0.934 (0.916-0.952); and D: 0.831 (0.752-0.911). For women with a family history of breast cancer, the AI AUC was 0.919 (0.895-0.943), whereas for women with no family history it was 0.945 (0.933-0.957). The algorithm's performance for cancers with any architectural distortion was 0.961 (0.944-0.978), whereas for cancers with any calcifications it was 0.878 (0.855-0.900).
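One way to see why the AUC falls after thresholding: a binary call has a single operating point, so its AUC reduces to the mean of sensitivity and specificity, and the ranking information within each side of the cut-off is lost. The sketch below illustrates this on hypothetical toy data. It assumes the 10% cut-off corresponds to 0.10 on a 0-1 score scale and that scores at or above the cut-off count as positive; the abstract specifies neither detail, and it does not state how the binarized AUC was actually computed.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binarize(scores, cutoff=0.10):
    """Assumed rule: a score at or above the cut-off is a positive call."""
    return [1 if s >= cutoff else 0 for s in scores]


def sensitivity_specificity(calls, labels):
    """Sensitivity and specificity of binary calls against ground truth."""
    tp = sum(1 for c, y in zip(calls, labels) if c == 1 and y == 1)
    tn = sum(1 for c, y in zip(calls, labels) if c == 0 and y == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


# Hypothetical toy data (NOT study data): AI scores on a 0-1 scale, 1 = cancer.
toy_scores = [0.02, 0.05, 0.15, 0.30, 0.08, 0.12, 0.65, 0.90]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

continuous_auc = roc_auc(toy_scores, toy_labels)
calls = binarize(toy_scores, cutoff=0.10)
binary_auc = roc_auc(calls, toy_labels)
sens, spec = sensitivity_specificity(calls, toy_labels)

print(f"continuous AUC     = {continuous_auc:.3f}")
print(f"binarized AUC      = {binary_auc:.3f}")
print(f"(sens + spec) / 2  = {(sens + spec) / 2:.3f}")
```

Under this reading, comparing a binarized AI output against radiologists' recall decisions places both on the same footing, since each is evaluated at a single operating point rather than across the full score range.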
CONCLUSION
The tested commercial AI algorithm is generalizable to a large external cohort from Canada. However, the performance of the AI algorithm fell short of that of the well-qualified screening program radiologists. Performance of the algorithm for women with a family history of breast cancer and for cancers manifesting as calcifications was found to be weaker.
CLINICAL RELEVANCE/APPLICATION
Commercial AI algorithms for breast cancer detection that are well trained on datasets from multiple countries can be generalizable to cohorts in other countries. However, further improvements may be needed to match the performance of radiologists from well-organized, population-based breast screening programs.