Participants
Sahar Kazemzadeh, Mountain View, California (Presenter) Employee, Alphabet Inc; Stockholder, Alphabet Inc
Tuberculosis (TB) is a top-10 cause of death worldwide. Though the WHO recommends chest radiographs (CXRs) for TB screening, lack of access to expertise in CXR interpretation limits their use in many regions. To help address this problem, we developed a deep learning system (DLS) to detect active TB, evaluated it on retrospective data from multiple countries and settings, and compared its performance with that of radiologists from both endemic (India) and non-endemic (US) practice settings.

Methods and Materials
We trained a DLS using over 100,000 de-identified CXRs from 9 countries spanning Africa, Asia, and Europe. To improve generalization, we incorporated large-scale CXR pretraining, attention pooling, and semi-supervised learning via “noisy student”. The DLS was evaluated on a combined test set of 1,262 images (1 per patient) from China, India, the US, and Zambia, with TB confirmed by microbiology or molecular testing. Given WHO targets of 90% sensitivity and 70% specificity, the DLS’s operating point was prespecified to favor sensitivity over specificity.

Results
The DLS’s receiver operating characteristic (ROC) curve lay above all 9 India-based radiologists, with an area under the curve (AUC) of 0.90 (95% CI 0.87-0.92). At the prespecified operating point, the DLS’s sensitivity (88%) was higher than that of the India-based radiologists (median sensitivity: 74%, range 69-87%, p<0.001 for superiority), and the DLS’s specificity (79%) was non-inferior to that of these radiologists (median specificity: 86%, range 78-88%, p=0.004). Similar trends were observed in the HIV-positive and sputum smear-positive subgroups of these datasets. We additionally found that 5 US-based radiologists were more sensitive but less specific than the India-based radiologists. The DLS was similarly non-inferior to this second cohort of radiologists. Depending on the setting, use of the DLS as a prioritization tool could reduce the cost per positive TB case detected by 40-80% compared with the use of molecular testing alone.

Conclusions
We developed a DLS to detect active pulmonary TB on CXR that generalized to patient populations from 4 regions of the world and merits prospective evaluation to assist cost-effective screening efforts in settings with scarce access to radiologists.

Clinical Relevance/Application
Our AI detects pulmonary tuberculosis on chest x-rays with performance comparable to that of radiologists, and could be a cost-effective way to select patients for confirmatory workup and treatment.
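The cost argument for using the DLS as a prioritization tool can be illustrated with a minimal sketch. It is not the cost model used in this study, which the abstract does not describe. It assumes that every person screened receives a CXR read by the DLS, that only DLS-flagged people go on to molecular testing, and that the molecular test detects every TB case it is applied to. The function names, the prevalence, and the unit costs are hypothetical. Only the 88% sensitivity and 79% specificity come from the abstract.

```python
def cost_per_positive_molecular_only(prevalence, cost_molecular):
    """Cost per TB case detected when everyone gets the molecular test.

    Assumes the molecular test finds every TB case (a simplification).
    """
    return cost_molecular / prevalence


def cost_per_positive_with_triage(prevalence, sensitivity, specificity,
                                  cost_cxr, cost_molecular):
    """Cost per TB case detected when the DLS triages before molecular testing.

    Everyone pays for a CXR; only DLS-flagged people pay for the molecular test.
    TB cases the DLS fails to flag are never confirmed, so they are not counted
    as detected.
    """
    true_positive_rate = prevalence * sensitivity
    false_positive_rate = (1 - prevalence) * (1 - specificity)
    flagged = true_positive_rate + false_positive_rate
    cost_per_person = cost_cxr + flagged * cost_molecular
    return cost_per_person / true_positive_rate


# Hypothetical setting: 5% prevalence, CXR at 2 cost units, molecular test at 15.
baseline = cost_per_positive_molecular_only(0.05, 15.0)
triaged = cost_per_positive_with_triage(0.05, 0.88, 0.79, 2.0, 15.0)
reduction = 1 - triaged / baseline
print(f"molecular only: {baseline:.1f}, with DLS triage: {triaged:.1f}, "
      f"reduction: {reduction:.0%}")
```

In this simplified model the saving depends strongly on prevalence and on the ratio of CXR cost to molecular test cost, which is consistent with the abstract's statement that the reduction varies by setting. The sketch also makes the trade-off explicit: triage lowers the cost per case detected, but the cases the DLS misses are not found.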