SSG03

Chest (Lung Nodule)

Tuesday, Nov. 27 10:30AM - 12:00PM Room: S504AB

AICHCT

AMA PRA Category 1 Credits ™: 1.50
ARRT Category A+ Credit: 1.75

FDA Discussions may include off-label uses.

Participants
Sudhakar N. Pipavath, MD, Mercer Island, WA (Moderator) Adjudicator, Gilead Sciences, Inc
Mark L. Schiebler, MD, Madison, WI (Moderator) Stockholder, Stemina Biomarker Discovery, Inc; Stockholder, HealthMyne, Inc

Sub-Events
SSG03-01

Awards
Student Travel Stipend Award

Participants
Jooae Choe, MD, Seoul, Korea, Republic Of (Presenter) Nothing to Disclose
Sang Min Lee, MD, Seoul, Korea, Republic Of (Abstract Co-Author) Nothing to Disclose
Kyunghee Lee, MD, PhD, Seongnam, Korea, Republic Of (Abstract Co-Author) Nothing to Disclose
Kyu-Hwan Jung, PhD, Seoul, Korea, Republic Of (Abstract Co-Author) Stockholder, VUNO Inc
Jaeyoun Yi, Seoul, Korea, Republic Of (Abstract Co-Author) Officer, Coreline Soft, Co Ltd
Sang Min Lee, MD, Seoul, Korea, Republic Of (Abstract Co-Author) Nothing to Disclose
Joon Beom Seo, MD, PhD, Seoul, Korea, Republic Of (Abstract Co-Author) Nothing to Disclose

For information about this presentation, contact:

sangmin.lee.md@gmail.com

PURPOSE

To evaluate the added value of a deep learning-based computer-aided detection (CAD) system that detects multiple lesions of several classes when radiologists read chest radiographs.

METHOD AND MATERIALS

We developed a new deep learning-based CAD system for detecting multiple lesions with 4 different patterns (nodule/mass, interstitial opacity, pleural effusion, and pneumothorax) on chest radiographs. To train the deep learning network, 17,917 images were collected from two tertiary hospitals. The numbers of normal and abnormal patients were 11,000 and 6,917, respectively. Two thoracic radiologists, in consensus, labeled the disease type and delineated regions of interest (ROIs) to serve as ground truth. To validate the effect of the CAD on observer performance, 9 observers (7 board-certified radiologists and 2 radiology residents) reviewed 200 chest radiographs twice, with a two-week interval. The 200 chest radiographs consisted of 100 normal and 100 abnormal (nodule/mass: 60, interstitial opacity: 10, pleural effusion: 10, pneumothorax: 10) radiographs. The diagnostic performance of the CAD alone and of the observers with and without CAD was evaluated and compared using jackknife free-response receiver operating characteristic (JAFROC) figures of merit (FOMs) on a per-lesion basis. Reading time was recorded.
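
The JAFROC figure of merit can be made concrete with a small example. The sketch below computes a simplified, unweighted AFROC-type figure of merit for a single reader: the probability that a true lesion is rated higher than the highest-rated false-positive mark on a normal radiograph, with ties counting one half and unmarked lesions or unmarked normal images treated as rated minus infinity. It is a sketch under stated assumptions, not the authors' analysis pipeline: the data layout and the names `afroc_fom`, `psi`, and `NEG_INF` are hypothetical, and the jackknife resampling and significance testing that JAFROC adds on top of the figure of merit are not shown.

```python
import math

NEG_INF = -math.inf  # rating for "not marked" (unmarked lesion or unmarked normal image)

def psi(fp_rating, lesion_rating):
    """Kernel: 1 if the lesion outranks the false positive, 0.5 for a tie, else 0."""
    if lesion_rating > fp_rating:
        return 1.0
    if lesion_rating == fp_rating:
        return 0.5
    return 0.0

def afroc_fom(normal_fp_ratings, lesion_ratings):
    """Unweighted AFROC-type figure of merit for one reader and one condition.

    normal_fp_ratings: one list per normal image with the ratings of all
        false-positive marks on it (an empty list if the image was not marked).
    lesion_ratings: one rating per true lesion on the abnormal images
        (NEG_INF if the lesion was not marked).
    """
    if not normal_fp_ratings or not lesion_ratings:
        raise ValueError("need at least one normal image and one lesion")
    # Each normal image is summarized by its highest-rated false positive.
    highest_fp = [max(marks) if marks else NEG_INF for marks in normal_fp_ratings]
    total = sum(psi(x, y) for x in highest_fp for y in lesion_ratings)
    return total / (len(highest_fp) * len(lesion_ratings))
```

For example, `afroc_fom([[], [2], [1, 3]], [4, 3, NEG_INF])` scores 9 lesion/normal-image pairs with a total of 6.0, giving a figure of merit of about 0.667. Computing this once per observer and condition (without and with CAD) and averaging across observers corresponds to the kind of mean FOMs reported in the results.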

RESULTS

The developed CAD showed FOMs of 0.931 for nodule/mass, 0.900 for interstitial opacity, 1 for pleural effusion, and 1 for pneumothorax. The mean FOMs of the 9 observers without CAD were 0.916 for nodule/mass, 0.922 for interstitial opacity, 0.944 for pleural effusion, and 0.978 for pneumothorax. With CAD, the mean FOMs of the 9 observers were 0.942 for nodule/mass, 0.900 for interstitial opacity, 0.967 for pleural effusion, and 1 for pneumothorax. Observer accuracy increased with CAD for every pattern except interstitial opacity. The mean reading time was 91.5 ± 53.2 minutes without CAD and 79.1 ± 28.2 minutes with CAD.

CONCLUSION

The deep learning-based CAD may help improve observer performance in reading chest radiographs as well as reduce reading time.

CLINICAL RELEVANCE/APPLICATION

The deep learning-based CAD has the potential to improve observer efficiency in terms of both accuracy and reading time, and may provide a preliminary interpretation of chest radiographs.

SSG03-02

Awards
Student Travel Stipend Award

Participants
Yongsik Sim, MD, Seoul, Korea, Republic Of (Presenter) Nothing to Disclose
Myung Jin Chung, MD, Seoul, Korea, Republic Of (Abstract Co-Author) Research Grant, General Electric Company; Research Grant, Samsung Electronics Co, Ltd; Research Grant, Lunit Inc
Elmar C. Kotter, MD, MSc, Freiburg, Germany (Abstract Co-Author) Editorial Advisory Board, Thieme Medical Publishers, Inc
Synho Do, PhD, Boston, MA (Abstract Co-Author) Nothing to Disclose
Kyunghwa Han, PhD, Seoul, Korea, Republic Of (Abstract Co-Author) Nothing to Disclose
Hanmyoung Kim, MS, Suwon, Korea, Republic Of (Abstract Co-Author) Employee, Samsung Electronics Co, Ltd
Seungwook Yang, PhD, Suwon, Korea, Republic Of (Abstract Co-Author) Employee, Samsung Electronics Co, Ltd
Dong-Jae Lee, Suwon, Korea, Republic Of (Abstract Co-Author) Employee, Samsung Electronics Co, Ltd
Byoung Wook Choi, MD, Seoul, Korea, Republic Of (Abstract Co-Author) Nothing to Disclose

For information about this presentation, contact:

ysim1@yuhs.ac

PURPOSE

To evaluate the performance of radiologists in detecting malignant pulmonary nodules with the assistance of deep learning-based computer-aided detection (CAD) software, compared with the performance of radiologists alone or CAD alone.

METHOD AND MATERIALS

Each of four participating centers in three countries retrospectively collected 150 lung cancer radiographs and 50 normal radiographs. Normal radiographs were obtained from healthy adults, with normality confirmed by a CT scan taken within 14 days. Each cancer radiograph contained 1 to 3 pathologically confirmed nodules measuring between 1 and 3 cm. The estimated location of each nodule was marked on the radiograph by referring to the CT scan. Twelve radiologists with varying levels of experience from 4 institutions independently analyzed a set of radiographs and marked regions of interest (ROIs) at suspected nodule locations on each radiograph. Deep learning-based CAD software was then applied to find suspicious nodules on the chest radiographs. Finally, the 12 radiologists reviewed the whole set of images with the assistance of CAD, accepting or dismissing the ROIs suggested by CAD. Sensitivity and false positives per image (FPPI) of radiologists alone, CAD alone, and radiologists with CAD were statistically analyzed.
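
Sensitivity and FPPI are simple pooled ratios. As a minimal sketch, assuming each reader mark has already been matched (or not) to a ground-truth nodule, which the abstract does not describe, the per-image tallies can be combined as follows. The `ImageRead` record and `sensitivity_and_fppi` function are hypothetical, and FPPI is assumed to be averaged over all images, normal and cancer alike.

```python
from dataclasses import dataclass

@dataclass
class ImageRead:
    """Hypothetical per-image tally for one reader (or for the CAD)."""
    n_nodules: int      # ground-truth nodules on the image (0 for a normal radiograph)
    n_true_marks: int   # nodules the reader correctly localized
    n_false_marks: int  # marks that do not correspond to any nodule

    def __post_init__(self):
        if not 0 <= self.n_true_marks <= self.n_nodules:
            raise ValueError("true marks must be between 0 and the number of nodules")
        if self.n_false_marks < 0:
            raise ValueError("false marks cannot be negative")

def sensitivity_and_fppi(reads):
    """Pooled lesion-level sensitivity and false positives per image (FPPI)."""
    total_nodules = sum(r.n_nodules for r in reads)
    detected = sum(r.n_true_marks for r in reads)
    false_marks = sum(r.n_false_marks for r in reads)
    return detected / total_nodules, false_marks / len(reads)
```

For four hypothetical images tallied as `ImageRead(2, 1, 0)`, `ImageRead(1, 1, 1)`, `ImageRead(0, 0, 1)`, and `ImageRead(0, 0, 0)`, the function returns a sensitivity of 2/3 and an FPPI of 0.5. Running it separately on the unaided and CAD-aided reads gives the two sets of figures being compared.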

RESULTS

The overall sensitivity and FPPI of the CAD system were 63.75% and 0.20, respectively, which were not significantly different from those of the radiologists. The average sensitivity of the radiologists increased significantly from 65.1% to 70.3% with the aid of the CAD software (p<0.0001). The average FPPI was 0.2 without CAD and 0.18 with CAD, a significant decrease (p=0.0006). On subgroup analysis, the incremental effect of CAD on nodule detection sensitivity was not affected by radiologist experience; nodule size, location, or type (primary or metastatic); or acquisition modality.

CONCLUSION

The average sensitivity and FPPI of our CAD system were not statistically different from those of radiologists. When radiologists were assisted by the CAD, overall sensitivity increased significantly while FPPI appeared to decrease. The incremental effect of the CAD system was not affected by radiologist experience, nodule characteristics, or modality, which supports the potential for general use of this software.

CLINICAL RELEVANCE/APPLICATION

Radiologists' performance in detecting lung cancer nodules, in terms of both sensitivity and false-positive rate, can be improved with a deep learning-based CAD system.

SSG03-03

Participants
Sarim Ather, MBChB, PhD, Oxford, United Kingdom (Presenter) Nothing to Disclose
Carlos Arteta, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd
Nicholas Dowson, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd
Lyndsey C. Pickup, MEng, DPhil, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd; Co-founder, Optellum Ltd
Petr Novotny, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd
Catarina Santos, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd
Heiko Peschl, Oxford, United Kingdom (Abstract Co-Author) Nothing to Disclose
Maria Tsakok, Oxford, United Kingdom (Abstract Co-Author) Nothing to Disclose
William Hickes, MSc, Oxford, United Kingdom (Abstract Co-Author) Research Grant, Mirada Medical Ltd
Samia Hussain, Oxford, United Kingdom (Abstract Co-Author) Nothing to Disclose
Jerome M. Declerck, PhD, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd; Co-founder, Optellum Ltd
Vaclav Potesil, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd; Founder, Optellum Ltd; Employee, Hocoma AG
Timor Kadir, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd
Fergus V. Gleeson, MBBS, Oxford, United Kingdom (Abstract Co-Author) Consultant, Alliance Medical Limited; Consultant, Blue Earth Diagnostics Ltd; Consultant, Polarean, Inc

For information about this presentation, contact:

sarim.ather@ouh.nhs.uk

PURPOSE

To assess the impact of automated segmentation of pulmonary nodules by measuring the accuracy of the prediction of malignancy using the Brock University Cancer Prediction Model.

METHOD AND MATERIALS

A retrospective analysis was carried out on 7,927 nodules (314 malignant) from 5,394 patients scanned as part of the US NLST (mean age 62 ± 5 years; 3,192 male). Following BTS guidelines, nodules <5 mm were excluded, but all other nodules were included regardless of type, attenuation, and margin. Automatic 3D nodule segmentations were generated by a deep learning model, initiated with a single click point inside the nodule. Three methods of measuring nodule size were compared: D2D, the NLST radiologist measurement; D3D, the long axis of the automatic segmentation; and Dsph, for which the volume V of the automatic segmentation was converted to an equivalent linear size using the equation for a sphere, in order to characterize nodule volume more accurately. Each was tested as the size term in the standard Brock model to generate a malignancy risk, and the area under the receiver operating characteristic curve (AUC-ROC) was calculated.
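
The conversion from segmented volume to an equivalent linear size follows from the volume of a sphere, V = (π/6)·d³, so Dsph = (6V/π)^(1/3). A minimal Python sketch, assuming volumes in cubic millimetres; the function name is hypothetical, and how Dsph is then substituted into the Brock model's size term is not reproduced here.

```python
import math

def equivalent_spherical_diameter(volume_mm3: float) -> float:
    """Diameter (mm) of a sphere whose volume equals the segmented nodule volume.

    From V = (pi / 6) * d**3 it follows that d = (6 * V / pi) ** (1 / 3).
    """
    if volume_mm3 < 0:
        raise ValueError("volume must be non-negative")
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)

# Hypothetical elongated nodule: an ellipsoid with axes 12 x 6 x 6 mm.
# Its long axis (a D3D-style measure) is 12 mm, but its volume
# corresponds to a sphere of only about 7.6 mm in diameter.
ellipsoid_volume = math.pi / 6.0 * 12.0 * 6.0 * 6.0
d_sph = equivalent_spherical_diameter(ellipsoid_volume)
```

The ellipsoid example shows why a volume-derived size can differ substantially from a long-axis diameter for non-spherical nodules, while the two agree exactly for a perfect sphere.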

RESULTS

The AUC-ROC was 85.96% (95% confidence interval [CI]: 84.33%, 87.76%) for D2D, 86.64% (95% CI: 85.04%, 88.19%) for D3D, and 88.17% (95% CI: 86.71%, 89.82%) for Dsph. The expected increase in AUC that Dsph offers over D2D is 2.21 percentage points (95% CI: 1.28, 3.12).

CONCLUSION

The automatic nodule size measurements outperformed the manual radiologist measurements in predicting lung cancer as an input to the Brock model. The non-axial Dsph, which is derived from the volumetric segmentation, outperforms both long axis-based methods. Assessing nodule segmentation by measuring prediction efficacy is a viable alternative to overlap measures such as the Dice coefficient.

CLINICAL RELEVANCE/APPLICATION

Automatic segmentation removes the need for manual extraction of the axial diameters of lung nodules and is not subject to intra- or inter-radiologist variation, thereby improving consistency.

SSG03-04

Awards
Student Travel Stipend Award

Participants
Ramandeep Singh, MBBS, Boston, MA (Presenter) Nothing to Disclose
Chayanin Nitiwarangkul, MD, Boston, MA (Abstract Co-Author) Nothing to Disclose
Jo-Anne O. Shepard, MD, Boston, MA (Abstract Co-Author) Nothing to Disclose
Fatemeh Homayounieh, MD, Chelsea, MA (Abstract Co-Author) Nothing to Disclose
Atul Padole, MD, Boston, MA (Abstract Co-Author) Nothing to Disclose
Shaunagh McDermott, FFR(RCSI), Boston, MA (Abstract Co-Author) Nothing to Disclose
Mannudeep K. Kalra, MD, Boston, MA (Abstract Co-Author) Research Grant, Siemens AG; Research Grant, Canon Medical Systems Corporation
Subba R. Digumarthy, MD, Boston, MA (Abstract Co-Author) Nothing to Disclose
Brent Little, MD, Boston, MA (Abstract Co-Author) Author, Reed Elsevier; Editor, Reed Elsevier

PURPOSE

Most studies of CAD and artificial intelligence (AI) software have focused on solid lung nodules. We assessed the effect of AI-based vessel suppression (AI-VS) and automatic detection (AI-AD) on ground-glass nodules (GGN) and part-solid nodules (PSN) in low-dose CT (LDCT).

METHOD AND MATERIALS

Our study included 100 LDCT examinations with mixed-attenuation pulmonary nodules (average diameter >5 mm) identified from the National Lung Screening Trial (NLST). These examinations were not used in training or validation of the AI software (ClearRead CT, Riverain Inc.). All 100 LDCT examinations were processed to generate three image series per case: unprocessed, AI-VS, and AI-AD series with annotated lung nodules. Two thoracic radiologists (R1: 3 years of experience; R2: 27 years of experience) independently assessed the unprocessed images alone, then together with the AI-VS series, and finally with the AI-AD series. For each assessment, the number of all nodules >5 mm and the location and size of the dominant GGN and PSN were recorded. Descriptive statistics and Student t tests were used for data analysis.
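
The abstract does not state whether paired or unpaired Student t tests were used. Because comparisons such as nodule size on unprocessed versus AI-AD series involve the same nodules measured twice, a paired test is one natural choice; the sketch below computes the paired t statistic and its degrees of freedom with the Python standard library only. The function name and sample values are hypothetical; in practice `scipy.stats.ttest_rel` returns the statistic together with a p-value.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """Paired Student t statistic (second minus first) and degrees of freedom."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical sizes (mm) of the same four nodules on two image series.
unprocessed = [15, 17, 12, 20]
ai_ad = [13, 16, 12, 17]
t, df = paired_t_statistic(unprocessed, ai_ad)
```

Pairing removes between-nodule size variation from the comparison, which is why it is usually more sensitive than an unpaired test when the same nodules appear on both series.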

RESULTS

On unprocessed images, R1 and R2 detected 278 nodules (123 PSN, 155 GGN) and 269 nodules (117 PSN, 152 GGN), respectively (p>0.05). With the addition of AI-VS images, R1 and R2 detected 290 nodules (126 PSN, 164 SSN) and 293 nodules (132 PSN, 161 GGN), respectively, significantly more than without AI-VS (p=0.004). AI-VS aided detection of the solid component in 22 PSN that were deemed SSN by both readers. Conversely, AI-AD annotated only 75 PSN and 54 GGN (129 nodules in total). In 21 patients, AI-AD did not detect the dominant PSN or SSN, and it detected 14 false-positive nodules (vessels, atelectasis, anterior junctional line). The average sizes of the 69 matched PSN detected on both the unprocessed and AI-AD series were 15 ± 7 mm and 13 ± 6 mm, respectively (p=0.07).

CONCLUSION

AI-VS improves detection and characterization of GGN and PSN on chest LDCT. Specifically, easier detection of the solid component in non-solid nodules with AI-VS can avoid false downgrading of the Lung-RADS category and thus help ensure appropriate patient management.

CLINICAL RELEVANCE/APPLICATION

AI software can improve both the detection of, and confidence in detecting, ground-glass and part-solid lung nodules on low-dose chest CT.

Honored Educators

Presenters or authors on this event have been recognized as RSNA Honored Educators for participating in multiple qualifying educational activities. Honored Educators are invested in furthering the profession of radiology by delivering high-quality educational content in their field of study. Learn how you can become an Honored Educator by visiting the website at: https://www.rsna.org/Honored-Educator-Award/

Subba R. Digumarthy, MD - 2013 Honored Educator
Brent Little, MD - 2018 Honored Educator

SSG03-05

Participants
Montserrat Alemany, Uppsala, Sweden (Presenter) Nothing to Disclose
Tomas Hansen, MD, PhD, Uppsala, Sweden (Abstract Co-Author) Nothing to Disclose
Carlos Trampal, Uppsala, Sweden (Abstract Co-Author) Nothing to Disclose
Jens Sorensen, Uppsala, Sweden (Abstract Co-Author) Nothing to Disclose

For information about this presentation, contact:

montserrat.alemany.ripoll@akademiska.se

PURPOSE

Detection of small lung nodules is important for appropriate cancer staging. There is controversy in the literature about the value of adding a separate CT of the lungs in deep inspiration. With modern equipment, radiation dose is no longer an issue, because only approximately 3 mSv is added to the usual dose. The purpose of this study was to assess the value of an additional thoracic CT in deep inspiration and of maximum intensity projection (MIP) reconstructions in PET-CT of oncologic patients.

METHOD AND MATERIALS

186 consecutive patients (99 male and 89 female; mean age, 72 years; range, 26-93 years) underwent FDG PET-CT for one of the following indications: characterization of a newly detected lung nodule/mass (n=101), staging of cancer (n=31), therapy response monitoring (n=33), suspicion of tumor relapse (n=19), and cancer of unknown origin (n=2). After PET-CT acquisition with shallow breathing, a thoracic CT in deep inspiration was performed in all patients (slice thickness: 1.25 mm). MIP reconstructions were generated from both sets of lung images. Two experienced radiologists analyzed the 4 resulting sets of CT images, and the number of lung nodules was recorded. A lung nodule was defined as a rounded opacity smaller than 10 mm completely surrounded by lung parenchyma. The clinical relevance of any discrepancies between CT studies (e.g., upstaging) was analyzed.
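
MIP reconstructions keep, for each pixel, the highest attenuation value across a slab of adjacent slices, so small nodules stand out as bright foci while vessels appear as continuous tubular structures. A minimal pure-Python sketch of a sliding-slab MIP along the slice axis follows. The slab thickness used in this study is not stated, so `slab` is a free parameter (for example, 8 slices of 1.25 mm would give a 10 mm slab, a hypothetical choice), and the function name is hypothetical.

```python
def slab_mip(volume, slab):
    """Sliding-slab maximum intensity projection along the slice axis.

    volume: list of 2D slices, each a list of rows of intensity values (e.g. HU).
    slab: number of consecutive slices combined into each MIP image.
    Returns len(volume) - slab + 1 MIP images.
    """
    if not 1 <= slab <= len(volume):
        raise ValueError("slab must be between 1 and the number of slices")
    rows, cols = len(volume[0]), len(volume[0][0])
    mips = []
    for start in range(len(volume) - slab + 1):
        block = volume[start:start + slab]
        # Per-pixel maximum over the slices in this slab.
        mips.append([[max(s[r][c] for s in block) for c in range(cols)]
                     for r in range(rows)])
    return mips
```

With NumPy, each slab reduces to `volume[i:i + slab].max(axis=0)`; the pure-Python version is shown only to make the per-pixel maximum explicit.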

RESULTS

120/186 patients presented with nodules. PET-CT with shallow breathing detected 393 nodules, and 578 when MIP images were analyzed. Thoracic CT in deep inspiration found 534 nodules, and 905 when MIP was used. The number of detected nodules increased from free-breathing to breath-hold CT in 42 patients, and in 51 patients when MIP was used. The additionally detected nodules were considered clinically relevant in 7/120 (6%) patients because they influenced patient management, for example by increasing the TNM stage.

CONCLUSION

According to our results, the addition of deep-inspiration thoracic CT with MIP reconstructions can be recommended in clinical practice because this approach yields better performance in TNM staging of oncologic patients.

CLINICAL RELEVANCE/APPLICATION

Addition of deep inspiration CT with MIP reconstructions to conventional FDG PET-CT in oncologic patients yields better performance in TNM staging.

SSG03-06

Participants
Heiko Peschl, Oxford, United Kingdom (Abstract Co-Author) Nothing to Disclose
Carlos Arteta, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd
Lyndsey C. Pickup, MEng, DPhil, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd; Co-founder, Optellum Ltd
Maria Tsakok, Oxford, United Kingdom (Abstract Co-Author) Nothing to Disclose
Sarim Ather, MBChB, PhD, Oxford, United Kingdom (Presenter) Nothing to Disclose
Samia Hussain, Oxford, United Kingdom (Abstract Co-Author) Nothing to Disclose
William Hickes, MSc, Oxford, United Kingdom (Abstract Co-Author) Research Grant, Mirada Medical Ltd
Petr Novotny, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd
Catarina Santos, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd
Emily Fay, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd
Jerome M. Declerck, PhD, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd; Co-founder, Optellum Ltd
Vaclav Potesil, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd; Founder, Optellum Ltd; Employee, Hocoma AG
Timor Kadir, Oxford, United Kingdom (Abstract Co-Author) Employee, Optellum Ltd
Fergus V. Gleeson, MBBS, Oxford, United Kingdom (Abstract Co-Author) Consultant, Alliance Medical Limited; Consultant, Blue Earth Diagnostics Ltd; Consultant, Polarean, Inc

For information about this presentation, contact:

Heiko.Peschl@ouh.nhs.uk

PURPOSE

To assess the follow-up rule-out accuracy of a convolutional neural network (CNN) in patients with incidentally detected, indeterminate pulmonary nodules in a multi-site, heterogeneous population.

METHOD AND MATERIALS

The US National Lung Screening Trial (NLST) dataset was manually curated and used to create a training set: each reported nodule and cancer was located, contoured and diagnostically characterised (9310 benign nodule patients; 1058 cancer patients). All patients with solid and semi-solid nodules of 6 mm and above, in whom benign nodules and cancers could be confidently identified by clinicians (5972 patients, of whom 575 were cancer patients), were selected. A CNN was trained, and benign rule-out thresholds were calculated at three levels of sensitivity: 100%, 99.5% and 99%. An independent dataset of patients with incidentally detected indeterminate pulmonary nodules was retrospectively collected from a tertiary referral centre and surrounding hospitals in the UK, with a heterogeneous mix of scan parameters, manufacturers and clinical indications (610 patients, 698 nodules, 5-15 mm). Diagnosis was established according to British Thoracic Society guidelines (2015). The dataset contained 50 cancers from 47 patients (7% of all nodules). Performance was evaluated by measuring the specificity at the three benign rule-out thresholds, i.e. the proportion of benign nodules correctly stratified while missing no or few cancers. The overall area under the ROC curve (AUC) was also calculated.
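
The abstract does not describe how the three rule-out thresholds were derived. One plausible approach, sketched here under that assumption, fixes each threshold on the training cancers so that at least the target fraction of cancers score at or above it, then applies the same threshold unchanged to the independent dataset to measure sensitivity and specificity. The function names are hypothetical, and the CNN is represented only by its output scores.

```python
import math

def rule_out_threshold(cancer_scores, target_sensitivity):
    """Largest score threshold that keeps at least `target_sensitivity` of the
    training cancers at or above it; nodules scoring below it are ruled out."""
    if not cancer_scores or not 0 < target_sensitivity <= 1:
        raise ValueError("need cancer scores and a sensitivity in (0, 1]")
    ranked = sorted(cancer_scores, reverse=True)
    # Number of cancers that must remain above the threshold (small epsilon
    # guards against floating-point error such as 0.07 * 100 = 7.000000000000001).
    k = max(1, math.ceil(target_sensitivity * len(ranked) - 1e-9))
    return ranked[k - 1]

def sensitivity_specificity(threshold, scores, is_cancer):
    """Sensitivity and specificity of the rule 'score >= threshold -> keep in follow-up'."""
    tp = sum(1 for s, c in zip(scores, is_cancer) if c and s >= threshold)
    tn = sum(1 for s, c in zip(scores, is_cancer) if not c and s < threshold)
    n_pos = sum(1 for c in is_cancer if c)
    n_neg = len(is_cancer) - n_pos
    return tp / n_pos, tn / n_neg
```

With a 99% target and 575 training cancers, for instance, this rule places the threshold at the 570th-highest cancer score. Fixing thresholds on training data and then measuring specificity on unseen data is what makes the independent-dataset result a genuine test of rule-out performance.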

RESULTS

The specificity (sensitivity) was 24% (100%), 24% (100%) and 48.6% (100%) at the three thresholds respectively. AUC was 0.93 (95%CI = 0.90-0.96).

CONCLUSION

On this independent dataset, the CNN was able to correctly classify just under half of the benign nodules whilst not misclassifying any cancers.

CLINICAL RELEVANCE/APPLICATION

Our work shows the potential of CNNs in ruling out benign pulmonary nodules and therefore reducing the need for follow-up scans in a large number of patients.

SSG03-07

Participants
Mark M. Hammer, MD, Saint Louis, MO (Presenter) Nothing to Disclose
Lauren Palazzo, Boston, MA (Abstract Co-Author) Nothing to Disclose
Andrew Eckel, Boston, MA (Abstract Co-Author) Nothing to Disclose
Eduardo J. Mortani Barbosa JR, MD, Philadelphia, PA (Abstract Co-Author) Nothing to Disclose
Chung Yin Kong, PhD, Boston, MA (Abstract Co-Author) Nothing to Disclose

For information about this presentation, contact:

markmhammer@gmail.com

PURPOSE

To use simulation modeling based on evidence from the literature to evaluate several management strategies and treatment options for patients with ground glass nodules (GGNs).

METHOD AND MATERIALS

We developed a Monte Carlo model of patients with GGNs undergoing follow-up per Lung-RADS for up to ten years. Nodules could grow and develop solid components over time. Rates of clinically significant malignancy were calibrated to data from the National Lung Screening Trial. We investigated modifications to the follow-up schedule and tested different treatment strategies, specifically lobectomy, radiation therapy, and no therapy.
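
To illustrate how a Monte Carlo model can compare follow-up intervals, the toy sketch below simulates individual GGNs year by year and records how long a newly developed solid component waits for the next scheduled scan. It is not the authors' model: the annual probabilities are hypothetical placeholders rather than values calibrated to NLST, growth, treatment, and survival are omitted, and the function name is hypothetical.

```python
import random
from statistics import mean

def simulate_followup(n_nodules, interval_years, horizon_years=10,
                      p_malignant=0.03, p_solid_malignant=0.15,
                      p_solid_benign=0.01, seed=0):
    """Toy microsimulation of GGN surveillance (all probabilities hypothetical).

    Each nodule is malignant with probability p_malignant and, in each year,
    may develop a solid component with a type-specific annual probability.
    Scans occur every `interval_years`. Returns the fraction of solid
    components seen on a scan within the horizon and the mean wait (years)
    between development and that scan.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible runs
    developed = 0
    waits = []
    for _ in range(n_nodules):
        malignant = rng.random() < p_malignant
        p_solid = p_solid_malignant if malignant else p_solid_benign
        for year in range(1, horizon_years + 1):
            if rng.random() < p_solid:
                developed += 1
                # First scheduled scan at or after this year (integer ceiling).
                next_scan = -(-year // interval_years) * interval_years
                if next_scan <= horizon_years:
                    waits.append(next_scan - year)
                break  # only the first solid component matters here
    seen_fraction = len(waits) / developed if developed else 0.0
    mean_wait = mean(waits) if waits else float("nan")
    return seen_fraction, mean_wait
```

Because the random draws do not depend on the interval, running the same seed with `interval_years=1` and `interval_years=3` compares the two schedules on identical simulated nodules; a full model would add growth, treatment, and survival states on top of this loop.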

RESULTS

Overall, 2.3% of nodules represented clinically significant malignancies, and 6.3% of nodules were treated. Only 29.8% of Lung-RADS 4B/4X nodules were clinically significant malignancies. We compared outcomes of patients with Lung-RADS 2 nodules followed at 1-, 2-, and 3-year intervals; overall survival at 10 years of follow-up was similar, ranging from 74.7% (annual) to 73.5% (triennial). We also evaluated 10-year outcomes of Lung-RADS 4B/4X non-solid nodules treated with different modalities; at 10 years, overall survival was highest in the radiation therapy arm, at 83.9%, and lowest in the no-treatment arm, at 78.1%.

CONCLUSION

Our results suggest a conservative approach to the follow-up and treatment of GGNs. The follow-up interval for GGNs can be increased to 3 years with minimal change in outcomes. Our results also favor the use of radiation therapy when a nodule has met criteria for treatment. Prospective randomized trials are needed to evaluate thresholds for management and different treatment modalities for GGNs.

CLINICAL RELEVANCE/APPLICATION

Conservative management strategies for non-solid nodules, such as triennial follow-up for Lung-RADS 2 nodules and radiation therapy instead of lobectomy for Lung-RADS 4B/4X nodules, are preferable to more aggressive treatment.

SSG03-08

Participants
Jianlin Wu, MD, Dalian, China (Abstract Co-Author) Nothing to Disclose
Wen Tang, Beijing, China (Abstract Co-Author) Employee, Infervision Inc
Rongguo Zhang, Beijing, China (Abstract Co-Author) Employee, Infervision Inc
Tianci Song, Beijing, China (Abstract Co-Author) Employee, Infervision Inc
Chen Xia, Beijing, China (Abstract Co-Author) Nothing to Disclose
Yufeng Deng, PhD, Durham, NC (Presenter) Employee, Infervision Inc
Kai Liu, Shanghai, China (Abstract Co-Author) Nothing to Disclose
Yi Xiao, Shanghai, China (Abstract Co-Author) Nothing to Disclose
Shiyuan Liu, PhD, Shanghai, China (Abstract Co-Author) Nothing to Disclose

PURPOSE

Pulmonary nodules may be early manifestations of lung cancer, but their morphological complexity makes it difficult to differentiate benign from malignant nodules. This paper proposes two deep learning models aiming to accurately determine the malignancy of pulmonary nodules from CT images.

METHOD AND MATERIALS

Model-1 was adapted from the winning model of the Data Science Bowl 2017. We chose ResNet as its backbone and integrated U-Net and Capsule Network architectures to enable the model to capture multiscale features of pulmonary nodules comprehensively. Model-2, inspired by the NoduleX model, took features extracted by Model-1 as input to a random forest classifier to further predict nodule malignancy. Two datasets were used to validate the performance of the two proposed models. Dataset 1 contains 1061 samples (benign/malignant: 703/353) from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), and Dataset 2 consists of 1117 samples (benign/malignant: 354/763) provided by collaborating hospitals. Nodules in both datasets were biopsy- or surgery-proven, and pathology diagnoses were used as the gold standard. We randomly selected 20% of each dataset as the testing set and used the remaining 80% as the training set. Both models were trained and tested on each of the two datasets separately.

RESULTS

On Dataset 1 (LIDC-IDRI), Model-1 achieved an AUC of 0.91 in predicting pulmonary nodule malignancy, while Model-2 achieved an AUC of 0.96. On Dataset 2, Model-1 again reached a high AUC of 0.90, significantly outperforming Model-2 (AUC = 0.80).
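
An AUC can be read as the probability that a randomly chosen malignant nodule receives a higher model score than a randomly chosen benign one, with ties counting one half; this equals the Mann-Whitney U statistic divided by the number of malignant-benign pairs. A minimal sketch of that computation follows (the function name is hypothetical). It is O(n1 × n0) in the two class sizes; libraries such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.

```python
def auc_mann_whitney(scores_malignant, scores_benign):
    """AUC as P(score of a malignant nodule > score of a benign nodule), ties = 1/2."""
    if not scores_malignant or not scores_benign:
        raise ValueError("need at least one malignant and one benign score")
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))
```

For example, malignant scores `[0.9, 0.8, 0.4]` against benign scores `[0.7, 0.4, 0.1]` give 7.5 winning pairs out of 9, an AUC of about 0.83. Because the AUC depends only on the ranking of scores, it measures discrimination but says nothing about whether predicted probabilities are calibrated.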

CONCLUSION

Model-1 showed consistently high accuracy in pulmonary nodule malignancy prediction on both the LIDC dataset and CT scans collected from collaborating hospitals. Our two models achieved results comparable to those of the NoduleX model, which had achieved state-of-the-art performance on the LIDC dataset. The experimental results demonstrated that Model-1 performed more stably across datasets and was more robust. The strength of Model-1 may lie in its Capsule Network structure, which could extract more universally informative features, and in its end-to-end deep learning architecture.

CLINICAL RELEVANCE/APPLICATION

Our proposed model can serve as a useful tool for early diagnosis of lung cancer and has the potential to be applied in clinical treatment planning.

SSG03-09

Participants
Audrey Winter, PhD, Los Angeles, CA (Presenter) Nothing to Disclose
William Hsu, PhD, Los Angeles, CA (Abstract Co-Author) Research Grant, Siemens AG

For information about this presentation, contact:

AWinter@mednet.ucla.edu

PURPOSE

An estimated 1.57 million screen-detected and incidentally detected pulmonary nodules are discovered. Prediction models, which estimate the probability of lung cancer in pulmonary nodules detected on computed tomography (CT), can potentially aid patient management and minimize overdiagnosis. We therefore performed an external validation of an existing model developed by McWilliams et al (doi:10.1056/NEJMoa1214726).

METHOD AND MATERIALS

Based on the inclusion/exclusion criteria stated by McWilliams, we identified 7,879 non-calcified nodules greater than 4 mm discovered at baseline CT screening with at least 2 years of follow-up, using data from the CT arm of the National Lung Screening Trial (NLST). We assessed model discrimination (the ability to distinguish between cancer and non-cancer) and calibration (the agreement between predicted and observed probabilities). We identified differences between the derivation dataset (PanCan) and NLST. The regression coefficient and the intercept were estimated by fitting a logistic regression to the NLST data. We also attempted to update and recalibrate the model. Finally, we evaluated whether adding new covariates, such as body mass index, smoking status, pack-years, and asbestos exposure, improved performance.

RESULTS

While the AUC of the model was good (0.905 [0.882-0.928]), the histogram showed that cancers and non-cancers were not well separated (see Figure, left). The calibration plot showed that the model tended to overestimate the probability of cancer. Following the methods of Steyerberg et al (doi: 10.1002/sim.1844), the updated model achieved an AUC of 0.914 [0.892-0.936] and better calibration (see Figure, right). Emphysema (p=0.03) and nodule spiculation (p<0.01) had significantly different effects in the NLST cohort compared with the PanCan cohort. Among the new covariates, only pack-year history was significant (p<0.01).

CONCLUSION

While the model achieved a high AUC, discrimination and calibration remain suboptimal, motivating our efforts to identify additional clinical and imaging covariates, as well as changes in covariates over time, that could influence performance.

CLINICAL RELEVANCE/APPLICATION

External validation is necessary to assess the generalizability of a prediction model to new patients. We show how discrimination and calibration can be examined to assess how likely a model is to be ready for clinical practice.