RSNA 2019

Abstract Archives of the RSNA, 2019


An Ensemble of Models with a Multi-Threshold Approach to Improve Chest X-Ray Predictions

Sunday, Dec. 1 11:55AM - 12:05PM Room: E450A

FDA Discussions may include off-label uses.

Jessica D. de Oliveira, MSc, Sao Paulo, Brazil (Presenter) Employee, NeuralMed
Maria Fernanda B. Wanderley, DSc, Sao Paulo, Brazil (Abstract Co-Author) Employee, NeuralMed
Vitor De Mario, Sao Paulo, Brazil (Abstract Co-Author) Employee, NeuralMed
Andre C. Castilla, MD, PhD, Sao Paulo, Brazil (Abstract Co-Author) Stockholder, NeuralMed
Anthony Eigier, BA, Sao Paulo, Brazil (Abstract Co-Author) Stockholder, NeuralMed



Our main goal is to assess if deep learning can decrease the list of exams that radiologists need to read, with minimal loss of critical cases. We propose an ensemble with a multi-threshold approach, focusing on the detection of general opacities.


We used four public datasets: JSRT, OpenI, Shenzhen, and ChestX-ray14. After removing lateral and low-quality images, 117,094 images remained. We cropped each image around the lung mask predicted by a trained U-Net, applied Contrast Limited Adaptive Histogram Equalization (CLAHE), resized to 384x384, and normalized using the mean and standard deviation of the ImageNet images. We then developed three models, all based on Inception V4: M1, a binary classifier that detects whether an image has any finding or is normal; M2, a multilabel model trained on all images to predict five classes (mass/nodule, edema, atelectasis, alveolar opacity, and non-opacity); and M3, a multilabel model predicting the same five classes but trained without the normal images. The ensemble was created as a weighted average of the form (4*ym1 + 3*ym2 + 3*ym3)/10. We calculated the AUC of the ROC curve and chose the two best cut-points using Youden's index.
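The ensemble average and the cut-point selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and Youden's index is computed here by brute force over candidate thresholds rather than from a precomputed ROC curve.

```python
def ensemble_score(ym1, ym2, ym3):
    # Weighted average from the abstract: (4*ym1 + 3*ym2 + 3*ym3)/10,
    # giving the binary model (M1) slightly more weight than M2 and M3.
    return (4 * ym1 + 3 * ym2 + 3 * ym3) / 10

def youden_cutpoint(y_true, y_score):
    # Pick the threshold maximizing Youden's J = TPR - FPR
    # (equivalently, sensitivity + specificity - 1).
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_j, best_t = -1.0, None
    for t in sorted(set(y_score)):
        tp = sum(1 for yt, ys in zip(y_true, y_score) if yt == 1 and ys >= t)
        fp = sum(1 for yt, ys in zip(y_true, y_score) if yt == 0 and ys >= t)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_j, best_t = j, t
    return best_t
```

In practice the per-class scores from the three models would be averaged per class, and Youden's index evaluated on a validation set to select each of the two operating points.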


The mean F1 score of our model across all classes is 0.478, with AUCs of 0.90 for mass/nodule, 0.86 for edema, 0.85 for atelectasis, 0.86 for alveolar opacity, and 0.93 for non-opacity. Analyzing the predictions, we observed that normal images received low scores, the target classes received high scores, and images of other pathologies fell in between; this justifies the use of two thresholds. With the two thresholds, the overall quality of the model improves: we correctly classified more than 70% of all normal images with a False Negative Rate (FNR) of just 5%, and the average True Positive Rate (TPR) across the target classes is 44%.
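The three score regions described above suggest a simple triage rule. A minimal sketch, assuming two thresholds already selected with Youden's index (the function and label names are illustrative, not from the abstract):

```python
def triage(score, t_low, t_high):
    # Below the lower threshold: confidently normal, so the exam can be
    # removed from the radiologist's reading list.
    if score < t_low:
        return "normal"
    # Above the upper threshold: confidently positive for the target class.
    if score >= t_high:
        return "target finding"
    # In between: ambiguous scores (often other pathologies) are kept
    # for radiologist review.
    return "review"
```

With this rule, only the middle band and the flagged positives reach the radiologist, which is how the 70% reduction in normal images is achieved.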


The image preprocessing, together with the ensemble and multi-threshold techniques, produced a model with greater certainty and better results.


We can accelerate the radiologist's work by detecting 70% of normal images, decreasing the number of images to be analyzed, and suggesting the pathology according to the prediction.
