Unnecessity of Real Likeness in Synthetic Training Data Augmentation for Deep Learning Classification of CT Focal Liver Lesions

Participants
Hansang Lee, Daejeon, Korea, Republic Of (Presenter) Nothing to Disclose
Haeil Lee, Daejeon, Korea, Republic Of (Abstract Co-Author) Nothing to Disclose
Helen Hong, PhD, Seoul, Korea, Republic Of (Abstract Co-Author) Nothing to Disclose
Heejin Bae, Seoul, Korea, Republic Of (Abstract Co-Author) Nothing to Disclose
Sungwon Kim, MD, Seoul, Korea, Republic Of (Abstract Co-Author) Nothing to Disclose
Joonseok Lim, MD, Seoul, Korea, Republic Of (Abstract Co-Author) Nothing to Disclose
Junmo Kim, Seoul, Korea, Republic Of (Abstract Co-Author) Nothing to Disclose

For information about this presentation, contact:

hansanglee@kaist.ac.kr

CONCLUSION

Our preliminary results suggest that diverse types of generative networks suitable for training data augmentation can be developed beyond the current GAN structures, which are built on a generator and a real-fake discriminator.

Background

Data augmentation increases the size of a dataset and diversifies its patterns to improve machine learning efficiency. Recent studies have widely adopted the generative adversarial network (GAN) for training data augmentation because it generates real-like synthetic images. However, real-likeness regularization through real-fake discriminator learning, a key part of the GAN, is computationally expensive, and it is uncertain whether it improves learning efficiency when used for data augmentation. In this work, we investigate the necessity of real-likeness regularization in synthetic data augmentation by observing the performance of synthetic data generated by a generation network without the real-fake discriminator.

Evaluation

We used a dataset of 502 CT scans including 676 cysts, 130 hemangiomas, and 484 metastases. We constructed training data under three settings: (1) non-augmented data, (2) data augmented with real-like synthetic data generated by DCGAN, and (3) data augmented with non-real-like synthetic data generated by GTN, a generation network model without a real-fake discriminator. AlexNet was trained on each of these training sets, and the resulting performances were compared. In experiments, (2) and (3) showed competitive accuracies of 75.2% and 77.5%, respectively, and both outperformed (1), which reached an accuracy of 73.0%.
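To make the reported figures easier to compare, the short sketch below tabulates the class distribution of the lesions and the accuracy difference of each augmented setting over the non-augmented baseline. It uses only the counts and accuracies stated above; the dictionary keys and function names are illustrative assumptions, not part of the original study.

```python
# Lesion counts and accuracies exactly as reported in the abstract.
# Key names and helper functions are hypothetical, chosen for illustration.
lesion_counts = {"cyst": 676, "hemangioma": 130, "metastasis": 484}
accuracy = {
    "non-augmented": 73.0,
    "DCGAN-augmented": 75.2,
    "GTN-augmented": 77.5,
}


def class_shares(counts):
    """Return each class's share of all lesions, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


def gains_over_baseline(acc, baseline="non-augmented"):
    """Return accuracy differences versus the baseline, in percentage points."""
    base = acc[baseline]
    return {
        name: round(value - base, 1)
        for name, value in acc.items()
        if name != baseline
    }


total_lesions = sum(lesion_counts.values())
shares = class_shares(lesion_counts)
gains = gains_over_baseline(accuracy)

print(f"total lesions: {total_lesions}")
for name, share in shares.items():
    print(f"{name}: {share:.1f}% of lesions")
for name, gain in gains.items():
    print(f"{name}: +{gain} percentage points over baseline")
```

The differences are expressed in percentage points and are descriptive only; the abstract does not report the train/test split or any measure of variance, so this sketch says nothing about statistical significance.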

Discussion

The conventional GAN was a breakthrough in generating real-like images through regularized learning with the real-fake discriminator. However, this real-likeness regularization was originally designed for visualizing images, not for augmenting training data or improving classification performance. Our results showed that the non-real-like synthetic data achieved performance competitive with that of the real-like synthetic data. This suggests that regularizations better suited to data augmentation should be considered in place of the real-likeness regularization used in existing GANs.

Abstract Archives of the RSNA, 2020