Deep learning-based detection system for multiclass lesions on chest radiographs: comparison with observer readings
To investigate the feasibility of a deep learning–based detection (DLD) system for multiclass lesions on chest radiographs, in comparison with human observers.
A total of 15,809 chest radiographs were collected from two tertiary hospitals (7204 normal and 8605 abnormal with nodule/mass, interstitial opacity, pleural effusion, or pneumothorax). A test set of 100 normal and 100 abnormal radiographs (nodule/mass, 70; interstitial opacity, 10; pleural effusion, 10; pneumothorax, 10) was set aside, and the remaining radiographs were used to develop a DLD system for detecting multiclass lesions. The diagnostic performance of the developed model and that of nine observers with varying levels of experience was evaluated and compared using the area under the receiver operating characteristic curve (AUROC) on a per-image basis and the jackknife alternative free-response receiver operating characteristic figure of merit (FOM) on a per-lesion basis. The false-positive fraction was also calculated.
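As an illustration of the per-image metric named above, the following is a minimal sketch of how an AUROC can be computed from image-level scores, using its rank-based interpretation (the probability that a randomly chosen abnormal image receives a higher score than a randomly chosen normal one). The function name `auroc` and the toy labels and scores are assumptions for illustration only; they are not the study's data or the authors' implementation.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC: fraction of (abnormal, normal) pairs in which the
    abnormal image has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # abnormal images
    neg = [s for y, s in zip(labels, scores) if y == 0]  # normal images
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 0 = normal, 1 = abnormal.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.30, 0.90]
print(auroc(labels, scores))  # 7 of 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of images, which is acceptable for a 200-image test set; a library routine would normally be used for larger data.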
Compared with the group-averaged observer readings, the DLD system demonstrated significantly higher performance in image-wise normal/abnormal classification and in lesion-wise detection with pattern classification (AUROC, 0.985 vs. 0.958; p = 0.001; FOM, 0.962 vs. 0.886; p < 0.001). In lesion-wise detection, the DLD system outperformed all nine observers. In the subgroup analysis, the DLD system exhibited consistently better performance for both nodule/mass (FOM, 0.913 vs. 0.847; p < 0.001) and the other three abnormal classes (FOM, 0.995 vs. 0.843; p < 0.001). The false-positive fraction for all abnormalities was 0.11 for the DLD system and 0.19 for the observers.
The DLD system showed potential for lesion detection and pattern classification on chest radiographs, performing normal/abnormal classification with high diagnostic performance.
• The DLD system was feasible for detection with pattern classification of multiclass lesions on chest radiographs.
• The DLD system performed well in image-wise classification of chest radiographs as normal or abnormal (AUROC, 0.985) and showed especially high specificity (99.0%).
• In lesion-wise detection of multiclass lesions, the DLD system outperformed all nine observers (FOM, 0.962 vs. 0.886; p < 0.001).
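For readers who want the arithmetic behind the specificity figure, the sketch below shows the standard definition, specificity = TN / (TN + FP). The count of one false positive is an inference, not a number stated in the text: it assumes the 99.0% was measured on the 100 normal test radiographs. The function and variable names are hypothetical.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of normal images that were correctly called normal."""
    total_normals = true_negatives + false_positives
    if total_normals == 0:
        raise ValueError("no normal images to evaluate")
    return true_negatives / total_normals


n_normal = 100               # normal radiographs in the test set (from the text)
assumed_false_positives = 1  # assumption: inferred from 99.0%, not reported

spec = specificity(n_normal - assumed_false_positives, assumed_false_positives)
print(f"{spec:.1%}")  # prints 99.0%
```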
Keywords: Deep learning; Thoracic radiography; Automated pattern recognition; Classification
AUC: Area under the curve
AUROC: Area under the receiver operating characteristic curve
DLD: Deep learning–based detection
FOM: Figure of merit
JAFROC: Jackknife alternative free-response receiver operating characteristic curve
ROC: Receiver operating characteristic
This study received funding from the Industrial Strategic Technology Development Program (10072064, Development of Novel Artificial Intelligence Technologies To Assist Imaging Diagnosis of Pulmonary, Hepatic, and Cardiac Diseases and Their Integration into Commercial Clinical PACS Platforms), which is funded by the Ministry of Trade, Industry and Energy (MI, South Korea).
Compliance with ethical standards
The scientific guarantor of this publication is Sang Min Lee.
Conflict of interest
The authors declare that they have no conflict of interest.
Statistics and biometry
The statistician of our institution (Seon Ok Kim) kindly provided statistical advice for this manuscript.
Written informed consent was waived by the institutional review board.
Institutional review board approval was obtained.
• diagnostic or prognostic study
• multicenter study