Comparison of prediction models with radiological semantic features and radiomics in lung cancer diagnosis of the pulmonary nodules: a case-control study
Objectives
To compare the ability of radiological semantic and quantitative texture features in lung cancer diagnosis of pulmonary nodules.
Materials and methods
A total of 121 subjects with confirmed non-small-cell lung cancer were matched with 117 controls based on age and gender. Radiological semantic and quantitative texture features were extracted from CT images with or without contrast enhancement. Three different models were compared using LASSO logistic regression: “CS” using clinical and semantic variables, “T” using texture features, and “CST” using clinical, semantic, and texture variables. For each model, we performed 100 trials of fivefold cross-validation and assessed the average receiver operating characteristic (ROC) curve. The AUC of the cross-validation study (AUCCV) was calculated together with its 95% confidence interval.
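The evaluation loop described above (an L1-penalized logistic model, repeated fivefold cross-validation, and an AUC with an interval) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the penalty weight `lam` is an arbitrary choice, the small proximal-gradient solver stands in for whatever LASSO software was actually used, and all function names are hypothetical. The abstract does not say how the 95% interval was derived; the sketch takes the 2.5th and 97.5th percentiles of the per-fold AUCs, which is only one plausible reading.

```python
import numpy as np

def fit_l1_logistic(X, y, lam=0.05, n_iter=200):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Step size 1/L, where L is an upper bound on the gradient's Lipschitz constant.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30.0, 30.0)
        resid = 1.0 / (1.0 + np.exp(-z)) - y      # predicted probability minus label
        w = w - step * (X.T @ resid) / n          # gradient step on the log-loss
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft threshold (L1)
        b = b - step * resid.mean()               # intercept is not penalized
    return w, b

def auc(y_true, score):
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    diff = score[y_true == 1][:, None] - score[y_true == 0][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def stratified_folds(y, k, rng):
    """Give each sample a fold index so that every fold keeps the class balance."""
    fold = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        fold[idx] = np.arange(len(idx)) % k
    return fold

def repeated_cv_auc(X, y, n_trials=100, k=5, lam=0.05, seed=0):
    """Mean held-out AUC and a percentile interval over n_trials x k folds."""
    rng = np.random.default_rng(seed)
    fold_aucs = []
    for _ in range(n_trials):
        fold = stratified_folds(y, k, rng)
        for f in range(k):
            train, test = fold != f, fold == f
            # Standardize with training-fold statistics only, to avoid leakage.
            mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-12
            w, b = fit_l1_logistic((X[train] - mu) / sd, y[train], lam)
            fold_aucs.append(auc(y[test], ((X[test] - mu) / sd) @ w + b))
    fold_aucs = np.array(fold_aucs)
    lo, hi = np.percentile(fold_aucs, [2.5, 97.5])
    return fold_aucs.mean(), lo, hi

# Synthetic stand-in data: 121 cases, 117 controls, 20 features, 3 of them informative.
data_rng = np.random.default_rng(42)
y = np.r_[np.ones(121), np.zeros(117)]
X = data_rng.normal(size=(238, 20))
X[:, :3] += 0.8 * y[:, None]

mean_auc, lo, hi = repeated_cv_auc(X, y)
print(f"AUC_CV = {mean_auc:.2f} ({lo:.2f}-{hi:.2f})")
```

Standardizing inside each training fold keeps the held-out fold out of the fitting step. In a real analysis the penalty weight would normally be tuned inside the training folds as well, rather than fixed as it is here.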
Results
The AUCCV values (with 95% confidence intervals) for models T, CS, and CST were 0.85 (0.71–0.96), 0.88 (0.77–0.96), and 0.88 (0.77–0.97), respectively. After separating the data into two groups, with and without contrast enhancement, the AUC (without cross-validation) of model T was 0.86 in both groups, suggesting that contrast enhancement did not affect the utility of texture analysis.
Conclusions
The models with semantic and texture features provided cross-validated AUCs of 0.85–0.88 for classification of benign versus cancerous nodules, showing potential in aiding the management of patients.
Key Points
• Pretest probability of cancer can aid and direct the physician in the diagnosis and management of pulmonary nodules in a cost-effective way.
• Semantic features (qualitative features reported by radiologists to characterize lung lesions) and radiomic (e.g., texture) features can be extracted from CT images.
• Input of these variables into a model can generate a pretest likelihood of cancer to aid clinical decision and management of pulmonary nodules.
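For orientation, the way a LASSO logistic model turns input variables into a likelihood of cancer can be written as below. The notation is introduced here for illustration and is not taken from the paper: x_j are the clinical, semantic, or texture variables, y_i is 1 for a cancer case and 0 for a control, β are the fitted coefficients, and λ is the penalty weight that shrinks some coefficients exactly to zero.

```latex
\hat{p}(x) = \frac{1}{1 + \exp\!\Bigl(-\bigl(\beta_0 + \sum_{j=1}^{p} \beta_j x_j\bigr)\Bigr)},
\qquad
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}\Bigl[\, y_i \log \hat{p}(x_i) + (1 - y_i)\log\bigl(1 - \hat{p}(x_i)\bigr) \Bigr]
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```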
Keywords
Lung cancer, Tomography, Radiomics, Semantics, Statistical models
X-ray computed tomography
Non-small cell lung cancer
The bounding volume maximum length
Volume of interest
At the University of Washington Medical Center, we thank Steven R. Bowen, PhD, for helpful suggestions on the project, and Nina A. Mayr, MD, and William T. Yuh for providing access to the MIM software and guidance on how to use it.
This study received funding from NIH grants U01CA148131, U01185097, U01186157, P30CA015704, and F32CA200265, as well as from the National Natural Science Foundation of China (No. 81471637).
Compliance with ethical standards
The scientific guarantor of this publication is Paul E. Kinahan.
Conflict of interest
Paul E. Kinahan received a research grant from GE Healthcare outside of this work, and is the cofounder of PET/X LLC.
All other coauthors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Statistics and biometry
Timothy W. Randolph and Yuzheng Zhang (two coauthors in our paper) have significant statistical expertise.
Written informed consent was waived by the Institutional Review Board.
Institutional Review Board approval was obtained.
• case-control study/diagnostic or prognostic study
• multicenter study