Selecting Features with Group-Sparse Nonnegative Supervised Canonical Correlation Analysis: Multimodal Prostate Cancer Prognosis
This paper presents Group-sparse Nonnegative supervised Canonical Correlation Analysis (GNCCA), a novel methodology for identifying discriminative features from multiple feature views. Existing correlation-based methods do not guarantee positive correlations among the selected features, and they often need a preliminary feature-selection step to reduce redundant features in each feature view. GNCCA attempts to overcome these issues by incorporating (1) a nonnegativity constraint that guarantees positive correlations in the reduced representation and (2) a group-sparsity constraint that allows simultaneous between-view and within-view feature selection. In particular, GNCCA is designed to emphasize correlations between feature views and class labels so that the selected features provide better class separability. In this work, GNCCA was evaluated on three prostate cancer (CaP) prognosis tasks: (i) distinguishing, among 40 CaP patients, those with and without 5-year biochemical recurrence following radical prostatectomy by fusing quantitative features extracted from digitized pathology and proteomics, (ii) predicting in vivo prostate cancer grade for 16 CaP patients by fusing T2w and DCE MRI, and (iii) localizing CaP and benign regions on MR spectroscopy and MRI for 36 patients. For the three tasks, GNCCA identifies feature subsets comprising 2%, 1% and 22%, respectively, of the original extracted features. With the same Support Vector Machine (SVM) classifier, these selected features achieve results that are improved over, or comparable to, those obtained using all features. In addition, GNCCA consistently outperforms five state-of-the-art feature selection methods across all three datasets.
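To make the two constraints more concrete, the following is a minimal sketch of how a nonnegativity constraint and a group-sparsity penalty can act together on a weight vector. It is not the GNCCA optimization itself, which the abstract does not spell out. It uses a standard building block (clipping to nonnegative values followed by group-wise shrinkage), and the function name `nonneg_group_shrink`, the grouping of features, and the threshold `lam` are all assumptions chosen for illustration.

```python
import numpy as np

def nonneg_group_shrink(w, groups, lam):
    """Clip weights to be nonnegative, then shrink each group by its L2 norm.

    Hypothetical helper for illustration only; not the authors' solver.
    Groups whose clipped norm is at most `lam` are set to zero as a whole.
    """
    w_pos = np.maximum(np.asarray(w, dtype=float), 0.0)  # nonnegativity
    out = np.zeros_like(w_pos)
    for idx in groups:
        norm = np.linalg.norm(w_pos[idx])
        if norm > lam:
            # Scale the surviving group toward zero by a common factor.
            out[idx] = (1.0 - lam / norm) * w_pos[idx]
    return out

# Toy weights for six features arranged in three assumed groups.
w = np.array([3.0, 4.0, -1.0, 0.2, 0.1, -0.3])
groups = [[0, 1], [2, 3], [4, 5]]

shrunk = nonneg_group_shrink(w, groups, lam=1.0)
# A group counts as selected if any of its weights stays positive.
selected = [g for g in groups if np.any(shrunk[g] > 0)]
print(shrunk)
print(selected)
```

In this toy setting only the first group survives: weak groups are removed as a whole (between-group selection), while negative weights inside any group are clipped to zero (a form of within-group selection).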
Keywords: Feature Selection; Canonical Correlation Analysis; Feature Selection Method; Nonnegative Matrix Factorization; Linear Support Vector Machine