Abstract
In this work, we study the behavior of a feature selection algorithm (backward selection) based on random forests when it is applied to fused multi-modal data originating from different subjects. Two separate datasets related to cutaneous melanoma are used, one obtained from an image source (dermoscopy) and one from a non-image source (microarray). Imputation is applied to obtain a unified dataset before the machine learning algorithms are run. The results suggest that the normal random imputation method acts as an additional source of variation, which helps stabilize the set of potential recommended biomarkers. In addition, microarray-derived features were preferentially selected as the best predictors over image-derived features.
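The abstract does not spell out how the unified dataset is built, so the following is only a minimal sketch of one plausible reading. It assumes that "normal random imputation" means filling each missing value with a draw from a normal distribution whose mean and standard deviation are estimated from the observed values of that feature. The function names `normal_random_impute` and `fuse_and_impute` are hypothetical, as is the use of `None` to mark missing entries; none of them come from the paper.

```python
import random
import statistics


def normal_random_impute(column, rng):
    """Fill None entries with draws from N(mean, sd) of the observed values.

    Assumption: this is one common definition of normal random imputation,
    not necessarily the exact procedure used in the paper.
    """
    observed = [v for v in column if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values to estimate spread")
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    return [rng.gauss(mu, sigma) if v is None else v for v in column]


def fuse_and_impute(image_rows, array_rows, n_image, n_array, seed=0):
    """Stack two datasets from different subjects and impute the gaps.

    Image subjects have no microarray features and microarray subjects
    have no image features, so each block is padded with None first.
    """
    rng = random.Random(seed)
    rows = [list(r) + [None] * n_array for r in image_rows]
    rows += [[None] * n_image + list(r) for r in array_rows]
    # Impute column by column, then transpose back to subject rows.
    columns = [list(c) for c in zip(*rows)]
    imputed = [normal_random_impute(c, rng) for c in columns]
    return [list(r) for r in zip(*imputed)]
```

Because the imputed values are random, repeating the fusion with different seeds yields different unified datasets. Under this reading, that seed-to-seed variation is what lets the stability of the selected features be examined across runs.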
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Moutselos, K., Chatziioannou, A., Maglogiannis, I. (2012). Feature Selection Study on Separate Multi-modal Datasets: Application on Cutaneous Melanoma. In: Iliadis, L., Maglogiannis, I., Papadopoulos, H., Karatzas, K., Sioutas, S. (eds) Artificial Intelligence Applications and Innovations. AIAI 2012. IFIP Advances in Information and Communication Technology, vol 382. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33412-2_4
DOI: https://doi.org/10.1007/978-3-642-33412-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33411-5
Online ISBN: 978-3-642-33412-2
eBook Packages: Computer Science, Computer Science (R0)