Abstract
Rotation Forest (RF) is an ensemble method that has proved effective on microarray classification problems. RF generates sparse rotation matrices of the input space, which yields base classifiers that are both accurate and diverse. In its original formulation, the elemental rotations are obtained by Principal Component Analysis (PCA). For microarray data sets, however, Independent Component Analysis (ICA) may be a better option. This paper reports an experimental study on ten microarray data sets. The study confirms that, except when the number of attributes is small, Rotation Forest outperforms Bagging and Boosting in this domain. RF with ICA, however, does not generally improve on RF with PCA.
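To make the idea of a sparse rotation matrix concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes the usual construction: the attributes are split at random into disjoint subsets, PCA is run on each subset over a bootstrap sample, and the resulting axes are placed in the matching block of an otherwise zero matrix. The function names (`pca_axes`, `sparse_rotation_matrix`), the parameter `sample_frac`, and its default of 0.75 are illustrative assumptions. An ICA variant would replace `pca_axes` with an ICA unmixing step, which is not shown here.

```python
import numpy as np


def pca_axes(X):
    """Return the principal axes of X as the columns of a square matrix."""
    Xc = X - X.mean(axis=0)
    # full_matrices=True keeps all d axes even with fewer samples than features.
    _, _, vt = np.linalg.svd(Xc, full_matrices=True)
    return vt.T


def sparse_rotation_matrix(X, n_subsets, rng, sample_frac=0.75):
    """Build one sparse rotation matrix (hypothetical helper, for illustration).

    Each disjoint attribute subset gets its own PCA rotation; entries that
    link attributes from different subsets stay zero.
    """
    n_samples, n_features = X.shape
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    for cols in subsets:
        # Bootstrap sample of the rows (assumed fraction: sample_frac).
        size = max(2, int(sample_frac * n_samples))
        rows = rng.choice(n_samples, size=size, replace=True)
        R[np.ix_(cols, cols)] = pca_axes(X[np.ix_(rows, cols)])
    return R


# Usage on synthetic data: rotate the inputs before training one base classifier.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 12))
R = sparse_rotation_matrix(X, n_subsets=4, rng=rng)
X_rotated = X @ R
```

Each ensemble member would draw its own matrix in this way, which is the source of diversity among the base classifiers; because every block is an orthogonal change of basis, no information in the attributes is discarded.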
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alonso-González, C.J., Moro-Sancho, Q.I., Ramos-Muñoz, I., Simón-Hurtado, M.A. (2010). Rotation Forest on Microarray Domain: PCA versus ICA. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13025-0_11
DOI: https://doi.org/10.1007/978-3-642-13025-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13024-3
Online ISBN: 978-3-642-13025-0
eBook Packages: Computer Science (R0)