Abstract
Microarray gene expression profile data can be used to predict tumor types accurately, which is of great value for providing better treatment and minimizing toxicity for patients. However, classifying tumor types from microarray data is difficult because the number of samples is much smaller than the number of genes. It has been shown that a small feature gene subset can improve classification accuracy, so feature gene selection and extraction algorithms are very important in tumor classification. In this paper, a novel hybrid gene selection method is proposed to find a feature gene subset in which the genes related to a certain cancer are kept and the redundant genes are left out. In the proposed method, we combine the advantages of principal component analysis (PCA) and linear discriminant analysis (LDA) in a novel feature gene extraction scheme. We also compare several parametric and non-parametric feature gene selection methods. We use a support vector machine (SVM) as the classifier in the experiments, compare the performance of three common SVM kernels, and analyze their differences. Using n-fold cross-validation, the proposed algorithm is evaluated on three published benchmark tumor datasets, and the experimental results show that it achieves better classification performance than other methods.
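To make the contrast between parametric and non-parametric gene selection concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the statistics that were compared, so the sketch assumes a Welch t-statistic as the parametric score and a Wilcoxon rank-sum z-score (without tie correction of the variance) as the non-parametric one. The function names `welch_t`, `midranks`, `rank_sum_z`, and `top_genes`, as well as the toy expression matrix, are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Parametric score: absolute Welch t-statistic between two groups."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def midranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(a, b):
    """Non-parametric score: |z| of the Wilcoxon rank-sum statistic."""
    n1, n2 = len(a), len(b)
    ranks = midranks(list(a) + list(b))
    w = sum(ranks[:n1])                       # rank sum of the first group
    mu = n1 * (n1 + n2 + 1) / 2               # expected rank sum under H0
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return abs(w - mu) / sigma


def top_genes(X, y, score, k):
    """Score each gene (column of X) between class 0 and class 1
    and return the indices of the k highest-scoring genes."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(score(a, b))
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:k]


# Hypothetical toy data: 6 samples (rows) x 4 genes (columns), two classes.
X = [
    [5.1, 2.0, 7.9, 3.3],
    [4.8, 2.4, 1.2, 3.1],
    [5.3, 1.9, 4.4, 3.4],
    [1.0, 2.1, 6.5, 3.2],
    [1.4, 2.3, 2.0, 3.0],
    [0.9, 2.2, 5.1, 3.5],
]
y = [0, 0, 0, 1, 1, 1]

selected_t = top_genes(X, y, welch_t, 2)
selected_rank = top_genes(X, y, rank_sum_z, 2)
```

In a pipeline like the one the abstract describes, a filter of this kind would reduce the gene set before any extraction or classification step. When it is combined with n-fold cross-validation, the ranking is normally recomputed on the training folds only, so that the held-out samples do not influence which genes are selected.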
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
You, Z., Wang, S., Gui, J., Zhang, S. (2008). A Novel Hybrid Method of Gene Selection and Its Application on Tumor Classification. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. ICIC 2008. Lecture Notes in Computer Science, vol 5227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85984-0_127
DOI: https://doi.org/10.1007/978-3-540-85984-0_127
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85983-3
Online ISBN: 978-3-540-85984-0
eBook Packages: Computer Science, Computer Science (R0)