Abstract
Multi-subtype tumor diagnosis based on gene expression profiles holds promise for clinical medicine. Consequently, much research has addressed tumor classification from gene expression profiles, applying various machine learning approaches to build classification models with the best possible performance. Achieving this goal depends critically on extracting features or finding informative genes with good classification ability. We propose a novel gene selection approach that first ranks all genes with the Kruskal-Wallis rank sum test and then applies a gene reduction algorithm based on the neighborhood rough set model, yielding gene subsets with fewer genes and greater classification ability. Experiments on a small round blue cell tumor (SRBCT) dataset show that our approach achieves very high classification accuracy with only three or four genes, as evaluated by three classifiers: support vector machines, K-nearest neighbor, and the neighborhood classifier.
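The two-stage idea in the abstract (rank genes with the Kruskal-Wallis test, then reduce them with a neighborhood rough set criterion) can be illustrated with a minimal pure-Python sketch. This is not the paper's algorithm, whose details the abstract does not give. It assumes expression values scaled to [0, 1], a Kruskal-Wallis H statistic without tie correction, a Euclidean neighborhood of radius `delta`, and a greedy forward search that stops when the dependency stops improving. All function names, parameters, and the toy data are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def kruskal_wallis_h(values, labels):
    """Kruskal-Wallis H statistic for one gene (assumption: no tie correction)."""
    n = len(values)
    rank_sums, counts = {}, {}
    for rank, label in zip(average_ranks(values), labels):
        rank_sums[label] = rank_sums.get(label, 0.0) + rank
        counts[label] = counts.get(label, 0) + 1
    total = sum(rank_sums[c] ** 2 / counts[c] for c in counts)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def dependency(X, labels, genes, delta):
    """Fraction of samples whose delta-neighborhood (Euclidean distance over
    the chosen genes) contains samples of a single class only."""
    n = len(X)
    consistent = 0
    for i in range(n):
        pure = True
        for j in range(n):
            dist = sum((X[i][g] - X[j][g]) ** 2 for g in genes) ** 0.5
            if dist <= delta and labels[j] != labels[i]:
                pure = False
                break
        if pure:
            consistent += 1
    return consistent / n


def select_genes(X, labels, top_k=10, delta=0.2):
    """Keep the top_k genes by H statistic, then add genes greedily while
    the neighborhood dependency strictly increases (hypothetical stopping rule)."""
    n_genes = len(X[0])
    scores = [kruskal_wallis_h([row[g] for row in X], labels) for g in range(n_genes)]
    candidates = sorted(range(n_genes), key=lambda g: -scores[g])[:top_k]
    selected, best = [], 0.0
    while True:
        remaining = [g for g in candidates if g not in selected]
        if not remaining:
            break
        # max() keeps the first maximum, so ties favor the higher-ranked gene.
        dep, gene = max(
            ((dependency(X, labels, selected + [g], delta), g) for g in remaining),
            key=lambda pair: pair[0],
        )
        if dep <= best:
            break
        selected.append(gene)
        best = dep
    return selected, best


# Toy data (made up): 6 samples, 3 genes, 3 classes; only gene 0 is informative.
X = [
    [0.0, 0.3, 0.5],
    [0.1, 0.7, 0.5],
    [0.5, 0.2, 0.5],
    [0.6, 0.8, 0.5],
    [1.0, 0.4, 0.5],
    [0.9, 0.6, 0.5],
]
labels = ["A", "A", "B", "B", "C", "C"]

selected, dep = select_genes(X, labels, top_k=3, delta=0.2)
print("selected genes:", selected, "dependency:", dep)
```

In this sketch the ranking stage only prunes the candidate pool, and the dependency measure decides which genes are kept. The radius `delta` controls how strict the consistency requirement is and would need tuning on real data.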
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, S., Li, X., Zhang, S. (2008). Neighborhood Rough Set Model Based Gene Selection for Multi-subtype Tumor Classification. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues. ICIC 2008. Lecture Notes in Computer Science, vol 5226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87442-3_20
DOI: https://doi.org/10.1007/978-3-540-87442-3_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87440-9
Online ISBN: 978-3-540-87442-3
eBook Packages: Computer Science, Computer Science (R0)