Neighborhood Rough Set Model Based Gene Selection for Multi-subtype Tumor Classification

Wang, Shulin; Li, Xueling; Zhang, Shanwen

doi:10.1007/978-3-540-87442-3_20


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5226)

Abstract

Multi-subtype tumor diagnosis based on gene expression profiles is promising for clinical medicine. Consequently, a great deal of research has addressed tumor classification from gene expression profiles, applying various machine learning approaches to construct classification models with the best possible performance. To achieve this goal, extracting features or finding informative genes with good classification ability is crucial. We propose a novel gene selection approach that uses the Kruskal-Wallis rank sum test to rank all genes and then applies an algorithm based on the neighborhood rough set model to reduce the ranked genes, yielding gene subsets with fewer genes and greater classification ability. Experiments on a small round blue cell tumor (SRBCT) dataset show that our approach achieves very high classification accuracy with only three or four genes, as evaluated by three classifiers: support vector machines, K-nearest neighbor, and the neighborhood classifier.
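The two-stage idea in the abstract (rank genes by the Kruskal-Wallis statistic, then reduce them with a neighborhood rough set criterion) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Euclidean distance, the fixed neighborhood radius `delta`, the greedy forward search, the `top_k` cutoff, and the toy data are all assumptions made for illustration, and expression values are assumed to be scaled to [0, 1].

```python
from collections import defaultdict


def kruskal_wallis_h(values, labels):
    """Kruskal-Wallis H statistic for one gene (average ranks, tie-corrected)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for r, c in zip(ranks, labels):
        rank_sums[c] += r
        counts[c] += 1
    h = 12.0 / (n * (n + 1)) * sum(
        rank_sums[c] ** 2 / counts[c] for c in counts
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def dependency(data, labels, genes, delta):
    """Fraction of samples whose delta-neighborhood over `genes` is label-pure."""
    if not genes:
        return 0.0
    consistent = 0
    for i, x in enumerate(data):
        pure = True
        for j, y in enumerate(data):
            dist = sum((x[g] - y[g]) ** 2 for g in genes) ** 0.5
            if dist <= delta and labels[i] != labels[j]:
                pure = False
                break
        if pure:
            consistent += 1
    return consistent / len(data)


def select_genes(data, labels, top_k, delta):
    """Keep the top_k genes by H, then greedily add genes that raise dependency."""
    n_genes = len(data[0])
    scores = [
        kruskal_wallis_h([row[g] for row in data], labels) for g in range(n_genes)
    ]
    candidates = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_k]
    selected, best = [], 0.0
    while candidates:
        gains = [(dependency(data, labels, selected + [g], delta), g) for g in candidates]
        dep, gene = max(gains, key=lambda pair: pair[0])
        if dep <= best:  # stop once no candidate improves the dependency
            break
        selected.append(gene)
        candidates.remove(gene)
        best = dep
    return selected, best


# Hypothetical toy data: 6 samples (rows) x 3 genes (columns), 3 tumor subtypes.
toy_data = [
    [0.10, 0.20, 0.50],
    [0.15, 0.25, 0.10],
    [0.80, 0.22, 0.90],
    [0.85, 0.18, 0.30],
    [0.82, 0.90, 0.60],
    [0.88, 0.95, 0.20],
]
toy_labels = [0, 0, 1, 1, 2, 2]

selected, dep = select_genes(toy_data, toy_labels, top_k=3, delta=0.2)
print(selected, dep)  # prints "[0, 1] 1.0"
```

In this toy example, gene 0 separates subtype 0 from the rest and gene 1 separates subtype 2 from the rest, so neither is sufficient alone but the pair makes every neighborhood label-pure. The noisy third gene is never added because it cannot raise the dependency further. In practice the radius and the number of pre-ranked genes would need tuning, and the selected subset would be evaluated with a separate classifier.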



Author information

Authors and Affiliations

Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui, 230031, China
Shulin Wang, Xueling Li & Shanwen Zhang
School of Computer and Communication, Hunan University, Changsha, Hunan, 410082, China
Shulin Wang


Editor information

Editors and Affiliations

Department of Automation, University of Science and Technology of China, 230026, Hefei, Anhui, China
De-Shuang Huang
Applied Computational Intelligence Laboratory, P.O. Box
Donald C. Wunsch II
Arlington, USA
Daniel S. Levine
Graduate School of Electrical Engineering, University of Ulsan, Korea, San 29, Mugeo-Dong, 680 - 749, Nam-Ku, Ulsan, Korea
Kang-Hyun Jo


Cite this paper

Wang, S., Li, X., Zhang, S. (2008). Neighborhood Rough Set Model Based Gene Selection for Multi-subtype Tumor Classification. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues. ICIC 2008. Lecture Notes in Computer Science, vol 5226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87442-3_20

DOI: https://doi.org/10.1007/978-3-540-87442-3_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87440-9
Online ISBN: 978-3-540-87442-3
eBook Packages: Computer Science, Computer Science (R0)
