A Novel Metric for Redundant Gene Elimination Based on Discriminative Contribution

  • Xue-Qiang Zeng
  • Guo-Zheng Li
  • Jack Y. Yang
  • Mary Qu Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4983)

Abstract

Microarray data sets are high dimensional, which makes their analysis a hard task: many weakly relevant but redundant features hurt the generalization performance of classifiers. Previous work handles this problem with linear or nonlinear filters, but these filters do not use the label information and therefore ignore the discriminative contribution of each feature. Here we propose a novel metric based on discriminative contribution to perform redundant feature elimination. With the new metric, complementary features are likely to be retained, which benefits the final classification. Experimental results on several microarray data sets show that the proposed metric outperforms the previous state-of-the-art linear and nonlinear metrics for the analysis of microarray data.
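The abstract does not define the proposed metric, so the following is only a minimal sketch of the kind of linear redundancy filter it is contrasted with, not the authors' method. It assumes Pearson correlation as the linear measure, a greedy keep-or-drop pass, and a cutoff of 0.9; the names `pearson`, `eliminate_redundant`, and `threshold` are hypothetical, and the expression values are made up for illustration. Labels are used here only to rank genes, while the redundancy check itself looks at gene-to-gene correlation alone, which is the limitation the abstract points to.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def eliminate_redundant(features, labels, threshold=0.9):
    """Greedy linear redundancy filter (illustrative assumption, not the paper's metric).

    features: list of per-gene expression vectors (one list per gene).
    labels:   numeric class labels, one per sample.
    Genes are ranked by |correlation with the label|; a gene is kept only if
    its |correlation| with every already-kept gene is below `threshold`.
    Returns the indices of the kept genes in ranked order.
    """
    ranked = sorted(range(len(features)),
                    key=lambda i: abs(pearson(features[i], labels)),
                    reverse=True)
    kept = []
    for i in ranked:
        if all(abs(pearson(features[i], features[j])) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy data: six samples, three genes (values invented for illustration).
labels = [0, 0, 0, 1, 1, 1]
genes = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # gene 0: strongly related to the label
    [2.0, 2.4, 1.8, 6.0, 6.2, 5.8],  # gene 1: proportional to gene 0, so redundant
    [0.5, 2.0, 1.0, 1.5, 0.7, 1.9],  # gene 2: weakly related, not redundant
]
kept = eliminate_redundant(genes, labels)
print(kept)
```

On this toy input the filter keeps one of the two proportional genes together with gene 2. Because the drop decision depends only on pairwise correlation between genes, a filter of this kind can discard a gene that is correlated with a kept gene yet still adds class-separating information, which is the case a label-aware metric is meant to handle.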

Keywords

Support Vector Machine, Feature Selection, Mutual Information, Discriminative Ability, Label Information

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Xue-Qiang Zeng (1, 2)
  • Guo-Zheng Li (1, 2)
  • Jack Y. Yang (3)
  • Mary Qu Yang (4)

  1. Institute of System Biology, Shanghai University, Shanghai, China
  2. School of Computer Engineering and Science, Shanghai University, Shanghai, China
  3. Harvard Medical School, Harvard University, Cambridge, USA
  4. Department of Health and Human Services, Bethesda, National Human Genome Research Institute, National Institutes of Health (NIH), U.S., USA
