Simultaneous Sample and Gene Selection Using T-score and Approximate Support Vectors

  • Piyushkumar A. Mundra
  • Jagath C. Rajapakse
  • D. A. K. Maduranga
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7986)

Abstract

The T-score, based on the t-statistic between samples and disease classes, is a widely used filter criterion for gene selection from microarray data. The classical T-score, however, uses all the training samples, whereas for both biological and computational reasons, selecting relevant samples for training is an important step in classification. Using a modified logistic regression approach, we propose a sample selection criterion based on the T-score and develop a backward elimination approach for gene selection. The method is more stable and computationally less costly than support vector machine recursive feature elimination (SVM-RFE) methods.
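To make the filter criterion concrete, the following is a minimal sketch of a per-gene T-score computed on either all training samples or a chosen subset. It assumes the unequal-variance (Welch) form of the two-sample t-statistic and binary labels coded 0/1; the abstract does not state the exact formula, so this form is an assumption. The names `t_scores`, `rank_genes`, and `sample_idx` are hypothetical, and the sample subset is simply passed in by the caller. The modified logistic regression step that would choose those samples is not reproduced here.

```python
import numpy as np


def t_scores(X, y, sample_idx=None):
    """Absolute two-sample t-statistic per gene (Welch form, an assumption).

    X: (n_samples, n_genes) expression matrix.
    y: binary class labels coded as 0 and 1.
    sample_idx: optional indices of the training samples to use
        (hypothetical hook for a sample selection step); None uses all.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if sample_idx is not None:
        X, y = X[sample_idx], y[sample_idx]

    pos, neg = X[y == 1], X[y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two samples per class")

    # Difference of class means, scaled by its standard error.
    diff = pos.mean(axis=0) - neg.mean(axis=0)
    se = np.sqrt(pos.var(axis=0, ddof=1) / len(pos)
                 + neg.var(axis=0, ddof=1) / len(neg))

    # Genes with zero standard error get a score of 0 instead of a division error.
    return np.abs(diff) / np.where(se > 0, se, np.inf)


def rank_genes(X, y, sample_idx=None):
    """Gene indices ordered from highest to lowest T-score."""
    return np.argsort(-t_scores(X, y, sample_idx))
```

Calling `rank_genes(X, y)` reproduces the classical all-samples ranking, while passing `sample_idx` restricts the statistic to the selected samples. In a backward elimination loop, the lowest-ranked genes would be dropped and the sample subset recomputed before the next pass.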

Keywords

data point selection, gene selection, instance selection, logistic regression


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Piyushkumar A. Mundra (1)
  • Jagath C. Rajapakse (1, 2, 3)
  • D. A. K. Maduranga (1)

  1. Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, Singapore
  2. Singapore-MIT Alliance, Singapore
  3. Department of Biological Engineering, Massachusetts Institute of Technology, USA
