Simultaneous Sample and Gene Selection Using T-score and Approximate Support Vectors

Mundra, Piyushkumar A.; Rajapakse, Jagath C.; Maduranga, D. A. K.

doi:10.1007/978-3-642-39159-0_8

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7986)

Included in the following conference series:

IAPR International Conference on Pattern Recognition in Bioinformatics

Abstract

T-score, based on the t-statistic between samples and disease classes, is a widely used filter criterion for gene selection from microarray data. However, the classical T-score uses all training samples, whereas, for both biological and computational reasons, selecting relevant samples for training is an important step in classification. Using a modified logistic regression approach, we propose a sample selection criterion based on T-score and develop a backward elimination approach for gene selection. The method is more stable and computationally less costly than support vector machine recursive feature elimination (SVM-RFE) methods.
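As an illustration of the filter criterion mentioned above, the following is a minimal sketch of a classical two-class T-score ranking. It is not the chapter's implementation: it assumes a Welch-style t-statistic per gene, labels coded as +1 and -1, and a hypothetical `keep` argument standing in for whatever sample-selection step supplies the retained training samples. The function names `t_scores` and `rank_genes` are likewise illustrative.

```python
from math import sqrt
from statistics import mean, variance


def t_scores(X, y, keep=None):
    """Absolute Welch-style t-statistic for each gene.

    X: list of samples, each a list of gene expression values.
    y: list of class labels (+1 or -1), one per sample.
    keep: optional list of sample indices to use (hypothetical hook for
          a sample-selection step); defaults to all samples.
    """
    idx = range(len(X)) if keep is None else keep
    pos = [X[i] for i in idx if y[i] == 1]
    neg = [X[i] for i in idx if y[i] == -1]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two samples per class")
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row in pos]
        b = [row[g] for row in neg]
        diff = abs(mean(a) - mean(b))
        # Standard error of the difference between the two class means.
        se = sqrt(variance(a) / len(a) + variance(b) / len(b))
        if se > 0:
            scores.append(diff / se)
        else:
            # Gene is constant within both classes.
            scores.append(float("inf") if diff > 0 else 0.0)
    return scores


def rank_genes(X, y, keep=None):
    """Gene indices ordered from highest to lowest T-score."""
    s = t_scores(X, y, keep)
    return sorted(range(len(s)), key=lambda g: s[g], reverse=True)
```

Passing a subset of sample indices as `keep` recomputes the scores on those samples only, which is the general idea of scoring genes on selected rather than all training samples; how the subset is chosen is outside this sketch.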

Author information

Authors and Affiliations

Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, Singapore
Piyushkumar A. Mundra, Jagath C. Rajapakse & D. A. K. Maduranga
Singapore-MIT Alliance, Singapore
Jagath C. Rajapakse
Department of Biological Engineering, Massachusetts Institute of Technology, USA
Jagath C. Rajapakse

Editor information

Editors and Affiliations

School of Computer Science, University of Windsor, 5115 Lambton Tower, 401 Sunset Avenue, N9B 3P4, Windsor, ON, Canada
Alioune Ngom
I3S Research Lab., Nice Sophia Antipolis University, 06903, Sophia Antipolis Cedex, France
Enrico Formenti
LERIA - Faculté des Sciences, Université d’Angers, 2 Boulevard Lavoisier, 49045, Angers Cedex 01, France
Jin-Kao Hao
School of Electronics and Information Engineering, Tongji University, 201804, Shanghai, China
Xing-Ming Zhao
Institute for Computing and Information Sciences, Radboud University, 6500 GL, Nijmegen, The Netherlands
Twan van Laarhoven

Cite this paper

Mundra, P.A., Rajapakse, J.C., Maduranga, D.A.K. (2013). Simultaneous Sample and Gene Selection Using T-score and Approximate Support Vectors. In: Ngom, A., Formenti, E., Hao, J.-K., Zhao, X.-M., van Laarhoven, T. (eds) Pattern Recognition in Bioinformatics. PRIB 2013. Lecture Notes in Computer Science, vol 7986. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39159-0_8

DOI: https://doi.org/10.1007/978-3-642-39159-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39158-3
Online ISBN: 978-3-642-39159-0
eBook Packages: Computer Science, Computer Science (R0)
