Abstract
Extracting knowledge from gene expression data remains a major challenge. Relative expression algorithms use the ordering relationships among a small collection of genes and have been successfully applied to micro-array classification. However, exhaustively searching all possible subsets of genes requires a large number of calculations and forces these methods to adopt restrictive assumptions and limitations. In this paper, we propose an evolutionary algorithm for the global induction of top-scoring pair decision trees. We have designed several specialized genetic operators that search for the best tree structure and for the splits in internal nodes, which involve pairwise comparisons of gene expression values. Preliminary validation on real-life micro-array datasets is promising: the proposed solution is highly competitive with other relative expression algorithms and allows a much larger solution space to be explored.
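To make the notion of a split based on pairwise comparison concrete, the sketch below scores a gene pair in the style of the classic top-scoring pair rule (Geman et al.): the absolute difference, between two classes, in the frequency with which gene i is expressed below gene j. It is only an illustration of the exhaustive pair search that the abstract contrasts with, not the evolutionary algorithm proposed in the paper. The function names `tsp_score` and `best_pair` and the toy data are assumptions made for this example.

```python
from itertools import combinations


def tsp_score(samples, labels, i, j):
    """Return |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|.

    samples: list of expression vectors (one list of floats per sample)
    labels:  list of class labels, each 0 or 1 (both classes must occur)
    """
    freq = []
    for cls in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        # Fraction of samples in this class where gene i is below gene j.
        freq.append(sum(1 for s in rows if s[i] < s[j]) / len(rows))
    return abs(freq[0] - freq[1])


def best_pair(samples, labels):
    """Exhaustively search all gene pairs for the highest score."""
    n_genes = len(samples[0])
    return max(
        combinations(range(n_genes), 2),
        key=lambda pair: tsp_score(samples, labels, *pair),
    )


# Toy data (hypothetical): 4 samples, 3 genes, 2 classes.
samples = [
    [1.0, 2.0, 5.0],
    [1.5, 3.0, 4.0],
    [2.0, 1.0, 6.0],
    [3.0, 0.5, 7.0],
]
labels = [0, 0, 1, 1]

pair = best_pair(samples, labels)
```

The exhaustive search visits every one of the n(n-1)/2 gene pairs, which illustrates why the cost grows quickly with the number of genes and why a heuristic search over tree structures and splits may be attractive.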
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Czajkowski, M., Kretowski, M. (2013). Global Top-Scoring Pair Decision Tree for Gene Expression Data Analysis. In: Krawiec, K., Moraglio, A., Hu, T., Etaner-Uyar, A.Ş., Hu, B. (eds) Genetic Programming. EuroGP 2013. Lecture Notes in Computer Science, vol 7831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37207-0_20
DOI: https://doi.org/10.1007/978-3-642-37207-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37206-3
Online ISBN: 978-3-642-37207-0
eBook Packages: Computer Science, Computer Science (R0)