Artificial Intelligence Review, Volume 50, Issue 2, pp 201–240

An empirical evaluation of hierarchical feature selection methods for classification in bioinformatics datasets with gene ontology-based features

  • Cen Wan
  • Alex A. Freitas


Hierarchical feature selection is a new research area in machine learning/data mining, which consists of performing feature selection by exploiting dependency relationships among hierarchically structured features. This paper evaluates four hierarchical feature selection methods, i.e., HIP, MR, SHSEL and GTD, used together with four types of lazy learning-based classifiers, i.e., Naïve Bayes, Tree Augmented Naïve Bayes, Bayesian Network Augmented Naïve Bayes and k-Nearest Neighbors classifiers. These four hierarchical feature selection methods are compared with each other and with a well-known “flat” feature selection method, i.e., Correlation-based Feature Selection. The adopted bioinformatics datasets consist of aging-related genes used as instances and Gene Ontology terms used as hierarchical features. The experimental results reveal that the HIP (Select Hierarchical Information Preserving Features) method performs best overall, in terms of predictive accuracy and robustness when coping with data where the instances’ classes have a substantially imbalanced distribution. This paper also reports a list of the Gene Ontology terms that were most often selected by the HIP method.
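To make the idea of exploiting dependency relationships among hierarchically structured features concrete, here is a minimal Python sketch. It is not the implementation of any of the evaluated methods. It assumes a toy "is-a" hierarchy (the hypothetical `PARENTS` mapping) in which a feature value of 1 implies the value 1 for all of its ancestors and a value of 0 implies the value 0 for all of its descendants, and it drops the features whose values are implied in this way for a single instance. All names and the redundancy rule itself are illustrative assumptions.

```python
from typing import Dict, Set

# Hypothetical toy hierarchy: each term maps to its direct parents (is-a edges).
PARENTS: Dict[str, Set[str]] = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent edges."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def descendants(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term that has `term` among its ancestors."""
    return {t for t in parents if term in ancestors(t, parents)}


def select_non_redundant(instance: Dict[str, int],
                         parents: Dict[str, Set[str]]) -> Set[str]:
    """Keep only features whose values are not implied by another feature.

    Assumed rule (illustrative): a value of 1 makes all ancestors redundant,
    and a value of 0 makes all descendants redundant.
    """
    keep = set(instance)
    for term, value in instance.items():
        if value == 1:
            keep -= ancestors(term, parents)
        else:
            keep -= descendants(term, parents)
    return keep


if __name__ == "__main__":
    example = {"A": 1, "B": 1, "C": 0, "D": 0}
    print(sorted(select_non_redundant(example, PARENTS)))
```

In this toy instance, `B = 1` already implies `A = 1` and `C = 0` already implies `D = 0`, so only `B` and `C` carry non-redundant information. Because the selection depends on the values observed in each instance, a rule of this kind is naturally paired with lazy classifiers, which build their model at classification time.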


Keywords: Hierarchical feature selection; Classification; Machine learning; Data mining; Bayesian classifiers; K-Nearest Neighbors; Biology of aging



We thank Dr. João Pedro de Magalhães for his valuable general advice on the biology of aging for this project. We also thank Pablo Silva for providing an implementation of the SHSEL method. We also acknowledge the support of concurrency researchers at the University of Kent for access to the ‘CoSMoS’ cluster, funded by EPSRC Grants EP/E049419/1 and EP/E0535/1.



Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  1. Department of Computer Science, University College London, London, UK
  2. School of Computing, University of Kent, Canterbury, UK
