Enhancing Random Forests Performance in Microarray Data Classification

  • Nicoletta Dessì
  • Gabriele Milia
  • Barbara Pes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7885)


Random forests are receiving increasing attention for the classification of microarray datasets. We evaluate the effects of a feature selection process on the performance of a random forest classifier, as well as on the choice of two critical parameters: the forest size and the number of features chosen at each split when growing trees. The results of our experiments suggest that parameter values lower than the popular defaults can lead to effective and more parsimonious classification models. Growing a few trees on small subsets of selected features, while randomly choosing a single variable at each split, yields classification performance that compares well with state-of-the-art studies.
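The configuration described in the abstract (a small forest, grown on a small subset of selected features, with a single randomly chosen variable per split) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the synthetic data, the signal-to-noise ranking used as the feature filter, the Gini-based threshold search, the depth limit, and all names and parameter values (`make_data`, `rank_genes`, `grow_forest`, 10 selected genes, 15 trees) are assumptions made for illustration only.

```python
import random
from collections import Counter

def make_data(n_samples, n_genes, n_informative, rng):
    """Synthetic two-class 'expression' data (assumption): only the first
    n_informative genes have class-dependent means."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        X.append([rng.gauss(2.0 * label if g < n_informative else 0.0, 1.0)
                  for g in range(n_genes)])
        y.append(label)
    return X, y

def rank_genes(X, y, k):
    """Univariate filter (assumed choice): keep the k genes with the largest
    |mean difference| / (std0 + std1) between the two classes."""
    def mean_std(vals):
        m = sum(vals) / len(vals)
        return m, (sum((v - m) ** 2 for v in vals) / len(vals)) ** 0.5
    scores = []
    for g in range(len(X[0])):
        m0, s0 = mean_std([row[g] for row, c in zip(X, y) if c == 0])
        m1, s1 = mean_std([row[g] for row, c in zip(X, y) if c == 1])
        scores.append((abs(m1 - m0) / (s0 + s1 + 1e-12), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def gini(labels):
    """Gini impurity for 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def grow_tree(X, y, idx, features, rng, depth=0, max_depth=5):
    """Grow one tree; each split considers ONE randomly drawn feature."""
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    f = rng.choice(features)  # a single candidate variable per split
    values = sorted(set(X[i][f] for i in idx))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [i for i in idx if X[i][f] <= t]
        right = [i for i in idx if X[i][f] > t]
        if not left or not right:
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right]))
        if best is None or score < best[0]:
            best = (score, t, left, right)
    if best is None:  # the drawn feature cannot separate these samples
        return majority
    _, t, left, right = best
    return (f, t,
            grow_tree(X, y, left, features, rng, depth + 1, max_depth),
            grow_tree(X, y, right, features, rng, depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf; leaves are class labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def grow_forest(X, y, features, n_trees, rng):
    """Each tree is grown on a bootstrap sample of the training set."""
    n = len(X)
    return [grow_tree(X, y, [rng.randrange(n) for _ in range(n)], features, rng)
            for _ in range(n_trees)]

def predict_forest(forest, row):
    """Majority vote over the trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

rng = random.Random(0)
X_train, y_train = make_data(40, 200, 10, rng)
X_test, y_test = make_data(40, 200, 10, rng)

# Feature selection uses the training samples only, then a small forest is grown.
selected = rank_genes(X_train, y_train, k=10)
forest = grow_forest(X_train, y_train, selected, n_trees=15, rng=rng)

accuracy = sum(predict_forest(forest, r) == t
               for r, t in zip(X_test, y_test)) / len(y_test)
print(f"genes kept: {len(selected)}, trees: {len(forest)}, test accuracy: {accuracy:.2f}")
```

Ranking the genes on the training samples alone keeps the held-out samples out of the selection step. Drawing one variable per split makes each tree cheap to grow and keeps the trees diverse, which is the trade-off the abstract refers to when it mentions parameter values below the popular defaults.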


Keywords: Microarray data classification, Random Forests, Feature selection





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Nicoletta Dessì (1)
  • Gabriele Milia (1)
  • Barbara Pes (1)
  1. Dipartimento di Matematica e Informatica, Università degli Studi di Cagliari, Cagliari, Italy
