Predicting Disease Risks Using Feature Selection Based on Random Forest and Support Vector Machine

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2014)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8492)

Abstract

Disease risk prediction is an important task in biomedicine and bioinformatics. To address high-dimensional feature spaces and high feature redundancy, and to improve the interpretability of data mining results, a new wrapper feature selection method based on random forest variable importance measures and support vector machines (SVMs) is proposed. The method combines sequential backward search with sequential forward search. Feature selection starts with the entire feature set of the dataset. At every iteration, two feature subsets are obtained. The first removes both the least important features and the most important feature; it is used to train the random forest and to compute feature importances for the next round of selection. The second removes only the least important features and retains the most important one; it is used as a candidate optimal subset to train the SVM classifier. Finally, the feature subset with the highest SVM classification accuracy is taken as the optimal feature subset. Experimental results on 11 UCI datasets, a real clinical dataset and a gene expression dataset show that the proposed algorithm produces smaller feature subsets while improving classification accuracy.
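The search loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how many features are removed per iteration; here that count is a hypothetical parameter `n_drop`. It also does not say whether the most important features taken out of the random-forest set are carried into later SVM subsets. Here they are assumed to accumulate in a list `kept`, which is one reading of the "forward search" component. The random forest and the SVM are abstracted as two hypothetical callbacks, `importance_fn` and `score_fn`, so the loop itself runs without any machine learning library.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback: importance_fn(subset) -> {feature: importance},
# e.g. random forest variable importance computed on those columns.
ImportanceFn = Callable[[Sequence[str]], Dict[str, float]]
# Hypothetical callback: score_fn(subset) -> accuracy,
# e.g. cross-validated SVM accuracy on those columns.
ScoreFn = Callable[[Sequence[str]], float]


def rf_svm_wrapper_select(
    features: Sequence[str],
    importance_fn: ImportanceFn,
    score_fn: ScoreFn,
    n_drop: int = 1,
) -> Tuple[List[str], float, List[Tuple[List[str], float]]]:
    """Sketch of a combined backward/forward wrapper search.

    Assumptions (not stated in the abstract): exactly `n_drop` least
    important features are removed per iteration, and the most important
    features removed from the random-forest working set accumulate in
    `kept`, which is always included in the SVM candidate subset.
    """
    if n_drop < 1 or len(features) <= n_drop:
        raise ValueError("need n_drop >= 1 and more than n_drop features")

    working = list(features)  # subset used to (re)train the random forest
    kept: List[str] = []  # most important features moved out of `working`
    history: List[Tuple[List[str], float]] = []

    while len(working) > n_drop:
        importance = importance_fn(working)
        ranked = sorted(working, key=lambda f: importance[f])  # ascending
        least = set(ranked[:n_drop])
        top = ranked[-1]  # never in `least`, since len(working) > n_drop

        # SVM subset: drop only the least important features, keep the top one.
        candidate = kept + [f for f in working if f not in least]
        history.append((candidate, score_fn(candidate)))

        # Next random-forest subset: drop the least important AND the top feature.
        working = [f for f in working if f not in least and f != top]
        kept.append(top)

    # Highest accuracy wins; ties go to the smaller subset (our own choice).
    best_subset, best_score = max(history, key=lambda h: (h[1], -len(h[0])))
    return best_subset, best_score, history


if __name__ == "__main__":
    # Toy stand-ins for the random forest and the SVM.
    weights = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1}
    toy_importance = lambda subset: {f: weights[f] for f in subset}
    toy_score = lambda subset: float(len(set(subset) & {"a", "c"}))
    best, score, hist = rf_svm_wrapper_select(list(weights), toy_importance, toy_score)
    print(best, score)  # ['a', 'b', 'c'] 2.0
```

In practice, `importance_fn` could wrap scikit-learn's `RandomForestClassifier` fitted on the current columns (its `feature_importances_` attribute, or `sklearn.inspection.permutation_importance`). `score_fn` could return the mean of `cross_val_score(SVC(), X_subset, y)`. The abstract does not specify the paper's exact importance measure or evaluation protocol.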

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Yang, J., Yao, D., Zhan, X., Zhan, X. (2014). Predicting Disease Risks Using Feature Selection Based on Random Forest and Support Vector Machine. In: Basu, M., Pan, Y., Wang, J. (eds) Bioinformatics Research and Applications. ISBRA 2014. Lecture Notes in Computer Science, vol 8492. Springer, Cham. https://doi.org/10.1007/978-3-319-08171-7_1

  • DOI: https://doi.org/10.1007/978-3-319-08171-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08170-0

  • Online ISBN: 978-3-319-08171-7

  • eBook Packages: Computer Science, Computer Science (R0)
