Predicting Disease Risks Using Feature Selection Based on Random Forest and Support Vector Machine

Yang, Jing; Yao, Dengju; Zhan, Xiaojuan; Zhan, Xiaorong

doi:10.1007/978-3-319-08171-7_1

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8492)

Included in the following conference series:

International Symposium on Bioinformatics Research and Applications

Abstract

Disease risk prediction is an important task in biomedicine and bioinformatics. To address the high dimensionality and heavy redundancy of feature spaces, and to make data mining results easier to interpret, a new wrapper method for feature selection is proposed, based on random forest variable importance measures and a support vector machine (SVM). The method combines sequential backward search with sequential forward search. Feature selection starts with the entire set of features in the dataset. At every iteration, two feature subsets are produced. The first removes both the least important features and the most important feature; it is used to train the random forest and to compute feature importance for the next round of selection. The second removes only the least important features and retains the most important feature; it is used as the candidate optimal subset to train the SVM classifier. Finally, the feature subset with the highest SVM classification accuracy is taken as the optimal feature subset. Experimental results on 11 UCI datasets, a real clinical dataset, and a gene expression dataset show that the proposed algorithm can generate a smaller feature subset while improving classification accuracy.
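
The search loop described in the abstract can be sketched roughly as follows. This is a minimal illustration of one possible reading, not the authors' implementation. The names `wrapper_select`, `importance_fn`, `score_fn`, and `drop_fraction` are hypothetical, and it is an assumption that the most important feature from each round is carried forward into every later candidate subset. In practice `importance_fn` would wrap a random forest's variable importance and `score_fn` an SVM's classification accuracy; here both are toy stand-ins so the example runs with the standard library alone.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def wrapper_select(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    score_fn: Callable[[List[str]], float],
    drop_fraction: float = 0.2,  # assumed share of least important features dropped per round
) -> Tuple[List[str], float]:
    """Return the best-scoring feature subset found and its score."""
    pool = list(features)   # features still ranked by the importance model
    kept: List[str] = []    # most important features set aside in earlier rounds
    best_subset = list(features)
    best_score = score_fn(best_subset)

    while pool:
        importance = importance_fn(pool)
        ranked = sorted(pool, key=lambda f: importance[f], reverse=True)
        top, rest = ranked[0], ranked[1:]

        # Backward step: drop at least one of the least important features.
        n_drop = max(1, int(len(rest) * drop_fraction)) if rest else 0
        survivors = rest[: len(rest) - n_drop]

        # Forward step: retain the most important feature for the classifier.
        kept.append(top)
        candidate = kept + survivors  # subset scored by the classifier
        score = score_fn(candidate)
        if score > best_score or (
            score == best_score and len(candidate) < len(best_subset)
        ):
            best_subset, best_score = list(candidate), score

        # Subset without the top feature feeds the next importance ranking.
        pool = survivors

    return best_subset, best_score


# Toy stand-ins (hypothetical values, not data from the paper).
WEIGHTS = {"x1": 0.9, "x2": 0.7, "x3": 0.8, "n1": 0.1, "n2": 0.05}
USEFUL = {"x1", "x2", "x3"}


def toy_importance(subset: List[str]) -> Dict[str, float]:
    return {f: WEIGHTS[f] for f in subset}


def toy_score(subset: List[str]) -> float:
    chosen = set(subset)
    hits = len(USEFUL & chosen)
    noise = len(chosen - USEFUL)
    return hits / len(USEFUL) - 0.05 * noise


best, best_score = wrapper_select(list(WEIGHTS), toy_importance, toy_score)
print(sorted(best), best_score)
```

Passing the importance measure and the classifier score as callables keeps the loop independent of any particular library; the stopping rule and the number of features dropped per round are not specified in the abstract and are assumptions here.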

Author information

Authors and Affiliations

College of Computer Science and Technology, Harbin Engineering University, Harbin, China
Jing Yang & Dengju Yao
School of Software, Harbin University of Science and Technology, Harbin, China
Dengju Yao
College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
Xiaojuan Zhan
Department of Endocrinology, First Affiliated Hospital, Harbin Medical University, Harbin, China
Xiaorong Zhan

Editor information

Editors and Affiliations

Johns Hopkins University, Computer Science Department, Baltimore, MD 21218, USA and National Science Foundation, 1115, CCF, USA
Mitra Basu
Department of Computer Science, Georgia State University, 30303, Atlanta, GA, USA
Yi Pan
School of Information Science and Engineering, Central South University, 410083, Changsha, China
Jianxin Wang

About this paper

Cite this paper

Yang, J., Yao, D., Zhan, X., Zhan, X. (2014). Predicting Disease Risks Using Feature Selection Based on Random Forest and Support Vector Machine. In: Basu, M., Pan, Y., Wang, J. (eds) Bioinformatics Research and Applications. ISBRA 2014. Lecture Notes in Computer Science, vol 8492. Springer, Cham. https://doi.org/10.1007/978-3-319-08171-7_1

DOI: https://doi.org/10.1007/978-3-319-08171-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08170-0
Online ISBN: 978-3-319-08171-7
eBook Packages: Computer Science, Computer Science (R0)
