Improving Classification of Imbalanced Student Dataset Using Ensemble Method of Voting, Bagging, and Adaboost with Under-Sampling Technique

  • Conference paper
IT Convergence and Security 2017

Abstract

Imbalanced student data is a recurring problem in the data mining community. To address the student dropout problem, an ensemble method with an under-sampling technique is applied to improve classification performance on an imbalanced student dataset. Mutual information is used as the feature selection method to find the significant features. Voting, bagging, and AdaBoost techniques in the ensemble method are used with decision tree (C4.5) and artificial neural network (ANN) classifiers to classify students according to the research objective. The results of the experiment are evaluated by overall accuracy, precision, and recall. Bagging with random forest gave the best result: an overall accuracy of 74.57% and a recall of 95.61% for the class of interest (low). The experiment is useful not only for finding knowledge that supports student and academic planning and management, but also for improving classification of imbalanced data, which the authors regard as the most effective way to address the classification of student performance.
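The two ideas the abstract relies on most, random under-sampling and per-class precision and recall, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `random_undersample` and `precision_recall`, the toy data, and the choice to shrink every class to the size of the smallest one are assumptions made here for illustration, and the class labels "low" and "high" are placeholders (the abstract names only the class "low").

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class (an assumed policy)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # sample without replacement so no row is duplicated
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy imbalanced dataset: 9 "high" rows and 3 "low" rows (made-up values).
rows = [[i] for i in range(12)]
labels = ["high"] * 9 + ["low"] * 3
bal_rows, bal_labels = random_undersample(rows, labels, seed=42)
balanced_counts = Counter(bal_labels)

# Toy predictions, scored on the class of interest ("low").
y_true = ["low", "low", "low", "high", "high", "high"]
y_pred = ["low", "low", "high", "low", "high", "high"]
precision, recall = precision_recall(y_true, y_pred, positive="low")
```

Recall on the minority class is the figure to watch in a dropout setting: a model can reach a high overall accuracy by predicting the majority class everywhere while missing most of the students the study cares about.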



Acknowledgements

We would like to thank Rajamangala University of Technology Thanyaburi, Pathumthani, Thailand, for providing the student data used to conduct this research.

Corresponding author

Correspondence to Wattana Punlumjeak.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Punlumjeak, W., Rugtanom, S., Jantarat, S., Rachburee, N. (2018). Improving Classification of Imbalanced Student Dataset Using Ensemble Method of Voting, Bagging, and Adaboost with Under-Sampling Technique. In: Kim, K., Kim, H., Baek, N. (eds) IT Convergence and Security 2017. Lecture Notes in Electrical Engineering, vol 449. Springer, Singapore. https://doi.org/10.1007/978-981-10-6451-7_4

  • DOI: https://doi.org/10.1007/978-981-10-6451-7_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6450-0

  • Online ISBN: 978-981-10-6451-7

  • eBook Packages: Engineering, Engineering (R0)
