Exploring Feature-Level Duplications on Imbalanced Data Using Stochastic Diffusion Search

  • Haya Abdullah Alhakbani
  • Mohammad Majid al-Rifaie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10207)

Abstract

Stochastic diffusion search (SDS) is a computer algorithm inspired by swarm intelligence. SDS uses some of the processes and techniques found in swarms to solve search and optimisation problems. In this paper, a hybrid approach is proposed to deal with real-world imbalanced data. The proposed model involves oversampling the minority class, undersampling the majority class, and optimising the parameters of the classifier, a Support Vector Machine (SVM). The model uses the Synthetic Minority Over-sampling Technique (SMOTE) to perform the oversampling and the agents of a swarm intelligence technique, SDS, to perform an ‘informed’ undersampling of the majority class. In addition to comparing the agent-led undersampling with random undersampling, the results are contrasted against other well-known techniques on nine real-world datasets. The behaviour of SDS agents in this context is also analysed.
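The two data-level steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote_like_oversample` and `random_undersample`, the parameter `k`, and the toy data are assumptions made for illustration. The first function follows the general SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours; the second is the random undersampling baseline that the paper compares against. The SDS agent-led undersampling itself is not reproduced here, since the abstract does not specify it.

```python
import random


def smote_like_oversample(minority, n_new, k=3, rng=None):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style idea)."""
    if rng is None:
        rng = random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Baseline: keep n_keep majority samples chosen uniformly without replacement."""
    if rng is None:
        rng = random.Random(0)
    return rng.sample(majority, n_keep)


# Toy data (hypothetical): 4 minority points, 20 majority points.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(float(i), 5.0) for i in range(20)]

new_points = smote_like_oversample(minority, 6, k=2, rng=random.Random(1))
kept = random_undersample(majority, 10, rng=random.Random(1))
balanced_minority = minority + new_points  # 10 minority vs 10 majority samples
```

In a full pipeline, the rebalanced data would then be passed to an SVM whose parameters are tuned separately; an informed undersampler would replace `random_undersample` with a rule for choosing which majority samples to discard.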

Keywords

Swarm intelligence; Agents; Class imbalance; Stochastic diffusion search; SVM

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Haya Abdullah Alhakbani (1)
  • Mohammad Majid al-Rifaie (1)
  1. Department of Computing, Goldsmiths, University of London, London, UK
