Barricaded Boundary Minority Oversampling LS-SVM for a Biased Binary Classification

  • Hmayag Partamian
  • Yara Rizk
  • Mariette Awad
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11198)

Abstract

Classifying biased datasets with linearly non-separable features has been a challenge in pattern recognition because traditional classifiers, which are usually skewed towards the majority class, often produce sub-optimal results. If biased or unbalanced data is not processed appropriately, any information extracted from it risks being compromised. Least Squares Support Vector Machines (LS-SVM) are known for their computational advantage over SVM; however, they suffer from a lack of sparsity in the support vectors: LS-SVM learns the separating hyper-plane from the whole dataset and often produces biased hyper-planes on imbalanced datasets. To contribute a novel approach to the supervised classification of imbalanced datasets, we propose Barricaded Boundary Minority Oversampling (BBMO), which oversamples the minority samples at the boundary in the direction of the closest majority samples to remove the bias that data imbalance induces in LS-SVM. Two variations of BBMO are studied: BBMO1 for the linearly separable case, which uses the Lagrange multipliers to extract boundary samples from both classes, and the generalized BBMO2 for the non-linear case, which uses the kernel matrix to extract the closest majority samples to each minority sample. In either case, BBMO computes weighted means as new synthetic minority samples and appends them to the dataset. Experiments on different synthetic and real-world datasets show that BBMO with LS-SVM improved on other methods in the literature, which motivates follow-on research.
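
The abstract gives only the outline of BBMO2, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It uses a kernel matrix to find the closest majority sample to each minority sample, then appends a weighted mean of the pair as a synthetic minority sample. The RBF kernel, the single nearest neighbour, the weight `w`, the choice to oversample every minority sample instead of only boundary ones, and the function names `rbf_kernel` and `oversample_toward_majority` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] - 2.0 * A @ B.T + (B ** 2).sum(axis=1)[None, :]
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def oversample_toward_majority(X, y, minority=1, w=0.75, gamma=1.0):
    """Append one synthetic minority sample per original minority sample.

    Assumption: each synthetic point is w * x_min + (1 - w) * x_maj, where
    x_maj is the majority sample closest to x_min in the kernel-induced
    feature space. With w > 0.5 the point stays nearer the minority sample.
    """
    X_min = X[y == minority]
    X_maj = X[y != minority]

    # For the RBF kernel K(x, x) = 1, so the squared feature-space distance
    # is 2 - 2 K(x, z): the largest kernel value marks the closest sample.
    K = rbf_kernel(X_min, X_maj, gamma)
    nearest = K.argmax(axis=1)

    # Weighted mean of each minority sample and its closest majority sample.
    synthetic = w * X_min + (1.0 - w) * X_maj[nearest]

    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(len(synthetic), minority, dtype=y.dtype)])
    return X_new, y_new


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
    y = np.array([1, 1, 0, 0, 0])
    X_new, y_new = oversample_toward_majority(X, y)
    print(X_new[len(X):])  # synthetic minority samples
```

The augmented `X_new`, `y_new` would then be passed to whichever classifier is being trained; the abstract pairs the oversampling with LS-SVM, which is not implemented here.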

Keywords

  • Biased datasets
  • Linearly separable features
  • Weighted means
  • Barricaded boundary minority oversampling
  • Kernel matrix

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon
