Abstract
Labeling errors can occur for various reasons, such as the subjective nature of the labeling task, insufficient information to determine the true label of a given example, or data entry mistakes. Labeling errors have been categorized as mislabeled, unlabeled, partially labeled, incompletely labeled, and illegibly labeled; this study focuses on mislabeled data. The problem of learning from mislabeled data, and in particular of constructing a classifier from such data, has been approached from a number of directions, so developing learning algorithms that deal with mislabeled data effectively and efficiently is of great practical importance in machine learning. The Support Vector Machine (SVM) is widely regarded as one of the most effective machine learning techniques. However, a standard SVM depends on only a small subset of the data points (the support vectors) and treats all training examples of a given class equally. The Weighted Support Vector Machine (WSVM) addresses this limitation by assigning individual weights to training examples. Wu and Liu proposed two variants: the one-step WSVM (OWSVM) and the iteratively WSVM (IWSVM). This paper compares WSVM, OWSVM, and IWSVM on mislabeled data with respect to the number of correctly labeled points, mislabeled points, points within the margin, mislabeled points within the margin, and classification accuracy, using eight datasets from the KEEL repository with 20% label noise injected into the training data. The experimental results show that OWSVM outperforms both WSVM and IWSVM on all five of these measures.
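The core idea behind weighted SVMs — down-weighting training points suspected of being mislabeled before (re)fitting — can be sketched with scikit-learn's per-sample weights. This is an illustrative approximation only: the specific weighting rule below (down-weight points the initial fit disagrees with) and the weight value 0.1 are hypothetical choices, not the OWSVM/IWSVM schemes of Wu and Liu, and the 20% noise injection mirrors the experimental setup described in the abstract.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Inject 20% label noise into the training set, as in the experiments.
flip = rng.choice(len(y_tr), size=int(0.2 * len(y_tr)), replace=False)
y_noisy = y_tr.copy()
y_noisy[flip] = 1 - y_noisy[flip]

# One-step-style weighting (illustrative): fit a plain SVM first, then
# down-weight the points the initial model disagrees with before refitting.
base = SVC(kernel="linear").fit(X_tr, y_noisy)
agree = base.predict(X_tr) == y_noisy
weights = np.where(agree, 1.0, 0.1)  # hypothetical weight choice

wsvm = SVC(kernel="linear").fit(X_tr, y_noisy, sample_weight=weights)
print("test accuracy:", wsvm.score(X_te, y_te))
```

An iterative variant would repeat the disagree-and-down-weight step until the weights stabilize, which is the distinction the paper draws between the one-step and iterative schemes.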
References
Reddy M (2018) Ground Truth Gold—Intelligent data labeling and annotation. The Hive
Brodley CE, Friedl MA (1999) Identifying mislabeled training data. J Artif Intell Res 11:131–167
Frénay B, Kabán A (2014) A comprehensive introduction to label noise. In: Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)
Wagar EA, Stankovic AK, Raab S, Nakhleh RE, Walsh MK (2008) Specimen labeling errors: a Q-probes analysis of 147 clinical laboratories. Arch Pathol Lab Med
Bootkrajang J, Kabán A (2012) Label-noise robust logistic regression and its applications. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Bootkrajang J, Kabán A (2013) Classification of mislabelled microarrays using robust sparse logistic regression. Bioinformatics
Bootkrajang J (2016) A generalised label noise model for classification in the presence of annotation errors. Neurocomputing
Frénay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Networks Learn Syst 25(5):845–869
Liu T, Tao D (2015) Classification with noisy labels by importance reweighting. IEEE Trans Pattern Anal Mach Intell 38(3):447–461
Almasi ON, Rouhani M (2016) Fast and de-noise support vector machine training method based on fuzzy clustering method for large real world datasets. Turkish J Electr Eng Comput Sci 24(1):219–233
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop On Computational Learning Theory-COLT ’92, pp 144–152
Vapnik VN (1995) The nature of statistical learning theory, vol 8
Sabzevari M (2015) Ensemble learning in the presence of noise
Yang X, Song Q, Wang Y (2007) A weighted support vector machine for data classification. Int J Pattern Recognit Artif Intell 21(5):961–976
Fan H, Ramamohanarao K (2005) A weighting scheme based on emerging patterns for weighted support vector machines. In: 2005 IEEE International Conference on Granular Computing, pp 435–440
Tian J, Gu H, Liu W, Gao C (2011) Robust prediction of protein subcellular localization combining PCA and WSVMs. Comput Biol Med 41(8):648–652
Wu Y, Liu Y (2013) Adaptively weighted large margin classifiers. J Comput Graph Stat 22(2):37–41
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Dzulkifli, S.A.M., Salleh, M.N.M., Bahrudin, I.A. (2020). A Comparison of Weighted Support Vector Machine (WSVM), One-Step WSVM (OWSVM) and Iteratively WSVM (IWSVM) for Mislabeled Data. In: Ghazali, R., Nawi, N., Deris, M., Abawajy, J. (eds) Recent Advances on Soft Computing and Data Mining. SCDM 2020. Advances in Intelligent Systems and Computing, vol 978. Springer, Cham. https://doi.org/10.1007/978-3-030-36056-6_43
Print ISBN: 978-3-030-36055-9
Online ISBN: 978-3-030-36056-6