Abstract
Labeling errors can occur for various reasons, such as the subjective nature of the labeling task, insufficient information to determine the true label of a given example, or data entry mistakes. Labeling errors have been categorized as mislabeled, unlabeled, partially labeled, incompletely labeled, and illegibly labeled; this study focuses on mislabeled data. The problem of learning from mislabeled data, and in particular of constructing a classifier from such data, has been approached from a number of directions, so developing learning algorithms that deal with mislabeled data effectively and efficiently is of great practical importance in machine learning. The Support Vector Machine (SVM) is widely regarded as one of the most effective machine learning techniques. However, a standard SVM depends on only a small subset of the data points (the support vectors) and treats all training examples of a given class equally. The Weighted Support Vector Machine (WSVM) addresses this limitation by assigning individual weights to training examples. Wu and Liu proposed two variants: the one-step WSVM (OWSVM) and the iteratively WSVM (IWSVM). This paper compares WSVM, OWSVM, and IWSVM on mislabeled data with respect to the number of correctly labeled points, mislabeled points, points within the margin, mislabeled points within the margin, and classification accuracy, using eight datasets from the KEEL repository with 20% label noise injected into the training data. The experimental results show that OWSVM outperforms both WSVM and IWSVM on all five of these measures.
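The core idea behind weighted SVMs — down-weighting training points suspected of being mislabeled before (re)fitting — can be sketched with scikit-learn's per-sample weights. This is an illustrative approximation only: the specific weighting rule below (down-weight points the initial fit disagrees with) and the weight value 0.1 are hypothetical choices, not the OWSVM/IWSVM schemes of Wu and Liu, and the 20% noise injection mirrors the experimental setup described in the abstract.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Inject 20% label noise into the training set, as in the experiments.
flip = rng.choice(len(y_tr), size=int(0.2 * len(y_tr)), replace=False)
y_noisy = y_tr.copy()
y_noisy[flip] = 1 - y_noisy[flip]

# One-step-style weighting (illustrative): fit a plain SVM first, then
# down-weight the points the initial model disagrees with before refitting.
base = SVC(kernel="linear").fit(X_tr, y_noisy)
agree = base.predict(X_tr) == y_noisy
weights = np.where(agree, 1.0, 0.1)  # hypothetical weight choice

wsvm = SVC(kernel="linear").fit(X_tr, y_noisy, sample_weight=weights)
print("test accuracy:", wsvm.score(X_te, y_te))
```

An iterative variant would repeat the disagree-and-down-weight step until the weights stabilize, which is the distinction the paper draws between the one-step and iterative schemes.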
References
Reddy M (2018) Ground Truth Gold—Intelligent data labeling and annotation. The Hive
Brodley CE, Friedl MA (1999) Identifying mislabeled training data. J Artif Intell Res 11:131–167
Frénay B, Kabán A (2014) A comprehensive introduction to label noise. In: Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)
Wagar EA, Stankovic AK, Raab S, Nakhleh RE, Walsh MK (2008) Specimen labeling errors: a Q-probes analysis of 147 clinical laboratories. Arch Pathol Lab Med
Bootkrajang J, Kabán A (2012) Label-noise robust logistic regression and its applications. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Bootkrajang J, Kabán A (2013) Classification of mislabelled microarrays using robust sparse logistic regression. Bioinformatics
Bootkrajang J (2016) A generalised label noise model for classification in the presence of annotation errors. Neurocomputing
Frénay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Networks Learn Syst 25(5):845–869
Liu T, Tao D (2015) Classification with noisy labels by importance reweighting. IEEE Trans Pattern Anal Mach Intell 38(3):447–461
Almasi ON, Rouhani M (2016) Fast and de-noise support vector machine training method based on fuzzy clustering method for large real world datasets. Turkish J Electr Eng Comput Sci 24(1):219–233
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop On Computational Learning Theory-COLT ’92, pp 144–152
Vapnik VN (1995) The nature of statistical learning theory, vol 8
Sabzevari M (2015) Ensemble learning in the presence of noise
Yang X, Song Q, Wang Y (2007) A weighted support vector machine for data classification. Int J Pattern Recognit Artif Intell 21(5):961–976
Fan H, Ramamohanarao K (2005) A weighting scheme based on emerging patterns for weighted support vector machines. In: 2005 IEEE International Conference on Granular Computing, pp 435–440
Tian J, Gu H, Liu W, Gao C (2011) Robust prediction of protein subcellular localization combining PCA and WSVMs. Comput Biol Med 41(8):648–652
Wu Y, Liu Y (2013) Adaptively weighted large margin classifiers. J Comput Graph Stat 22(2):37–41
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Dzulkifli, S.A.M., Salleh, M.N.M., Bahrudin, I.A. (2020). A Comparison of Weighted Support Vector Machine (WSVM), One-Step WSVM (OWSVM) and Iteratively WSVM (IWSVM) for Mislabeled Data. In: Ghazali, R., Nawi, N., Deris, M., Abawajy, J. (eds) Recent Advances on Soft Computing and Data Mining. SCDM 2020. Advances in Intelligent Systems and Computing, vol 978. Springer, Cham. https://doi.org/10.1007/978-3-030-36056-6_43
Print ISBN: 978-3-030-36055-9
Online ISBN: 978-3-030-36056-6