
A Comparison of Weighted Support Vector Machine (WSVM), One-Step WSVM (OWSVM) and Iteratively WSVM (IWSVM) for Mislabeled Data

  • Conference paper
  • First Online:
Recent Advances on Soft Computing and Data Mining (SCDM 2020)

Abstract

Labeling errors can occur for various reasons, such as the subjective nature of the labeling task, a lack of information to determine the true label of a given example, and data entry errors. Labeling errors have been categorized as mislabeled, unlabeled, partially labeled, incompletely labeled and illegible labels. In this study, the focus is on mislabeled data. The problem of dealing with mislabeled data, and in particular of constructing a classifier from such data, has been approached from a number of different directions. Developing learning algorithms that deal effectively and efficiently with mislabeled data is therefore of great practical importance and a key aspect of machine learning. The Support Vector Machine (SVM) is widely accepted as one of the most effective machine learning techniques. One of its main drawbacks is that it depends on only a small subset of the data points (the support vectors) and treats all training data of a given class equally. One solution to this problem is the Weighted Support Vector Machine (WSVM). Wu & Liu proposed two different WSVMs, namely the one-step WSVM (OWSVM) and the iteratively WSVM (IWSVM). In this paper, the Weighted Support Vector Machine (WSVM), One-step WSVM (OWSVM) and Iteratively WSVM (IWSVM) are compared on mislabeled data to assess the classification accuracy of each method. The three methods were compared on correctly labeled data, mislabeled data, data within the margin, mislabeled data within the margin, and classification accuracy for eight KEEL repository datasets with 20% label noise in the training data. Based on the experimental results, OWSVM outperforms both WSVM and IWSVM on correctly labeled data, mislabeled data, data within the margin, mislabeled data within the margin, and classification accuracy.
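To illustrate the weighting idea described above, the sketch below trains a standard SVM on data with 20% flipped labels, then down-weights training points whose labels strongly disagree with the current decision function and refits; a single reweighting pass loosely mirrors a one-step weighted SVM, while repeating the pass loosely mirrors an iterative one. The reweighting rule, the scikit-learn usage, and all parameter values are illustrative assumptions, not the exact OWSVM/IWSVM formulation of Wu & Liu.

```python
# Illustrative sketch of weighted SVM training under label noise.
# The weighting rule is a simple heuristic for demonstration only;
# it is NOT the OWSVM/IWSVM formulation of Wu & Liu.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic binary data with 20% of the training labels flipped (mislabeled).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
flip = rng.random(len(y_tr)) < 0.20
y_tr_noisy = np.where(flip, 1 - y_tr, y_tr)

def fit_weighted_svm(X, y, weights):
    """SVM where each training point gets its own weight
    (scikit-learn applies sample_weight to the per-point slack penalty)."""
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X, y, sample_weight=weights)
    return clf

# Plain SVM: every point weighted equally.
weights = np.ones(len(y_tr_noisy))
clf = fit_weighted_svm(X_tr, y_tr_noisy, weights)

# Reweighting pass: points the current model classifies confidently
# against their given label are likely mislabeled, so shrink their weight.
# One pass ~ a "one-step" weighted SVM; looping ~ an "iterative" one.
for _ in range(3):  # set the range to 1 for a single (one-step) pass
    y_signed = np.where(y_tr_noisy == 1, 1.0, -1.0)
    margin = clf.decision_function(X_tr) * y_signed
    weights = 1.0 / (1.0 + np.exp(-margin))  # near-zero weight if margin << 0
    clf = fit_weighted_svm(X_tr, y_tr_noisy, weights)

print("test accuracy:", clf.score(X_te, y_te))
```

The reported comparison additionally inspects which points fall inside the margin and how many of those are mislabeled; with the sketch above, the analogous quantities can be read off `clf.decision_function(X_tr)` (points with absolute value below 1 lie within the margin).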


References

  1. Reddy M (2018) Ground truth gold—intelligent data labeling and annotation. The Hive
  2. Brodley CE, Friedl MA (1999) Identifying mislabeled training data. J Artif Intell Res 11:131–167
  3. Frénay B, Kabán A (2014) A comprehensive introduction to label noise. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp 23–25
  4. Wagar EA, Stankovic AK, Raab S, Nakhleh RE, Walsh MK (2008) Specimen labeling errors: a Q-probes analysis of 147 clinical laboratories. Arch Pathol Lab Med
  5. Bootkrajang J, Kabán A (2012) Label-noise robust logistic regression and its applications. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
  6. Bootkrajang J, Kabán A (2013) Classification of mislabelled microarrays using robust sparse logistic regression. Bioinformatics
  7. Bootkrajang J (2016) A generalised label noise model for classification in the presence of annotation errors. Neurocomputing
  8. Frénay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Networks Learn Syst 25(5):845–869
  9. Liu T, Tao D (2015) Classification with noisy labels by importance reweighting. IEEE Trans Pattern Anal Mach Intell 38(3):447–461
  10. Almasi ON, Rouhani M (2016) Fast and de-noise support vector machine training method based on fuzzy clustering method for large real world datasets. Turkish J Electr Eng Comput Sci 24(1):219–233
  11. Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory (COLT '92), pp 144–152
  12. Vapnik VN (1995) The nature of statistical learning theory, vol 8
  13. Sabzevari M (2015) Ensemble learning in the presence of noise
  14. Yang X, Song Q, Wang Y (2007) A weighted support vector machine for data classification. Int J Pattern Recognit Artif Intell 21(5):961–976
  15. Fan H, Ramamohanarao K (2005) A weighting scheme based on emerging patterns for weighted support vector machines. In: 2005 IEEE International Conference on Granular Computing, pp 435–440
  16. Tian J, Gu H, Liu W, Gao C (2011) Robust prediction of protein subcellular localization combining PCA and WSVMs. Comput Biol Med 41(8):648–652
  17. Wu Y, Liu Y (2013) Adaptively weighted large margin classifiers. J Comput Graph Stat 22(2):37–41


Author information


Corresponding author

Correspondence to Syarizul Amri Mohd Dzulkifli.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Dzulkifli, S.A.M., Salleh, M.N.M., Bahrudin, I.A. (2020). A Comparison of Weighted Support Vector Machine (WSVM), One-Step WSVM (OWSVM) and Iteratively WSVM (IWSVM) for Mislabeled Data. In: Ghazali, R., Nawi, N., Deris, M., Abawajy, J. (eds) Recent Advances on Soft Computing and Data Mining. SCDM 2020. Advances in Intelligent Systems and Computing, vol 978. Springer, Cham. https://doi.org/10.1007/978-3-030-36056-6_43

