Classification Method Utilizing Reliably Labeled Data

Nakata, Kouta; Sakurai, Shigeaki; Orihara, Ryohei

doi:10.1007/978-3-540-85563-7_20

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5177)

Abstract

Building an accurate classifier requires accurate labels, and accurate labeling requires domain knowledge, experience, and consistent criteria; that is, it requires experts. In practice, having experts label all the data we need is often impossible because of the high cost, and we sometimes have to use 'cheaper' data labeled by non-experts. In such cases, expert and non-expert data are usually not distinguished during learning, even though mislabeled examples in the non-expert data may degrade the resulting classifier. In this paper, we propose a classification method utilizing reliably labeled data. We use prior knowledge of how reliable the persons who gave the labels are, and set a degree of label confidence for each non-expert example based on neighboring, reliably labeled expert data. These degrees of confidence are reflected in learning, so that data with higher confidence make a greater contribution to the classifier. Under these assumptions, the results of experiments with publicly available data suggest that our method can produce a more precise classifier than the conventional method that treats all data equally.
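The idea in the abstract can be illustrated with a small sketch. The abstract does not give the confidence formula or the base learner, so everything below is an assumption made for illustration: confidence is taken to be the fraction of the k nearest expert-labeled points that agree with a non-expert label, and the learner is a weighted nearest-centroid classifier. The function names (`label_confidence`, `fit_weighted_centroids`, `predict`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def label_confidence(x, y, expert_X, expert_y, k=3):
    """Assumed confidence rule: share of the k nearest expert points labeled y."""
    order = sorted(range(len(expert_X)), key=lambda i: math.dist(x, expert_X[i]))
    nearest = order[:k]
    agree = sum(1 for i in nearest if expert_y[i] == y)
    return agree / len(nearest)


def fit_weighted_centroids(X, y, weights):
    """Weighted nearest-centroid learner: each point contributes by its weight."""
    sums = {}
    totals = defaultdict(float)
    for xi, yi, wi in zip(X, y, weights):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
        for d, value in enumerate(xi):
            sums[yi][d] += wi * value
        totals[yi] += wi
    # Classes whose total weight is zero are skipped to avoid division by zero.
    return {c: [s / totals[c] for s in sums[c]] for c in sums if totals[c] > 0}


def predict(centroids, x):
    """Return the class whose weighted centroid is closest to x."""
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))


# Toy data (hypothetical): expert labels are trusted with weight 1.0.
expert_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
expert_y = ["a", "a", "a", "b", "b", "b"]

# Non-expert data; the third point sits in the "a" region but is labeled "b".
novice_X = [(0.5, 0.5), (5.5, 5.5), (0.2, 0.3)]
novice_y = ["a", "b", "b"]

confidences = [
    label_confidence(x, y, expert_X, expert_y, k=3)
    for x, y in zip(novice_X, novice_y)
]

model = fit_weighted_centroids(
    expert_X + novice_X,
    expert_y + novice_y,
    [1.0] * len(expert_X) + confidences,
)
```

In this toy setting the suspicious non-expert point receives zero confidence and therefore does not pull the "b" centroid toward the "a" region. The conventional approach described in the abstract corresponds to passing a weight of 1.0 for every point.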

Author information

Authors and Affiliations

System Engineering Laboratory, Corporate Research & Development Center, Toshiba Corporation
Kouta Nakata, Shigeaki Sakurai & Ryohei Orihara

Editor information

Ignac Lovrek, Robert J. Howlett, Lakhmi C. Jain

Cite this paper

Nakata, K., Sakurai, S., Orihara, R. (2008). Classification Method Utilizing Reliably Labeled Data. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2008. Lecture Notes in Computer Science, vol 5177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85563-7_20

DOI: https://doi.org/10.1007/978-3-540-85563-7_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85562-0
Online ISBN: 978-3-540-85563-7
eBook Packages: Computer Science, Computer Science (R0)
