Building High-Performance Classifiers Using Positive and Unlabeled Examples for Text Classification

Ke, Ting; Yang, Bing; Zhen, Ling; Tan, Junyan; Li, Yi; Jing, Ling

doi:10.1007/978-3-642-31362-2_21

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7368)

Included in the following conference series:

International Symposium on Neural Networks

Abstract

This paper studies the problem of building text classifiers using only positive and unlabeled examples. Many techniques have been proposed for this problem; among them, Biased-SVM is a popular method whose classification performance is better than that of most two-step techniques. In this paper, we propose an improved iterative classification approach that extends Biased-SVM. The first iteration of the approach is Biased-SVM itself; subsequent iterations identify confident positive examples among the unlabeled examples, and an extra penalty factor is then used to weight the errors on these confident positive examples. Experiments show that the approach is effective for text classification and outperforms Biased-SVM and other two-step techniques.
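The iterative scheme outlined in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper, whose exact formulation is not available in this preview: the function names, the cost values `c_pos`, `c_unl` and `c_conf`, the score `threshold` used to call an unlabeled example "confident", and the stopping rule are all hypothetical. A plain linear SVM trained by subgradient descent on a cost-weighted hinge loss stands in for whatever solver the authors used.

```python
import random

def score(model, x):
    """Signed decision value w.x + b of a linear model."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_weighted_svm(X, y, costs, epochs=200, lr=0.05, reg=0.01):
    """Linear SVM with a per-example hinge-loss cost, fit by subgradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [reg * wj for wj in w], 0.0
        for xi, yi, ci in zip(X, y, costs):
            if yi * score((w, b), xi) < 1.0:  # margin violated: hinge loss is active
                for j in range(dim):
                    grad_w[j] -= ci * yi * xi[j] / n
                grad_b -= ci * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def iterative_pu(P, U, c_pos=10.0, c_unl=1.0, c_conf=5.0, threshold=0.5, rounds=3):
    """Assumed iteration: Biased-SVM first, then re-weight confident positives."""
    X = P + U

    def fit(confident):
        # Unlabeled examples count as negatives unless flagged as confident positives.
        y = [1] * len(P) + [1 if i in confident else -1 for i in range(len(U))]
        costs = [c_pos] * len(P) + [c_conf if i in confident else c_unl
                                    for i in range(len(U))]
        return train_weighted_svm(X, y, costs)

    confident = set()
    model = fit(confident)  # first iteration: Biased-SVM (all of U treated as negative)
    for _ in range(rounds):
        new = {i for i, u in enumerate(U) if score(model, u) >= threshold} - confident
        if not new:  # hypothetical stopping rule: no new confident positives
            break
        confident |= new
        model = fit(confident)  # retrain with the extra penalty factor c_conf
    return model, confident

# Toy data (hypothetical): positives near (2, 2), negatives near (-2, -2).
rng = random.Random(0)

def cloud(cx, cy, k):
    return [[rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)] for _ in range(k)]

P = cloud(2, 2, 20)                        # labeled positives
U = cloud(2, 2, 20) + cloud(-2, -2, 40)    # 20 hidden positives, then 40 negatives
model, confident = iterative_pu(P, U)
```

In this sketch `confident` holds indices into `U`; on the toy data, indices below 20 correspond to the hidden positives. Giving confident positives their own cost `c_conf`, separate from `c_pos`, reflects the "extra penalty factor" mentioned in the abstract, but how the paper actually sets or tunes that factor is not stated here.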

Supported by the National Natural Science Foundation of China (10971223, 11071252) and the Chinese Universities Scientific Fund (2011JS039, 2012Y130).

Author information

Authors and Affiliations

Department of Applied Mathematics, College of Science, China Agricultural University, 100083, Beijing, P.R. China
Ting Ke, Bing Yang, Ling Zhen, Junyan Tan & Ling Jing
Department of Mathematics, School of Science, Beijing University of Posts and Telecommunications, 100876, Beijing, P.R. China
Yi Li

Editor information

Editors and Affiliations

Department of Mechanical & Automation Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
Jun Wang
School of Electrical and Computer Engineering, Oklahoma State University, 74078, Stillwater, OK, USA
Gary G. Yen
Department of Electrical and Computer Engineering, University of Cyprus, 75 Kallipoleos Avenue, 1678, Nicosia, Cyprus
Marios M. Polycarpou

About this paper

Cite this paper

Ke, T., Yang, B., Zhen, L., Tan, J., Li, Y., Jing, L. (2012). Building High-Performance Classifiers Using Positive and Unlabeled Examples for Text Classification. In: Wang, J., Yen, G.G., Polycarpou, M.M. (eds) Advances in Neural Networks – ISNN 2012. ISNN 2012. Lecture Notes in Computer Science, vol 7368. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31362-2_21

DOI: https://doi.org/10.1007/978-3-642-31362-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31361-5
Online ISBN: 978-3-642-31362-2
eBook Packages: Computer Science, Computer Science (R0)
