Skip to main content

Building High-Performance Classifiers Using Positive and Unlabeled Examples for Text Classification

  • Conference paper
Advances in Neural Networks – ISNN 2012 (ISNN 2012)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7368))

Included in the following conference series:

Abstract

This paper studies the problem of building text classifiers using only positive and unlabeled examples. At present, many techniques for solving this problem were proposed, such as Biased-SVM which is the existing popular method and its classification performance is better than most of two-step techniques. In this paper, an improved iterative classification approach is proposed which is the extension of Biased-SVM. The first iteration of our developed approach is Biased-SVM and the next iterations are to identify confident positive examples from the unlabeled examples. Then an extra penalty factor is given to weight these confident positive examples error. Experiments show that it is effective for text classification and outperforms the Biased-SVM and other two step techniques.

Supported by the National Natural Science Foundation of China (10971223, 11071252), Chinese Universities Scientific Fund(2011JS039, 2012Y130).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Cortes, C., Vapnik, V.: Support vector network. J. Mach. Learn. 20, 273–297 (1995)

    MATH  Google Scholar 

  2. Fung, G.P.C., Yu, J.X., Lu, H., Yu, P.S.: Text Classification without Negative Examples Revisit. IEEE Transactions on Knowledge and Data Engineering 18(1), 6–20 (2006)

    Article  Google Scholar 

  3. Joachims, T.: Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998)

    Chapter  Google Scholar 

  4. Manevitz, L., Yousef, M.: One-class SVMs for document classification. J. Mach. Learn. Res. 2, 139–154 (2001)

    Google Scholar 

  5. Lang, K.: Newsweeder: Learning to filter netnews. In: Proceedings of the 12th International Machine Learning Conference, Lake Tahoe, US, pp. 331–339 (1995)

    Google Scholar 

  6. Lee, W.S., Liu, B.: Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression. In: Proceedings of the 20th International Conference on Machine Learning, Washington, DC, United States, pp. 448–455 (2003)

    Google Scholar 

  7. Li, X., Liu, B.: Learning to Classify Text Using Positive and Unlabeled Data. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico, pp. 587–594 (2003)

    Google Scholar 

  8. Li, X.-L., Liu, B., Ng, S.-K.: Learning to Classify Documents with Only a Small Positive Training Set. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds.) ECML 2007. LNCS (LNAI), vol. 4701, pp. 201–213. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  9. Li, X., Liu, B., Ng, S.: Negative Training Data can be Harmful to Text Classification. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Massachusetts, USA, pp. 218–228 (2010)

    Google Scholar 

  10. Liu, B., Lee, W.S., Yu, P.S., Li, X.: Partially Supervised Classification of Text Documents. In: Proceedings of the 19th International Conference on Machine Learning, Sydney, Australia, pp. 387–394 (2002)

    Google Scholar 

  11. Liu, B., Dai, Y., Li, X., Lee, W.S., Yu, P.S.: Building Text Classifiers Using Positive and Unlabeled Examples. In: Proceedings of the 3rd IEEE International Conference on Data Mining, Melbourne, Florida, United States, pp. 179–188 (2003)

    Google Scholar 

  12. Nigam, K., McCallum, A.K., Thrun, S.: Learning to Classify Text from Labeled and Unlabeled Documents. In: Proceedings of the 15th National Conference on Artificial Intelligence, pp. 792–799. AAAI Press, United States (1998)

    Google Scholar 

  13. Nigam, K., McCallum, A.K., Thrun, S., Mitchell, T.: Text Classification from Labeled and Unlabeled Documents Using EM. Mach. Learn. 39, 103–134 (2000)

    Article  MATH  Google Scholar 

  14. Sebastiani, F.: Machine Learning in Automated Text Categorization. ACM Computer Surveys 34, 1–47 (2002)

    Article  MathSciNet  Google Scholar 

  15. Yu, H., Han, J., Chang, K.C.C.: PEBL: Positive Example-Based learning for web page classification using SVM. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 239–248. ACM, United States (2002)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ke, T., Yang, B., Zhen, L., Tan, J., Li, Y., Jing, L. (2012). Building High-Performance Classifiers Using Positive and Unlabeled Examples for Text Classification. In: Wang, J., Yen, G.G., Polycarpou, M.M. (eds) Advances in Neural Networks – ISNN 2012. ISNN 2012. Lecture Notes in Computer Science, vol 7368. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31362-2_21

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-31362-2_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31361-5

  • Online ISBN: 978-3-642-31362-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics