
Unsupervised Multi-label Text Classification Using a World Knowledge Ontology

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7301)

Abstract

The development of text classification techniques has advanced considerably over the past decade, driven by the increasing availability and widespread use of digital documents. The performance of text classification usually depends on the quality of the categories and on the accuracy of classifiers learned from training samples. When training samples are unavailable or the categories are of poor quality, classification performance degrades. In this paper, we propose an unsupervised multi-label text classification method that classifies documents using a large set of categories stored in a world knowledge ontology. The approach was evaluated against typical text classification methods on a real-world document collection, using ground truth encoded by human experts, and the results are promising.
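The abstract does not describe the method's internals, so what follows is only a minimal, generic sketch of unsupervised multi-label classification against ontology categories. It is not the authors' method. It assumes each category is described by a short list of terms, scores a document against every category with cosine similarity over raw term counts, and assigns every category that clears a threshold, with no labelled training samples involved. `TOY_ONTOLOGY`, `classify`, the 0.2 threshold, and `micro_f1` (one common way to compare predicted label sets with expert-assigned ground truth, not necessarily the paper's metric) are all hypothetical illustrations.

```python
import math
import re
from collections import Counter

# Hypothetical toy "ontology": each category label maps to descriptor terms.
# A real world knowledge ontology is far larger and hierarchical; these
# entries are invented purely for illustration.
TOY_ONTOLOGY = {
    "Machine learning": ["learning", "classifier", "training", "model"],
    "Information retrieval": ["retrieval", "query", "document", "search"],
    "Ontologies": ["ontology", "concept", "category", "knowledge"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(text, ontology, threshold=0.2):
    """Return every category whose descriptor similarity meets the threshold.

    Scores come only from term overlap between the document and each
    category's descriptors, so a document may receive zero, one, or
    several labels (multi-label output). The threshold is hypothetical.
    """
    doc = Counter(tokenize(text))
    return {
        category
        for category, terms in ontology.items()
        if cosine(doc, Counter(terms)) >= threshold
    }


def micro_f1(predicted, gold):
    """Micro-averaged F1 over parallel lists of predicted and gold label sets."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    fp = sum(len(p - g) for p, g in zip(predicted, gold))
    fn = sum(len(g - p) for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `classify("Training a classifier on ontology concepts improves document retrieval.", TOY_ONTOLOGY)` returns the labels "Machine learning" and "Information retrieval". "Ontologies" falls below the threshold because the sketch does no stemming, so "concepts" does not match "concept"; a practical system would normalise terms and exploit the ontology's structure rather than flat term lists.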

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tao, X., Li, Y., Lau, R.Y.K., Wang, H. (2012). Unsupervised Multi-label Text Classification Using a World Knowledge Ontology. In: Tan, PN., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7301. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30217-6_40

  • DOI: https://doi.org/10.1007/978-3-642-30217-6_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30216-9

  • Online ISBN: 978-3-642-30217-6

  • eBook Packages: Computer Science, Computer Science (R0)
