Topics and Label Propagation: Best of Both Worlds for Weakly Supervised Text Classification

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9624)

Abstract

We propose a Label Propagation-based algorithm for weakly supervised text classification. We construct a graph in which each document is represented by a node and edge weights represent similarities between documents. Additionally, we discover underlying topics using Latent Dirichlet Allocation (LDA) and enrich the document graph by adding the topics as additional nodes. The edge weight between a topic and a text document represents the level of “affinity” between them. Our approach does not require document-level labelling; instead, it expects manual labels only for topic nodes. This significantly reduces the level of supervision needed, as labelling only a few topics is observed to be enough to achieve sufficiently high accuracy. The Label Propagation Algorithm is then run on this enriched graph to propagate labels among the nodes. Our approach combines the advantages of Label Propagation (through document-document similarities) and Topic Modelling (for minimal but smart supervision). We demonstrate the effectiveness of our approach on various datasets and compare it with state-of-the-art weakly supervised text classification approaches.
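The enriched graph described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the affinity weighting, or the propagation variant, so the code assumes a precomputed document-document similarity matrix, a precomputed document-topic affinity matrix (for example, LDA topic proportions), and a simple clamped, row-normalised propagation scheme. The function name `propagate_labels` and all parameter names are hypothetical.

```python
def propagate_labels(doc_sim, doc_topic, topic_labels, n_classes, n_iter=50):
    """Clamped label propagation on a document + topic graph (illustrative sketch).

    doc_sim      : n_docs x n_docs similarity matrix (assumed precomputed)
    doc_topic    : n_docs x n_topics affinity matrix (assumed, e.g. LDA proportions)
    topic_labels : class index per topic, or None for an unlabelled topic
    Returns the predicted class index for every document.
    """
    n_docs, n_topics = len(doc_sim), len(topic_labels)
    n = n_docs + n_topics

    # Weight matrix over all nodes: documents first, then topic nodes.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n_docs):
        for j in range(n_docs):
            if i != j:  # no self-loops
                W[i][j] = doc_sim[i][j]
        for t in range(n_topics):
            W[i][n_docs + t] = doc_topic[i][t]
            W[n_docs + t][i] = doc_topic[i][t]

    # Row-normalise so each node averages over its neighbours.
    T = []
    for row in W:
        s = sum(row)
        T.append([w / s if s > 0 else 0.0 for w in row])

    def clamp(Y):
        # Labelled topic nodes always keep their manual (one-hot) label.
        for t, label in enumerate(topic_labels):
            if label is not None:
                Y[n_docs + t] = [1.0 if c == label else 0.0 for c in range(n_classes)]

    Y = [[0.0] * n_classes for _ in range(n)]
    clamp(Y)
    for _ in range(n_iter):
        Y = [[sum(T[i][j] * Y[j][c] for j in range(n)) for c in range(n_classes)]
             for i in range(n)]
        clamp(Y)

    # Each document takes the class with the largest propagated score.
    return [max(range(n_classes), key=lambda c: Y[i][c]) for i in range(n_docs)]


# Toy data (made up for illustration): documents 0-1 and 2-3 form two similar pairs.
doc_sim = [[1.0, 0.9, 0.1, 0.0],
           [0.9, 1.0, 0.1, 0.1],
           [0.1, 0.1, 1.0, 0.9],
           [0.0, 0.1, 0.9, 1.0]]
doc_topic = [[0.9, 0.1],
             [0.5, 0.5],
             [0.5, 0.5],
             [0.1, 0.9]]
predictions = propagate_labels(doc_sim, doc_topic, topic_labels=[0, 1], n_classes=2)
print(predictions)
```

In this toy example, documents 1 and 2 have no preference between the two topics, so their predicted classes come entirely from their similarity to documents 0 and 3. This is the role the abstract assigns to document-document edges, while only the two topic nodes carry manual labels.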

Notes

  1. http://qwone.com/~jason/20Newsgroups/

  2. http://www.cs.waikato.ac.nz/ml/weka/

Author information

Corresponding author

Correspondence to Nitin Ramrakhiyani.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Pawar, S., Ramrakhiyani, N., Hingmire, S., Palshikar, G.K. (2018). Topics and Label Propagation: Best of Both Worlds for Weakly Supervised Text Classification. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_35

  • DOI: https://doi.org/10.1007/978-3-319-75487-1_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75486-4

  • Online ISBN: 978-3-319-75487-1

  • eBook Packages: Computer Science, Computer Science (R0)
