Abstract
We propose a Label Propagation based algorithm for weakly supervised text classification. We construct a graph in which each document is represented by a node and edge weights represent similarities between documents. Additionally, we discover underlying topics using Latent Dirichlet Allocation (LDA) and enrich the document graph by adding the topics as additional nodes. The edge weight between a topic and a text document represents the level of “affinity” between them. Our approach does not require document-level labelling; instead, it expects manual labels only for topic nodes. This substantially reduces the level of supervision needed, since labelling only a few topics is observed to be enough to achieve sufficiently high accuracy. The Label Propagation Algorithm is run on this enriched graph to propagate labels among the nodes. Our approach combines the advantages of Label Propagation (through document-document similarities) and Topic Modelling (for minimal but smart supervision). We demonstrate the effectiveness of our approach on various datasets and compare it with state-of-the-art weakly supervised text classification approaches.
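The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how edge weights are computed or which propagation variant is used, so the sketch assumes a precomputed document-document similarity matrix (`doc_sim`, for example cosine similarity), a document-topic affinity matrix (`doc_topic`, for example LDA topic proportions), and a simple iterative propagation over row-normalised edge weights with the labelled topic nodes clamped. The function name and all parameter names are hypothetical.

```python
import numpy as np


def propagate_topic_labels(doc_sim, doc_topic, topic_labels, n_classes,
                           n_iter=200, tol=1e-6):
    """Propagate labels from topic nodes to document nodes (illustrative sketch).

    doc_sim      : (n_docs, n_docs) non-negative document-document similarities
                   (assumed precomputed, e.g. cosine similarity).
    doc_topic    : (n_docs, n_topics) non-negative document-topic affinities
                   (assumed precomputed, e.g. LDA topic proportions).
    topic_labels : length n_topics, class index per topic, -1 if unlabelled.
    """
    doc_sim = np.asarray(doc_sim, dtype=float)
    doc_topic = np.asarray(doc_topic, dtype=float)
    topic_labels = np.asarray(topic_labels)
    n_docs, n_topics = doc_topic.shape

    # Enriched graph: document nodes first, then topic nodes.
    n = n_docs + n_topics
    W = np.zeros((n, n))
    W[:n_docs, :n_docs] = doc_sim
    np.fill_diagonal(W, 0.0)            # drop self-loops
    W[:n_docs, n_docs:] = doc_topic     # document -> topic edges
    W[n_docs:, :n_docs] = doc_topic.T   # topic -> document edges

    # Row-normalise edge weights into transition probabilities.
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0       # isolated nodes keep an all-zero row
    T = W / row_sums

    # Start from uniform label distributions; clamp the labelled topic nodes.
    Y = np.full((n, n_classes), 1.0 / n_classes)
    labelled = np.where(topic_labels >= 0)[0]
    clamp_rows = n_docs + labelled
    clamp = np.zeros((len(labelled), n_classes))
    clamp[np.arange(len(labelled)), topic_labels[labelled]] = 1.0
    Y[clamp_rows] = clamp

    for _ in range(n_iter):
        Y_new = T @ Y                   # each node averages its neighbours
        Y_new[clamp_rows] = clamp       # re-impose the manual topic labels
        converged = np.abs(Y_new - Y).max() < tol
        Y = Y_new
        if converged:
            break

    doc_scores = Y[:n_docs]
    return doc_scores.argmax(axis=1), doc_scores


# Toy usage: four documents, two topics, one manual label per topic.
toy_doc_sim = [[1.0, 0.9, 0.1, 0.0],
               [0.9, 1.0, 0.1, 0.1],
               [0.1, 0.1, 1.0, 0.9],
               [0.0, 0.1, 0.9, 1.0]]
toy_doc_topic = [[0.9, 0.1],
                 [0.8, 0.2],
                 [0.2, 0.8],
                 [0.1, 0.9]]
toy_pred, toy_scores = propagate_topic_labels(
    toy_doc_sim, toy_doc_topic, topic_labels=[0, 1], n_classes=2)
```

Only the topic nodes carry manual labels here; the documents receive their labels through both the document-topic edges and the document-document edges, which is the combination the abstract describes.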
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Pawar, S., Ramrakhiyani, N., Hingmire, S., Palshikar, G.K. (2018). Topics and Label Propagation: Best of Both Worlds for Weakly Supervised Text Classification. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_35
DOI: https://doi.org/10.1007/978-3-319-75487-1_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75486-4
Online ISBN: 978-3-319-75487-1
eBook Packages: Computer Science, Computer Science (R0)