Multi-label dataless text classification with topic modeling

  • Daochen Zha
  • Chenliang Li
Regular Paper

Abstract

Manually labeling documents is tedious and expensive, but it is essential for training a traditional text classifier. In recent years, a few dataless text classification techniques have been proposed to address this problem. However, existing work mainly centers on single-label classification, in which each document is restricted to a single category. In this paper, we propose a novel Seed-guided Multi-label Topic Model, named SMTM. Given a few seed words relevant to each category, SMTM conducts multi-label classification for a collection of documents without any labeled documents. In SMTM, each category is associated with a single category-topic that covers the meaning of the category. To accommodate multi-label documents, we explicitly model category sparsity in SMTM by using a spike-and-slab prior and a weak smoothing prior. As a result, SMTM automatically selects the relevant categories for each document without any threshold tuning. To incorporate the supervision of the seed words, we propose a seed-guided biased GPU (i.e., generalized Pólya urn) sampling procedure to guide the topic inference of SMTM. Experiments on two public datasets show that SMTM achieves better classification accuracy than state-of-the-art alternatives and even outperforms supervised solutions in some scenarios.
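
The generalized Pólya urn idea mentioned in the abstract can be illustrated with a small sketch. This is not the SMTM sampling procedure itself, which the abstract does not spell out; it only shows the generic GPU count update, in which assigning a word to a topic also promotes related words in that topic. The function name gpu_update, the promotion weight 0.3, the "sports" category, and the related-word table are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def gpu_update(counts, topic, word, related, promotion=0.3, sign=1):
    """Add (sign=1) or remove (sign=-1) one assignment of `word` to `topic`.

    Under a generalized Polya urn scheme the word's own count changes by 1,
    and every word listed as related to it changes by `promotion`.
    All names and weights here are illustrative assumptions.
    """
    counts[topic][word] += sign * 1.0
    for other in related.get(word, ()):
        counts[topic][other] += sign * promotion


# Hypothetical related-word table, e.g. derived from seed words.
related = {"goal": ["soccer", "match"]}

# Topic-word pseudo-counts: counts[topic][word] -> float.
counts = defaultdict(lambda: defaultdict(float))

gpu_update(counts, "sports", "goal", related)  # promotes "soccer" and "match"
gpu_update(counts, "sports", "ball", related)  # no related words: plain +1
```

In a Gibbs sampler, the same function would be called with sign=-1 before resampling a word's topic and with sign=1 after, so that the promoted counts stay consistent with the current assignments.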

Keywords

Dataless text classification · Topic model · Multi-label text classification · Spike and slab prior

Notes

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Nos. 61872278, 61502344), the Natural Science Foundation of Hubei Province (No. 2017CFB502), and the Natural Scientific Research Program of Wuhan University (No. 2042017kf0225). Chenliang Li is the corresponding author.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer Science, Wuhan University, Wuhan, China
  2. School of Cyber Science and Engineering, Wuhan University, Wuhan, China
