Topic Representation using Semantic-Based Patterns

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1127)

Abstract

Topic modelling is a state-of-the-art technique for understanding, organizing, and extracting information from text collections. Traditional topic modelling approaches apply probabilistic techniques to generate a list of topics from a collection. Humans, however, understand, summarize, and discover topics based on the meaning of the content. Hence, the quality of topic models can be improved by capturing the meaning of the content. In this paper, we propose an approach that uses an ontology to identify sets of meaningful terms, called Semantic-based Patterns, which represent the content of a collection of documents. A set of related semantic-based patterns can be used to represent a latent topic in the collection. The proposed Topic Representation using Semantic-based Patterns aims to generate semantically meaningful patterns based on ontology rather than on term co-occurrence, as existing topic modelling methods do. The patterns were evaluated by using them as features for information filtering and comparing the results against popular information filtering baseline systems. Topic quality was evaluated in terms of topic coherence and perplexity. The experimental results showed that the proposed patterns were of higher quality than the features used in the baseline systems for information filtering. Further, the proposed topic representation outperformed the topics generated by other topic modelling approaches.
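To make the notion of topic coherence concrete, the following is a minimal sketch of one widely used measure, the co-document-frequency ("UMass"-style) coherence. The abstract does not state which coherence variant the authors used, so the choice of measure is an assumption, and the function name `umass_coherence` and the toy documents are hypothetical illustrations rather than the paper's implementation.

```python
import math


def umass_coherence(top_terms, documents):
    """Co-document-frequency coherence of one topic (higher is better).

    top_terms: the topic's terms, ordered from most to least probable.
    documents: an iterable of tokenised documents (lists of terms).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*terms):
        # Number of documents that contain every given term.
        return sum(1 for d in doc_sets if all(t in d for t in terms))

    score = 0.0
    for m in range(1, len(top_terms)):
        for l in range(m):
            denom = doc_freq(top_terms[l])
            if denom == 0:
                # Assumption: skip pairs whose conditioning term never occurs.
                continue
            joint = doc_freq(top_terms[m], top_terms[l])
            # Add-one smoothing keeps the logarithm defined when joint == 0.
            score += math.log((joint + 1) / denom)
    return score


# Toy corpus: "a" and "b" co-occur more often than "a" and "c".
docs = [["a", "b"], ["a", "b"], ["a", "c"]]
coherent = umass_coherence(["a", "b"], docs)
less_coherent = umass_coherence(["a", "c"], docs)
```

On this toy corpus the pair that co-occurs in more documents receives the higher score, which is the behaviour a coherence measure is meant to capture.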

Author information

Corresponding author

Correspondence to Dakshi Kapugama Geeganage.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Geeganage, D.K., Xu, Y., Li, Y. (2019). Topic Representation using Semantic-Based Patterns. In: Le, T., et al. Data Mining. AusDM 2019. Communications in Computer and Information Science, vol 1127. Springer, Singapore. https://doi.org/10.1007/978-981-15-1699-3_3

  • DOI: https://doi.org/10.1007/978-981-15-1699-3_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1698-6

  • Online ISBN: 978-981-15-1699-3

  • eBook Packages: Computer Science, Computer Science (R0)
