The Algorithm of Modelling and Analysis of Latent Semantic Relations: Linear Algebra vs. Probabilistic Topic Models

Rizun, Nina; Taranenko, Yurii; Waloszek, Wojciech

doi:10.1007/978-3-319-69548-8_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 786)

Included in the following conference series:

International Conference on Knowledge Engineering and the Semantic Web

Abstract

This paper presents an algorithm for modelling and analysing Latent Semantic Relations within a collection of argumentative documents. The novelty of the algorithm lies in its systematic approach: it combines probabilistic Latent Dirichlet Allocation (LDA) with linear-algebra-based Latent Semantic Analysis (LSA), and it treats each document as a complex of topics identified by analysing individual paragraphs separately. The algorithm comprises the following stages: modelling and analysis of Latent Semantic Relations, performed successively at the LDA-based and LSA-based levels; and rules-based adjustment of the results of the two levels of analysis. The proposed algorithm was verified on corpora of subjectively positive and negative Polish-language film reviews. The recall and precision obtained in the case study support conclusions about the effectiveness of the proposed algorithm.
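The abstract reports recall and precision as the evaluation measures. As a rough illustration only, the standard set-based definitions of the two measures can be sketched as follows; the function name, the review identifiers, and the data are hypothetical and are not taken from the paper's case study.

```python
from typing import Iterable, Tuple


def precision_recall(predicted: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a set of predicted items against the relevant ones."""
    predicted_set, relevant_set = set(predicted), set(relevant)
    true_positives = len(predicted_set & relevant_set)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical data: reviews an algorithm assigned to a topic
# versus reviews a human annotator placed in that topic.
assigned_to_topic = ["r1", "r2", "r3", "r4"]
truly_in_topic = ["r2", "r3", "r4", "r5", "r6"]

p, r = precision_recall(assigned_to_topic, truly_in_topic)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.75 recall=0.60"
```

Here three of the four assigned reviews are correct (precision 0.75), and three of the five relevant reviews are found (recall 0.60).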

Author information

Authors and Affiliations

Department of Applied Informatics in Management, Faculty of Management and Economics, Gdansk University of Technology, Gdańsk, Poland
Nina Rizun
Department of Applied Linguistics and Methods of Teaching Foreign Languages, Alfred Nobel University, Dnipro, Ukraine
Yurii Taranenko
Department of Software Engineering, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Gdańsk, Poland
Wojciech Waloszek

Corresponding author

Correspondence to Nina Rizun.

Editor information

Editors and Affiliations

West Pomeranian University of Technology in Szczecin, Szczecin, Poland
Przemysław Różewski
University of Bonn, Bonn, Germany
Christoph Lange

About this paper

Cite this paper

Rizun, N., Taranenko, Y., Waloszek, W. (2017). The Algorithm of Modelling and Analysis of Latent Semantic Relations: Linear Algebra vs. Probabilistic Topic Models. In: Różewski, P., Lange, C. (eds) Knowledge Engineering and Semantic Web. KESW 2017. Communications in Computer and Information Science, vol 786. Springer, Cham. https://doi.org/10.1007/978-3-319-69548-8_5

DOI: https://doi.org/10.1007/978-3-319-69548-8_5
Published: 18 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69547-1
Online ISBN: 978-3-319-69548-8
eBook Packages: Computer Science, Computer Science (R0)