Abstract
Searching for a topic on the Internet can be frustrating because of the sheer volume of information returned. A strategy for automatically classifying the results can therefore improve user experience and work efficiency. The Latent Semantic Indexing (LSI) algorithm is effective at classifying documents by meaning. However, its implementation poses a problem: LSI is computationally intensive because its cost is directly related to the number of documents. In particular, the Singular Value Decomposition (SVD) at the core of LSI scales poorly in both memory and computation time. One possible solution is to use more powerful computational resources, such as multiple computing nodes. In this paper, a novel distributed architecture for the LSI algorithm is proposed, based on microservices in a Google Cloud environment. We evaluate the performance of the proposed Cloud-based LSI and compare it with a standalone LSI. The results show the benefits of the distributed system in terms of runtime, concurrency, and processing.
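To make the role of the SVD concrete, the following is a minimal single-machine sketch of LSI in Python with NumPy. It is not the distributed microservices architecture proposed in the paper; the function names `lsi_embed` and `cosine`, the vocabulary, and the term counts are all hypothetical and chosen only for illustration. The sketch factorizes a small term-document matrix, keeps the k largest singular values, and compares documents in the resulting latent space.

```python
import numpy as np


def lsi_embed(term_doc, k):
    """Return one k-dimensional latent vector per document (hypothetical helper)."""
    # Dense SVD of the (terms x documents) matrix; this is the step whose
    # memory and time cost grow with the size of the collection.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values and scale the document coordinates.
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine(a, b):
    """Cosine similarity between two latent document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up counts. Rows are terms, columns are documents.
terms = ["cloud", "server", "cluster", "recipe", "oven"]
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

docs = lsi_embed(A, k=2)

same_topic = cosine(docs[0], docs[1])   # both documents use computing terms
cross_topic = cosine(docs[0], docs[2])  # computing versus cooking terms
print(f"same topic: {same_topic:.3f}, cross topic: {cross_topic:.3f}")
```

In this toy collection the first two documents share computing terms and the last two share cooking terms, so documents within a topic end up close together in the latent space while documents from different topics do not. Because the dense decomposition must process the whole matrix at once, a standalone version like this one is the kind of baseline that a distributed design aims to improve on.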
Acknowledgments
This work was supported by IDEIAGEOCA Research Group of Universidad Politécnica Salesiana in Quito, Ecuador.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Proaño, J., Reinoso, A., Juma, J. (2020). Latent Semantic Index: A Microservices Architecture. In: Narváez, F., Vallejo, D., Morillo, P., Proaño, J. (eds) Smart Technologies, Systems and Applications. SmartTech-IC 2019. Communications in Computer and Information Science, vol 1154. Springer, Cham. https://doi.org/10.1007/978-3-030-46785-2_12
DOI: https://doi.org/10.1007/978-3-030-46785-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46784-5
Online ISBN: 978-3-030-46785-2
eBook Packages: Computer Science (R0)