Abstract
In this paper, we compare two models of document similarity search over a text stream of articles collected daily from online news sites. The first model combines word-to-vector embeddings (Word2Vec), the neural-network-based document embedding known as document-to-vector (Doc2Vec), and the k-NN technique to perform similarity search in a tree structure called the M-Tree. The second model uses Gensim to perform the same document similarity search. To evaluate both, we use a metric that measures the accuracy of the documents returned as similar to a document d by checking whether they belong to the same category as d. We report experiments and an evaluation, analyze the experimental results, and discuss and propose improvements. Our main contribution is the comparison of the two solutions for performing document similarity queries.
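The category-based metric described above can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: documents are represented by small hand-made toy vectors rather than Doc2Vec embeddings, neighbours are found by brute-force cosine similarity rather than an M-Tree or Gensim, and the names `top_k_similar` and `category_accuracy` are hypothetical helpers introduced only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(query_id, vectors, k):
    """Brute-force k-NN by cosine similarity.

    Assumption: this linear scan stands in for the index-based search
    (M-Tree or Gensim) mentioned in the abstract.
    """
    q = vectors[query_id]
    scored = [
        (cosine(q, v), doc_id)
        for doc_id, v in vectors.items()
        if doc_id != query_id
    ]
    # Highest similarity first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def category_accuracy(query_id, retrieved, categories):
    """Fraction of retrieved documents sharing the query document's category."""
    if not retrieved:
        return 0.0
    same = sum(1 for d in retrieved if categories[d] == categories[query_id])
    return same / len(retrieved)


# Toy data (hypothetical): 2-dimensional "embeddings" and category labels.
vectors = {
    "d1": [1.0, 0.0],
    "d2": [0.9, 0.1],
    "d3": [0.8, 0.3],
    "d4": [0.0, 1.0],
    "d5": [0.1, 0.9],
}
categories = {
    "d1": "sports",
    "d2": "sports",
    "d3": "politics",
    "d4": "politics",
    "d5": "politics",
}

neighbours = top_k_similar("d1", vectors, k=2)
score = category_accuracy("d1", neighbours, categories)
print(neighbours, score)
```

In this toy setting, one of the two nearest neighbours of d1 shares its category, so the score for d1 is 0.5. Averaging such scores over many query documents would give one overall figure per model, which is one plausible way to compare two search methods on the same labelled stream.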
Acknowledgments
This research is funded by Thu Dau Mot University, Binh Duong, and Vietnam National University Ho Chi Minh City (VNU-HCMC) under grant number B2017-26-02.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Hong, T.V.T., Do, P. (2020). Comparing Two Models of Document Similarity Search over a Text Stream of Articles from Online News Sites. In: Vasant, P., Zelinka, I., Weber, GW. (eds) Intelligent Computing and Optimization. ICO 2019. Advances in Intelligent Systems and Computing, vol 1072. Springer, Cham. https://doi.org/10.1007/978-3-030-33585-4_38
DOI: https://doi.org/10.1007/978-3-030-33585-4_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33584-7
Online ISBN: 978-3-030-33585-4
eBook Packages: Intelligent Technologies and Robotics (R0)