Abstract
Taking Chinese as the target language, this paper analyzes current Chinese word segmentation algorithms and the Lucene relevance ranking algorithm, and proposes an improved word segmentation algorithm and an improved relevance ranking algorithm based on the Lucene full-text search toolkit. It also uses distributed storage, parallel computing, inverted indexing and retrieval techniques to analyze and design a search engine for digital information on the network, providing users with a fast and accurate search service over massive digital information. The experiments compare the word segmentation speed of various word segmentation algorithms, and compare the response time, number of hits, accuracy and recall rate of their keyword search results. The experimental results show that the system greatly improves search speed while preserving the accuracy of the search results.
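The abstract names inverted indexing and the accuracy and recall rate of keyword search without giving details. The following is a minimal sketch of those generic ideas only, not the paper's improved algorithms and not Lucene's API. The function names, the toy documents and the relevance judgments are all hypothetical, the documents are assumed to be already segmented into words, and "accuracy" is read here as precision.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def search(index, query_terms):
    """Return ids of documents containing every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in query_terms]
    if not postings:
        return set()
    return set.intersection(*postings)


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical documents, already segmented into words.
docs = {
    1: ["分布式", "索引", "检索"],
    2: ["中文", "分词", "算法"],
    3: ["分布式", "存储", "检索"],
}
index = build_inverted_index(docs)
retrieved = search(index, ["分布式", "检索"])

# Hypothetical relevance judgments for this query.
relevant = {1, 2}
precision, recall = precision_recall(retrieved, relevant)
```

In this toy setup the query matches documents 1 and 3, of which only document 1 is judged relevant, so precision and recall are both 0.5. A real system would add ranking on top of the boolean match; the paper's own ranking changes are not described in the abstract.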
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, R. (2018). Research on Key Technology of Distributed Indexing and Retrieval System Based on Lucene. In: Li, K., Li, W., Chen, Z., Liu, Y. (eds) Computational Intelligence and Intelligent Systems. ISICA 2017. Communications in Computer and Information Science, vol 874. Springer, Singapore. https://doi.org/10.1007/978-981-13-1651-7_23
DOI: https://doi.org/10.1007/978-981-13-1651-7_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1650-0
Online ISBN: 978-981-13-1651-7
eBook Packages: Computer Science, Computer Science (R0)