Abstract
To support rapid retrieval in Chinese-Hmong mixed text, this paper proposes a multi-pattern matching algorithm for Chinese-Hmong mixed strings that operates on double-byte units and combines the idea of the Aho-Corasick (AC) algorithm with the mismatch-handling strategy of the Horspool algorithm. The algorithm constructs a deterministic finite automaton from the pattern set, following the AC approach, and computes the shift distance of the patterns with the bad-character rule of the Horspool algorithm. Using the automaton, it finds all occurrences of all patterns in a single traversal of the text. Experimental results show that the proposed algorithm performs well in multi-pattern matching on Chinese-Hmong mixed texts of different scales; even for mixed texts containing more than 100,000 characters, its matching efficiency is significantly higher than that of the AC algorithm.
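The general idea of pairing a pattern-set automaton with a bad-character shift can be illustrated with a small sketch. The code below is not the authors' algorithm, whose details are not given in the abstract. It is a Set-Horspool-style stand-in that uses a trie over the reversed patterns in place of the AC automaton, plus a Horspool shift table bounded by the shortest pattern length. The function names (`to_units`, `build_matcher`, `search`) and the use of UTF-16 code units as the "double-byte units" are assumptions made for illustration only.

```python
def to_units(s):
    """Split a string into 16-bit units (UTF-16 is an assumption of this sketch)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def build_matcher(patterns):
    """Build a reversed-pattern trie and a Horspool-style shift table."""
    unit_patterns = [to_units(p) for p in patterns]
    if not unit_patterns or any(len(p) == 0 for p in unit_patterns):
        raise ValueError("patterns must be non-empty")
    min_len = min(len(p) for p in unit_patterns)

    # Trie over reversed patterns: children[node] maps a unit to a child node.
    # Node 0 is the root; ends[node] lists the patterns that finish at node.
    children = [{}]
    ends = [[]]
    for idx, pat in enumerate(unit_patterns):
        node = 0
        for unit in reversed(pat):
            nxt = children[node].get(unit)
            if nxt is None:
                nxt = len(children)
                children.append({})
                ends.append([])
                children[node][unit] = nxt
            node = nxt
        ends[node].append(idx)

    # Bad-character rule: distance from a unit's non-final occurrence to the
    # pattern end, minimised over all patterns and capped at min_len.
    shift = {}
    for pat in unit_patterns:
        m = len(pat)
        for j in range(m - 1):
            dist = m - 1 - j
            if dist < shift.get(pat[j], min_len):
                shift[pat[j]] = dist
    return children, ends, shift, min_len


def search(text, patterns):
    """Return sorted (start_unit_index, pattern) pairs for every occurrence."""
    children, ends, shift, min_len = build_matcher(patterns)
    units = to_units(text)
    hits = []
    pos = min_len - 1  # index of the unit aligned with the window end
    while pos < len(units):
        node, i = 0, pos
        # Read the text backwards through the trie, reporting complete patterns.
        while i >= 0 and units[i] in children[node]:
            node = children[node][units[i]]
            for idx in ends[node]:
                hits.append((i, patterns[idx]))
            i -= 1
        # Units absent from the table allow the maximal safe shift, min_len.
        pos += shift.get(units[pos], min_len)
    return sorted(hits)


if __name__ == "__main__":
    print(search("苗族文字与汉字混合文本苗文", ["苗文", "汉字", "混合文本"]))
```

Offsets are reported in 16-bit units rather than characters, so a character outside the Basic Multilingual Plane counts as two units. In this sketch the window end advances by the bad-character shift after each attempt, which is where a Horspool-style method can skip text that a plain automaton scan would read unit by unit.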
Acknowledgments
This work was supported by the Natural Science Foundation of Hunan Province (No. 2019JJ40234), the National Natural Science Foundation of China (No. 61462029), the Research Study and Innovative Experimental Project for College Students in Hunan Province (No. 20180599), and the Research Study and Innovative Experimental Project for College Students in Jishou University (No. JDCX20180122).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
He, SP., Mo, LP., Kang, DW. (2019). A Multi-pattern Matching Algorithm for Chinese-Hmong Mixed Strings. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_37
DOI: https://doi.org/10.1007/978-3-030-32236-6_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6
eBook Packages: Computer Science, Computer Science (R0)