Abstract
This paper [5] addresses the problem of estimating, using enhanced AI techniques, a transmitted string X* by processing the corresponding string Y, which is a noisy version of X*. We assume that Y contains substitution, insertion and deletion errors, and that X* is an element of a finite (possibly large) dictionary, H. The best estimate X⁺ of X* is defined as the element of H that minimizes the Generalized Levenshtein Distance D(X, Y) between X and Y over all X ∈ H. We show how D(X, Y) can be evaluated for every X ∈ H simultaneously when the edit distances are general, the maximum number of errors is not given a priori, and H is stored as a trie. We first introduce a new scheme, Clustered Beam Search (CBS), a heuristic-based search approach that enhances the well-known Beam Search (BS) technique [33] from Artificial Intelligence (AI). It builds on BS with respect to the pruning time. The new technique is compared with the Depth First Search (DFS) trie-based technique [36], with respect to time and accuracy, using large and small dictionaries. The results demonstrate a marked improvement (up to 75%) in the total number of operations needed on three benchmark dictionaries, while yielding an accuracy comparable to the optimal. Experiments also show the benefits of CBS over BS when the search is done on the trie, where the results demonstrate a marked improvement (more than 91%) for large dictionaries.
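To make the setting concrete, the following is a minimal sketch of how D(X, Y) might be evaluated for every X ∈ H at once when H is stored as a trie, with an optional plain beam applied per trie level. It is not the CBS scheme of this paper, whose clustering details are not given in the abstract. All names (`TrieNode`, `build_trie`, `next_row`, `best_match`), the unit default costs, and the choice of the smallest row entry as the ranking heuristic are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional, Tuple


class TrieNode:
    """One trie node; `word` is set when a dictionary string ends here."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


# Assumed unit costs; general (symbol-dependent) costs can be passed instead.
def unit_sub(a: str, b: str) -> float:
    return 0.0 if a == b else 1.0


def unit_indel(a: str) -> float:
    return 1.0


def next_row(prev_row: List[float], ch: str, y: str,
             sub: Callable[[str, str], float],
             ins: Callable[[str], float],
             dele: Callable[[str], float]) -> List[float]:
    """Extend a trie prefix by `ch`: prev_row[j] is D(prefix, y[:j])."""
    row = [prev_row[0] + dele(ch)]
    for j in range(1, len(y) + 1):
        row.append(min(
            prev_row[j - 1] + sub(ch, y[j - 1]),  # substitute ch by y[j-1]
            prev_row[j] + dele(ch),               # ch was deleted
            row[j - 1] + ins(y[j - 1]),           # y[j-1] was inserted
        ))
    return row


def best_match(root: TrieNode, y: str, beam_width: Optional[int] = None,
               sub=unit_sub, ins=unit_indel,
               dele=unit_indel) -> Tuple[float, Optional[str]]:
    """Level-by-level trie traversal; shared prefixes share DP rows.

    With beam_width=None every node is expanded (exhaustive). Otherwise only
    the `beam_width` most promising nodes per level are kept, so the result
    may be suboptimal.
    """
    first_row = [0.0]
    for sym in y:
        first_row.append(first_row[-1] + ins(sym))
    best: Tuple[float, Optional[str]] = (float("inf"), None)
    if root.word is not None:  # the empty string is in the dictionary
        best = (first_row[-1], root.word)
    frontier = [(root, first_row)]
    while frontier:
        candidates = []
        for node, row in frontier:
            for ch, child in node.children.items():
                child_row = next_row(row, ch, y, sub, ins, dele)
                if child.word is not None and child_row[-1] < best[0]:
                    best = (child_row[-1], child.word)
                # Heuristic rank (assumed): smallest entry of the DP row.
                candidates.append((min(child_row), child, child_row))
        candidates.sort(key=lambda c: c[0])
        if beam_width is not None:
            candidates = candidates[:beam_width]
        frontier = [(n, r) for _, n, r in candidates]
    return best


if __name__ == "__main__":
    H = build_trie(["hello", "help", "held", "world"])
    print(best_match(H, "wrld"))                # exhaustive search
    print(best_match(H, "wrld", beam_width=1))  # beam-pruned search
```

Because each trie node's row is computed once and reused by all words below it, common prefixes are never re-evaluated. A narrow beam trades accuracy for fewer row computations, which is the kind of trade-off the abstract measures in terms of operation counts.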
References
Acharya, A., Zhu, H., Shen, K.: Adaptive algorithms for cache-efficient trie search. In: ACM and SIAM Workshop on Algorithm Engineering and Experimentation (January 1999)
Amengual, J.C., Vidal, E.: Efficient error-correcting Viterbi parsing. IEEE Transactions on Communications 20(10), 1109–1116 (1998)
Amengual, J.C., Vidal, E.: The Viterbi algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(10), 268–278 (1998)
Badr, G., Oommen, B.J.: Enhancing trie-based syntactic pattern recognition using AI heuristic search strategies. Unabridged version of the present paper
Badr, G., Oommen, B.J.: Search-enhanced trie-based syntactic pattern recognition of sequences (2005), (Patent)
Baeza-Yates, R., Navarro, G.: Fast approximate string matching in a dictionary. In: Proceedings of the 5th South American Symposium on String Processing and Information Retrieval (SPIRE 1998), pp. 14–22. IEEE CS Press, Los Alamitos (1998)
Bentley, J., Sedgewick, R.: Fast algorithms for sorting and searching strings. In: Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans (January 1997)
Bocchieri, E.: A study of the beam-search algorithm for large vocabulary continuous speech recognition and methods for improved efficiency. In: Proc. Eurospeech, vol. 3, pp. 1521–1524 (1993)
Bouloutas, A., Hart, G.W., Schwartz, M.: Two extensions of the Viterbi algorithm. IEEE Transactions on Information Theory 37(2), 430–436 (1991)
Bunke, H.: Fast approximate matching of words against a dictionary. Computing 55(1), 75–89 (1995)
Clement, J., Flajolet, P., Vallee, B.: The analysis of hybrid trie structures. In: Proc. Annual ACM-SIAM Symp. on Discrete Algorithms, San Francisco, California, pp. 531–539 (1998)
Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 2004, pp. 91–100 (2004)
Dewey, G.: Relative Frequency of English Speech Sounds. Harvard Univ. Press, Cambridge (1923)
Du, M., Chang, S.: An approach to designing very fast approximate string matching algorithms. IEEE Transactions on Knowledge and Data Engineering 6(4), 620–633 (1994)
Favata, J.T.: Offline general handwritten word recognition using an approximate beam matching algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(9), 1009–1021 (2001)
Feng, Z., Huo, Q.: Confidence guided progressive search and fast match techniques for high performance Chinese/English OCR. In: Proceedings of the 16th International Conference on Pattern Recognition, August 2002, vol. 3, pp. 89–92 (2002)
Forney, G.D.: The Viterbi algorithm. Proceedings of the IEEE 61(3), 268–278 (1973)
Kashyap, R.L., Oommen, B.J.: An effective algorithm for string correction using generalized edit distances - I. Description of the algorithm and its optimality. Inf. Sci. 23(2), 123–142 (1981)
Laface, P., Vair, C., Fissore, L.: A fast segmental Viterbi algorithm for large vocabulary recognition. In: Proceedings of ICASSP 1995, May 1995, vol. 1, pp. 560–563 (1995)
Liu, C., Koga, M., Fujisawa, H.: Lexicon-driven segmentation and recognition of handwritten character strings for Japanese address reading. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(11), 1425–1437 (2002)
Liu, F., Afify, M., Jiang, H., Siohan, O.: A new verification-based fast-match approach to large vocabulary continuous speech recognition. In: Proceedings of European Conference on Speech Communication and Technology, September 2001, pp. 1425–1437 (2001)
Luger, G.F., Stubblefield, W.A.: Artificial Intelligence Structure and Strategies for Complex Problem Solving. Addison-Wesley, Reading (1998)
Manke, S., Finke, M., Waibel, A.: A fast technique for large vocabulary on-line handwriting recognition. In: International Workshop on Frontiers in Handwriting Recognition (September 1996)
Miclet, L.: Grammatical inference. Syntactic and Structural Pattern Recognition and Applications, 237–290 (1990)
Navarro, G.: A guided tour to approximate string matching. ACM Computing Surveys 33(1), 31–88 (2001)
Oflazer, K.: Error-tolerant finite state recognition with applications to morphological analysis and spelling correction. Computational Linguistics 22(1), 73–89 (1996)
Oommen, B.J., Badr, G.: Dictionary-based syntactic pattern recognition using tries. In: Fred, A., Caelli, T.M., Duin, R.P.W., Campilho, A.C., de Ridder, D. (eds.) SSPR&SPR 2004. LNCS, vol. 3138, pp. 251–259. Springer, Heidelberg (2004)
Oommen, B.J., Kashyap, R.L.: A formal theory for optimal and information theoretic syntactic pattern recognition. Pattern Recognition 31, 1159–1177 (1998)
Oommen, B.J., Loke, R.K.S.: Syntactic pattern recognition involving traditional and generalized transposition errors: Attaining the information theoretic bound. Submitted for Publication
Pearl, J.: Heuristics: intelligent search strategies for computer problem solving. Addison-Wesley, Reading (1984)
Perez-Cortes, J.C., Amengual, J.C., Arlandis, J., Llobet, R.: Stochastic error correcting parsing for OCR post-processing. In: International Conference on Pattern Recognition ICPR 2000 (2000)
Risvik, K.M.: Search system and method for retrieval of data, and the use thereof in a search engine, United States Patent (April 2002)
Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs (2003)
Sankoff, D., Kruskal, J.B.: Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading (1983)
Schulz, K., Mihov, S.: Fast string correction with Levenshtein-automata. International Journal of Document Analysis and Recognition 5(1), 67–85 (2002)
Shang, H., Merrett, T.: Tries for approximate string matching. IEEE Transactions on Knowledge and Data Engineering 8(4), 540–547 (1996)
Viterbi, A.J.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13, 260–269 (1967)
Wolff, J.G.: A scaleable technique for best-match retrieval of sequential information using metrics-guided search. Journal of Information Science 20(1), 16–28 (1994)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Badr, G., Oommen, B.J. (2005). Enhancing Trie-Based Syntactic Pattern Recognition Using AI Heuristic Search Strategies. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds) Pattern Recognition and Data Mining. ICAPR 2005. Lecture Notes in Computer Science, vol 3686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551188_1
DOI: https://doi.org/10.1007/11551188_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28757-5
Online ISBN: 978-3-540-28758-2
eBook Packages: Computer Science, Computer Science (R0)