Enhancing Trie-Based Syntactic Pattern Recognition Using AI Heuristic Search Strategies

  • Conference paper
Pattern Recognition and Data Mining (ICAPR 2005)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3686)

Abstract

This paper [5] deals with the problem of estimating, using enhanced AI techniques, a transmitted string X* by processing the corresponding string Y, which is a noisy version of X*. We assume that Y contains substitution, insertion and deletion errors, and that X* is an element of a finite (possibly large) dictionary, H. The best estimate X+ of X* is defined as that element of H which minimizes the Generalized Levenshtein Distance D(X, Y) between X and Y, over all X ∈ H. In this paper, we show how D(X, Y) can be evaluated for every X ∈ H simultaneously when the edit distances are general, the maximum number of errors is not given a priori, and H is stored as a trie. We first introduce a new scheme, Clustered Beam Search (CBS), a heuristic-based search approach that enhances the well-known Beam Search (BS) techniques [33] from Artificial Intelligence (AI). CBS improves on BS with respect to pruning time. The new technique is compared with the Depth First Search (DFS) trie-based technique [36], with respect to time and accuracy, using large and small dictionaries. The results demonstrate a marked improvement (up to 75%) in the total number of operations needed on three benchmark dictionaries, while yielding an accuracy comparable to the optimal. Experiments are also done to show the benefits of CBS over BS when the search is done on the trie. These results also demonstrate a marked improvement (more than 91%) for large dictionaries.
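To make the setting concrete, the following is a minimal sketch of evaluating D(X, Y) for every X ∈ H at once when H is stored as a trie: words that share a prefix share the corresponding rows of the dynamic-programming table. It is not the authors' CBS scheme, whose details the abstract does not give. The unit edit costs, the function names (`exact_best`, `beam_best`), the plain level-wise beam, and the use of the smallest entry of a node's row as the ranking heuristic are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical unit costs; any non-negative costs could be substituted.
def sub_cost(a: str, b: str) -> float:
    return 0.0 if a == b else 1.0

def ins_cost(b: str) -> float:  # symbol present only in Y
    return 1.0

def del_cost(a: str) -> float:  # symbol present only in X
    return 1.0

class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set when a dictionary word ends here

def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root

def first_row(y: str) -> List[float]:
    # Distance from the empty prefix of X to each prefix of Y.
    row = [0.0]
    for ch in y:
        row.append(row[-1] + ins_cost(ch))
    return row

def next_row(prev: List[float], ch: str, y: str) -> List[float]:
    # Extend the trie prefix by ch and recompute distances to all prefixes of Y.
    row = [prev[0] + del_cost(ch)]
    for j in range(1, len(y) + 1):
        row.append(min(prev[j - 1] + sub_cost(ch, y[j - 1]),
                       prev[j] + del_cost(ch),
                       row[j - 1] + ins_cost(y[j - 1])))
    return row

def exact_best(root: TrieNode, y: str) -> Tuple[Optional[str], float]:
    # Depth-first traversal; a branch is skipped only when it cannot beat the
    # best distance found so far (valid because all costs are non-negative).
    best: Tuple[Optional[str], float] = (None, math.inf)

    def visit(node: TrieNode, row: List[float]) -> None:
        nonlocal best
        if node.word is not None and row[-1] < best[1]:
            best = (node.word, row[-1])
        for ch, child in node.children.items():
            new = next_row(row, ch, y)
            if min(new) < best[1]:
                visit(child, new)

    visit(root, first_row(y))
    return best

def beam_best(root: TrieNode, y: str, width: int) -> Tuple[Optional[str], float]:
    # Level-wise beam: keep only the `width` most promising nodes per level.
    # The answer is approximate; the true best word may be pruned.
    best: Tuple[Optional[str], float] = (None, math.inf)
    frontier = [(root, first_row(y))]
    while frontier:
        candidates = []
        for node, row in frontier:
            if node.word is not None and row[-1] < best[1]:
                best = (node.word, row[-1])
            for ch, child in node.children.items():
                candidates.append((child, next_row(row, ch, y)))
        frontier = sorted(candidates, key=lambda c: min(c[1]))[:width]
    return best

H = ["pattern", "patter", "matter", "battery", "recognition", "trie", "search"]
trie = build_trie(H)
print(exact_best(trie, "patern"))
print(beam_best(trie, "patern", width=2))
```

The exact traversal visits every branch that could still improve on the current best, whereas the beam bounds the work per trie level by `width` at the cost of possibly missing the optimum. This is the trade-off between operations and accuracy that the reported comparisons measure.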

References

  1. Acharya, A., Zhu, H., Shen, K.: Adaptive algorithms for cache-efficient trie search. In: ACM and SIAM Workshop on Algorithm Engineering and Experimentation (January 1999)

  2. Amengual, J.C., Vidal, E.: Efficient error-correcting Viterbi parsing. IEEE Transactions on Communications 20(10), 1109–1116 (1998)

  3. Amengual, J.C., Vidal, E.: The Viterbi algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(10), 268–278 (1998)

  4. Badr, G., Oommen, B.J.: Enhancing trie-based syntactic pattern recognition using AI heuristic search strategies. Unabridged version of the present paper

  5. Badr, G., Oommen, B.J.: Search-enhanced trie-based syntactic pattern recognition of sequences (2005), (Patent)

  6. Baeza-Yates, R., Navarro, G.: Fast approximate string matching in a dictionary. In: Proceedings of the 5th South American Symposium on String Processing and Information Retrieval (SPIRE 1998), pp. 14–22. IEEE CS Press, Los Alamitos (1998)

  7. Bentley, J., Sedgewick, R.: Fast algorithms for sorting and searching strings. In: Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans (January 1997)

  8. Bocchieri, E.: A study of the beam-search algorithm for large vocabulary continuous speech recognition and methods for improved efficiency. In: Proc. Eurospeech, vol. 3, pp. 1521–1524 (1993)

  9. Bouloutas, A., Hart, G.W., Schwartz, M.: Two extensions of the Viterbi algorithm. IEEE Transactions on Information Theory 37(2), 430–436 (1991)

  10. Bunke, H.: Fast approximate matching of words against a dictionary. Computing 55(1), 75–89 (1995)

  11. Clement, J., Flajolet, P., Vallee, B.: The analysis of hybrid trie structures. In: Proc. Annual ACM-SIAM Symp. on Discrete Algorithms, San Francisco, California, pp. 531–539 (1998)

  12. Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 2004, pp. 91–100 (2004)

  13. Dewey, G.: Relative Frequency of English Speech Sounds. Harvard Univ. Press, Cambridge (1923)

  14. Du, M., Chang, S.: An approach to designing very fast approximate string matching algorithms. IEEE Transactions on Knowledge and Data Engineering 6(4), 620–633 (1994)

  15. Favata, J.T.: Offline general handwritten word recognition using an approximate beam matching algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(9), 1009–1021 (2001)

  16. Feng, Z., Huo, Q.: Confidence guided progressive search and fast match techniques for high performance Chinese/English OCR. In: Proceedings of the 16th International Conference on Pattern Recognition, August 2002, vol. 3, pp. 89–92 (2002)

  17. Forney, G.D.: The Viterbi algorithm. Proceedings of the IEEE 61(3), 268–278 (1973)

  18. Kashyap, R.L., Oommen, B.J.: An effective algorithm for string correction using generalized edit distances - I. Description of the algorithm and its optimality. Inf. Sci. 23(2), 123–142 (1981)

  19. Laface, P., Vair, C., Fissore, L.: A fast segmental Viterbi algorithm for large vocabulary recognition. In: Proceedings of ICASSP 1995, May 1995, vol. 1, pp. 560–563 (1995)

  20. Liu, C., Koga, M., Fujisawa, H.: Lexicon-driven segmentation and recognition of handwritten character strings for Japanese address reading. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(11), 1425–1437 (2002)

  21. Liu, F., Afify, M., Jiang, H., Siohan, O.: A new verification-based fast-match approach to large vocabulary continuous speech recognition. In: Proceedings of European Conference on Speech Communication and Technology, September 2001, pp. 1425–1437 (2001)

  22. Luger, G.F., Stubblefield, W.A.: Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Addison-Wesley, Reading (1998)

  23. Manke, S., Finke, M., Waibel, A.: A fast technique for large vocabulary on-line handwriting recognition. In: International Workshop on Frontiers in Handwriting Recognition (September 1996)

  24. Miclet, L.: Grammatical inference. Syntactic and Structural Pattern Recognition and Applications, 237–290 (1990)

  25. Navarro, G.: A guided tour to approximate string matching. ACM Computing Surveys 33(1), 31–88 (2001)

  26. Oflazer, K.: Error-tolerant finite state recognition with applications to morphological analysis and spelling correction. Computational Linguistics 22(1), 73–89 (1996)

  27. Oommen, B.J., Badr, G.: Dictionary-based syntactic pattern recognition using tries. In: Fred, A., Caelli, T.M., Duin, R.P.W., Campilho, A.C., de Ridder, D. (eds.) SSPR&SPR 2004. LNCS, vol. 3138, pp. 251–259. Springer, Heidelberg (2004)

  28. Oommen, B.J., Kashyap, R.L.: A formal theory for optimal and information theoretic syntactic pattern recognition. Pattern Recognition 31, 1159–1177 (1998)

  29. Oommen, B.J., Loke, R.K.S.: Syntactic pattern recognition involving traditional and generalized transposition errors: Attaining the information theoretic bound. Submitted for Publication

  30. Pearl, J.: Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Reading (1984)

  31. Perez-Cortes, J.C., Amengual, J.C., Arlandis, J., Llobet, R.: Stochastic error correcting parsing for OCR post-processing. In: International Conference on Pattern Recognition ICPR 2000 (2000)

  32. Risvik, K.M.: Search system and method for retrieval of data, and the use thereof in a search engine, United States Patent (April 2002)

  33. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs (2003)

  34. Sankoff, D., Kruskal, J.B.: Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading (1983)

  35. Schulz, K., Mihov, S.: Fast string correction with Levenshtein-automata. International Journal of Document Analysis and Recognition 5(1), 67–85 (2002)

  36. Shang, H., Merrett, T.: Tries for approximate string matching. IEEE Transactions on Knowledge and Data Engineering 8(4), 540–547 (1996)

  37. Viterbi, A.J.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13, 260–269 (1967)

  38. Wolff, J.G.: A scaleable technique for best-match retrieval of sequential information using metrics-guided search. Journal of Information Science 20(1), 16–28 (1994)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Badr, G., Oommen, B.J. (2005). Enhancing Trie-Based Syntactic Pattern Recognition Using AI Heuristic Search Strategies. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds) Pattern Recognition and Data Mining. ICAPR 2005. Lecture Notes in Computer Science, vol 3686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551188_1

  • DOI: https://doi.org/10.1007/11551188_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28757-5

  • Online ISBN: 978-3-540-28758-2

  • eBook Packages: Computer Science, Computer Science (R0)
