
Average-Case Analysis of Approximate Trie Search

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3109)

Abstract

For exact search of a pattern of length m in a database of n strings of arbitrary length, the trie data structure allows an optimal lookup time of \(O(m)\). If errors are allowed between the pattern and the database strings, no comparable structure of reasonable size is known. Using a trie, some work can still be saved, and running times better than comparing the pattern with every string in the database can be achieved. We investigate a comparison-based model in which “errors” and “matches” are defined between pairs of characters. Let p be the probability that a comparison of two characters yields an error. The number of errors between any two strings is bounded by D, which we consider a function of n. We study the average-case number of comparisons for searching in a trie as a function of the parameters p and D. Our analysis yields the asymptotic behavior for memoryless sources with uniform probabilities. It turns out that the average-case complexity jumps at certain thresholds for p and D. Our results apply to any comparison-based error model, for instance mismatches (Hamming distance), don’t cares, or geometric character distances.
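
The comparison-based search described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the names `TrieNode`, `build_trie`, and `search_with_mismatches` are hypothetical, the error model is fixed to mismatches (Hamming distance), only database strings of the same length as the pattern are reported, and one comparison is counted for every trie edge examined. Under those assumptions, the search descends the trie and abandons a branch as soon as its error count exceeds the bound D.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """Trie node; `terminal` marks the end of a database string."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.terminal = False


def build_trie(strings: List[str]) -> TrieNode:
    """Insert every database string into a fresh trie."""
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True
    return root


def search_with_mismatches(
    root: TrieNode, pattern: str, max_errors: int
) -> Tuple[List[str], int]:
    """Return (matches, comparisons) for a Hamming-distance search.

    A branch is pruned once its mismatch count exceeds `max_errors`
    (the bound D). `comparisons` counts character comparisons, one
    per trie edge examined (an assumption of this sketch).
    """
    matches: List[str] = []
    comparisons = 0
    # Each stack entry: (node, depth in pattern, errors so far, path label).
    stack = [(root, 0, 0, "")]
    while stack:
        node, depth, errors, prefix = stack.pop()
        if depth == len(pattern):
            if node.terminal:
                matches.append(prefix)
            continue
        for ch, child in node.children.items():
            comparisons += 1
            new_errors = errors + (ch != pattern[depth])
            if new_errors <= max_errors:
                stack.append((child, depth + 1, new_errors, prefix + ch))
    return sorted(matches), comparisons


if __name__ == "__main__":
    database = ["abc", "abd", "xbc", "xyz", "ab"]
    trie = build_trie(database)
    for d in (0, 1):
        found, cost = search_with_mismatches(trie, "abc", d)
        print(d, found, cost)
```

Raising `max_errors` keeps more branches alive, so the comparison count grows with D; this count is the quantity whose average-case behavior the abstract describes.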




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maaß, M.G. (2004). Average-Case Analysis of Approximate Trie Search. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_36


  • DOI: https://doi.org/10.1007/978-3-540-27801-6_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22341-2

  • Online ISBN: 978-3-540-27801-6

  • eBook Packages: Springer Book Archive
