Average-Case Analysis of Approximate Trie Search

Maaß, Moritz G.

doi:10.1007/978-3-540-27801-6_36

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3109)

Abstract

For exact search of a pattern of length m in a database of n strings of arbitrary length, the trie data structure allows an optimal lookup time of \(O(m)\). If errors are allowed between the pattern and the database strings, no such structure of reasonable size is known. Using a trie, some work can still be saved, and running times better than comparing the pattern with every string in the database can be achieved. We investigate a comparison-based model in which “errors” and “matches” are defined between pairs of characters. When two characters are compared, let p be the probability of an error. Between any two strings we bound the number of errors by D, which we consider a function of n. We study the average-case number of comparisons needed for searching in a trie as a function of the parameters p and D. Our analysis yields the asymptotic behavior for memoryless sources with uniform probabilities. It turns out that the average-case complexity jumps at certain thresholds for p and D. Our results apply to any comparison-based error model, for instance mismatches (Hamming distance), don’t cares, or geometric character distances.
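
The comparison-based model in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the names `TrieNode`, `build_trie`, `approx_search` and `is_error` are hypothetical, and the sketch assumes that only stored strings of exactly the pattern's length are reported. It descends the trie, counts one comparison per visited edge, and abandons a branch as soon as more than D errors have accumulated.

```python
from typing import Callable, Dict, List, Tuple


class TrieNode:
    """One trie node; children are keyed by character (illustrative only)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end = False  # True if a stored string ends at this node


def build_trie(strings: List[str]) -> TrieNode:
    """Insert every string into a fresh trie and return its root."""
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def approx_search(
    root: TrieNode,
    pattern: str,
    max_errors: int,
    is_error: Callable[[str, str], bool] = lambda p, c: p != c,
) -> Tuple[List[str], int]:
    """Return (matching strings, number of character comparisons).

    Assumption: a stored string matches only if it has the same length as
    the pattern and at most max_errors character pairs are errors.
    The default is_error gives Hamming-distance mismatches.
    """
    matches: List[str] = []
    comparisons = 0

    def visit(node: TrieNode, depth: int, errors: int, prefix: str) -> None:
        nonlocal comparisons
        if depth == len(pattern):
            if node.is_end:
                matches.append(prefix)
            return
        for ch, child in node.children.items():
            comparisons += 1  # one character comparison per visited edge
            e = errors + (1 if is_error(pattern[depth], ch) else 0)
            if e <= max_errors:  # prune once the error bound D is exceeded
                visit(child, depth + 1, e, prefix + ch)

    visit(root, 0, 0, "")
    return sorted(matches), comparisons


if __name__ == "__main__":
    trie = build_trie(["abc", "abd", "xbc", "xyz", "ab"])
    print(approx_search(trie, "abc", 1))
    # A "don't care" model: '?' in the pattern never counts as an error.
    print(approx_search(trie, "?bc", 0, lambda p, c: p != "?" and p != c))
```

Passing a different `is_error` predicate switches the error model without changing the traversal, which mirrors the abstract's remark that the results apply to any comparison-based model. The returned comparison count is the quantity whose average-case behavior the paper analyzes, although the sketch itself makes no claim about that analysis.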

Author information

Authors and Affiliations

Institut für Informatik, TU München, Boltzmannstr. 3, D-85748, Garching, Germany
Moritz G. Maaß

Editor information

Editors and Affiliations

School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Suleyman Cenk Sahinalp
Google Inc., 76 9th Av, 4th Fl., 10011, New York, NY
S. Muthukrishnan
Tom Sawyer Software, 94612, Oakland, CA, USA
Ugur Dogrusoz

About this paper

Cite this paper

Maaß, M.G. (2004). Average-Case Analysis of Approximate Trie Search. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_36

DOI: https://doi.org/10.1007/978-3-540-27801-6_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22341-2
Online ISBN: 978-3-540-27801-6
eBook Packages: Springer Book Archive
