Abstract
For exact search of a pattern of length m in a database of n strings of arbitrary length, the trie data structure allows an optimal lookup time of \(O(m)\). If errors are allowed between the pattern and the database strings, no such structure of reasonable size is known. Using a trie, some work can still be saved, and running times superior to comparing the pattern with every string in the database can be achieved. We investigate a comparison-based model in which “errors” and “matches” are defined between pairs of characters. When two characters are compared, let p be the probability of an error. Between any two strings we bound the number of errors by D, which we consider a function of n. We study the average-case complexity of the number of comparisons for searching in a trie as a function of the parameters p and D. Our analysis yields the asymptotic behavior for memoryless sources with uniform probabilities. It turns out that there is a jump in the average-case complexity at certain thresholds for p and D. Our results can be applied to any comparison-based error model, for instance mismatches (Hamming distance), don’t cares, or geometric character distances.
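To make the counted quantity concrete, the following is a minimal sketch of a trie search that tolerates up to D mismatches (Hamming distance) and counts character comparisons. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the names `TrieNode`, `build_trie`, and `search_with_mismatches` are hypothetical, only strings of the same length as the pattern are reported, and a branch is pruned as soon as its error count exceeds D.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """One trie node; children are keyed by a single character."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end = False  # True if a database string ends at this node


def build_trie(strings: List[str]) -> TrieNode:
    """Insert every database string into a fresh trie and return its root."""
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def search_with_mismatches(
    root: TrieNode, pattern: str, max_errors: int
) -> Tuple[List[str], int]:
    """Return (matching strings, number of character comparisons).

    Assumption: a database string matches if it has the same length as
    the pattern and differs from it in at most max_errors positions.
    """
    matches: List[str] = []
    comparisons = 0
    # Each stack entry: (node, depth in pattern, errors so far, path label).
    stack = [(root, 0, 0, "")]
    while stack:
        node, depth, errors, prefix = stack.pop()
        if depth == len(pattern):
            if node.is_end:
                matches.append(prefix)
            continue
        for ch, child in node.children.items():
            comparisons += 1  # one comparison per edge that is inspected
            new_errors = errors + (ch != pattern[depth])
            if new_errors <= max_errors:  # otherwise prune this subtree
                stack.append((child, depth + 1, new_errors, prefix + ch))
    return sorted(matches), comparisons
```

With D = 0 the search follows a single path, as in exact lookup; raising D keeps more branches alive, and the comparison counter grows accordingly. That counter is the kind of quantity whose average behavior the abstract describes in terms of p and D.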
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maaß, M.G. (2004). Average-Case Analysis of Approximate Trie Search. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_36
DOI: https://doi.org/10.1007/978-3-540-27801-6_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22341-2
Online ISBN: 978-3-540-27801-6
eBook Packages: Springer Book Archive