Abstract
We study multi-pattern matching in a scenario where the pattern set is to be matched against several texts, so that indexing the pattern set is affordable. Such scenarios arise, for example, in metagenomics, where the pattern set represents the DNA of several species and the goal is to find out which species are represented in the sample and in what quantity. We develop a generic search method that exploits bidirectional indexes for both the pattern set and the texts, and we analyze the best-case and worst-case running time of the method on a worst-case text. We show that finding the instance of the search method with minimum best-case running time on a worst-case text is NP-hard. The positive result is that an instance with a logarithmic-factor approximation to the minimum best-case running time can be found in polynomial time using a bidirectional index called the affix tree. We further show that affix trees can be simulated, in reduced space, using a bidirectional variant of compressed suffix trees.
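The abstract does not spell out the search method, so the following is only a toy illustration of what a bidirectional index offers: the ability to extend a partial match by one character either to the left or to the right. The class `NaiveBidirectionalIndex`, the function `search`, and the parameter `start` are hypothetical names introduced here; the index is a naive list of occurrences, not an affix tree or a compressed suffix tree, and it makes no claim about the paper's running times.

```python
class NaiveBidirectionalIndex:
    """Toy stand-in for a bidirectional index (assumption, not the paper's structure).

    A partial match is represented by the list of its occurrences in the text,
    each as a half-open interval (begin, end).
    """

    def __init__(self, text):
        self.text = text

    def start(self):
        # The empty string occurs at every position of the text.
        return [(i, i) for i in range(len(self.text) + 1)]

    def extend_right(self, occs, c):
        # Keep the occurrences that are followed by character c.
        t = self.text
        return [(b, e + 1) for b, e in occs if e < len(t) and t[e] == c]

    def extend_left(self, occs, c):
        # Keep the occurrences that are preceded by character c.
        t = self.text
        return [(b - 1, e) for b, e in occs if b > 0 and t[b - 1] == c]


def search(index, pattern, start):
    """Match `pattern` beginning at offset `start`: first extend to the right
    over pattern[start:], then to the left over pattern[:start].

    Returns (sorted occurrence positions, number of extension steps taken).
    The search stops as soon as no occurrence survives.
    """
    occs = index.start()
    steps = 0
    for c in pattern[start:]:
        occs = index.extend_right(occs, c)
        steps += 1
        if not occs:
            return [], steps
    for c in reversed(pattern[:start]):
        occs = index.extend_left(occs, c)
        steps += 1
        if not occs:
            return [], steps
    return sorted(b for b, _ in occs), steps


idx = NaiveBidirectionalIndex("acgtacgaacgt")
hits, steps = search(idx, "acgt", 2)    # pattern present in the text
_, steps_a = search(idx, "acgx", 0)     # absent pattern, matched left to right
_, steps_b = search(idx, "acgx", 3)     # absent pattern, started at the rare character
```

In this sketch the choice of `start` does not change the result, only how many extension steps are spent before an absent pattern is rejected. That dependence on search order is the kind of freedom a bidirectional index exposes; how the paper selects a good order is not shown here.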
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gog, S., Karhu, K., Kärkkäinen, J., Mäkinen, V., Välimäki, N. (2012). Multi-pattern Matching with Bidirectional Indexes. In: Gudmundsson, J., Mestre, J., Viglas, T. (eds) Computing and Combinatorics. COCOON 2012. Lecture Notes in Computer Science, vol 7434. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32241-9_33
DOI: https://doi.org/10.1007/978-3-642-32241-9_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32240-2
Online ISBN: 978-3-642-32241-9
eBook Packages: Computer Science, Computer Science (R0)