Abstract
For a string w over an alphabet Σ, we consider a composite data structure called the all-suffixes directed acyclic word graph (ASDAWG). ASDAWG(w) has |w| + 1 initial nodes, and the dag induced by all nodes reachable from the k-th initial node conforms with DAWG(w[k:]), where w[k:] denotes the k-th suffix of w. We prove that the size of the minimum ASDAWG(w), denoted MASDAWG(w), is Θ(|w|) for |Σ| = 1 and Θ(|w|²) for |Σ| ≥ 2. Moreover, we introduce an on-line algorithm that constructs MASDAWG(w) directly for a given w, and whose running time is linear in the size of MASDAWG(w). We also present three application problems for which ASDAWGs are useful: beginning-sensitive pattern matching, region-sensitive pattern matching, and VLDC-pattern matching.
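To make the definition concrete, the following is a minimal brute-force sketch in Python, not the on-line algorithm of the paper. It assumes that a node can be identified with the set of end positions of a substring within w[k:], as in the usual equivalence-class view of a DAWG, and it shares nodes between different suffixes whenever those sets coincide. The names build_naive_asdawg and occurs_from are hypothetical, and the resulting graph is not claimed to have the minimum number of nodes in general.

```python
from typing import Dict, FrozenSet, List, Tuple

Node = FrozenSet[int]  # assumption: a node is a set of end positions in w


def build_naive_asdawg(w: str) -> Tuple[List[Node], Dict[Node, Dict[str, Node]]]:
    """Brute-force shared graph with one initial node per suffix w[k:]."""
    n = len(w)
    edges: Dict[Node, Dict[str, Node]] = {}
    initial: List[Node] = []
    for k in range(n + 1):
        # The empty string ends at every position k..n of the suffix w[k:].
        root = frozenset(range(k, n + 1))
        initial.append(root)
        stack = [root]
        while stack:
            node = stack.pop()
            if node in edges:
                continue  # already built, possibly for another suffix
            grouped: Dict[str, set] = {}
            for e in node:
                if e < n:
                    # Extending an occurrence ending at e by w[e] ends at e + 1.
                    grouped.setdefault(w[e], set()).add(e + 1)
            edges[node] = {c: frozenset(s) for c, s in grouped.items()}
            stack.extend(edges[node].values())
    return initial, edges


def occurs_from(initial: List[Node], edges: Dict[Node, Dict[str, Node]],
                k: int, pattern: str) -> bool:
    """Return True if pattern occurs in w[k:], walking from the k-th initial node."""
    node = initial[k]
    for c in pattern:
        nxt = edges[node].get(c)
        if nxt is None:
            return False
        node = nxt
    return True
```

A query such as occurs_from(initial, edges, k, p) answers whether p occurs at or after position k, which is the kind of question that beginning-sensitive pattern matching asks. For a unary string such as "aaaa" the sketch produces |w| + 1 nodes, in line with the linear bound stated for |Σ| = 1.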
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Inenaga, S., Takeda, M., Shinohara, A., Hoshino, H., Arikawa, S. (2002). The Minimum DAWG for All Suffixes of a String and Its Applications. In: Apostolico, A., Takeda, M. (eds) Combinatorial Pattern Matching. CPM 2002. Lecture Notes in Computer Science, vol 2373. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45452-7_14
DOI: https://doi.org/10.1007/3-540-45452-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43862-5
Online ISBN: 978-3-540-45452-6
eBook Packages: Springer Book Archive