Abstract
In this paper we consider the prefix array π = π[1..n] of a string x = x[1..n], in which π[1] = n and, for i > 1, π[i] = k iff k is the largest integer such that x[1..k] = x[i..i+k−1]. The prefix array is closely related to the border array β: an integer array β[1..n] such that β[i] = k iff the length of the longest border of x[1..i] is k. Border arrays or their variants are used in many string algorithms, and prefix arrays can be used directly for pattern-matching. It is well known that for regular strings β provides all the information that π does; we show, however, that for indeterminate strings (those containing entries that match a subset of the alphabet) π actually provides more information, in fact still enabling all the borders of every prefix of x to be specified. Since many of the entries of π are expected to be zero, it is natural to represent π in compressed form using integer arrays POS = POS[1..m] and LEN = LEN[1..m], where m is the number of nonzero entries in π and POS[j] = i iff the \(j^{\mbox{th}}\) nonzero entry in π occurs in position i and takes the value LEN[j]. The expected value of m is n/σ − 1, where σ is the alphabet size. The straightforward way of computing POS/LEN requires computing π first, and therefore requires O(n) extra space. We describe two Θ(n)-time algorithms, PL1 and PL2, that compute POS/LEN for regular strings using only 8m bytes of storage in addition to the n bytes required for x. PL1 requires about one-third the time of the standard border array algorithm MP on English-language strings; PL2 executes faster than MP on both English-language strings and highly periodic strings on {a,b}. For indeterminate strings, we describe an extension IPL of PL1 that computes POS/LEN in O(n²) worst-case time (though generally much faster), still using only 8m bytes of additional storage. For both regular and indeterminate strings, the compressed form of π can be used for efficient pattern-matching.
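The definitions above can be illustrated with a minimal Python sketch for regular strings. It is not PL1, PL2 or IPL: it follows the straightforward route mentioned in the abstract, computing the full prefix array first (here with the well-known linear-time "Z-array" technique) and then compressing it, so it uses O(n) extra space. The function names are hypothetical, lists are 0-indexed internally while POS holds 1-based positions, and the sketch assumes that position 1 (where the prefix array always equals n) is left out of POS/LEN.

```python
def prefix_array(x):
    """Return pi as a 0-indexed list: pi[i] is the length of the longest
    substring of x starting at index i that matches a prefix of x."""
    n = len(x)
    pi = [0] * n
    if n == 0:
        return pi
    pi[0] = n
    left = right = 0  # x[left:right] is the rightmost prefix match found so far
    for i in range(1, n):
        if i < right:
            # Reuse the value already known for the mirrored position.
            pi[i] = min(right - i, pi[i - left])
        # Extend the match by direct letter comparison.
        while i + pi[i] < n and x[pi[i]] == x[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > right:
            left, right = i, i + pi[i]
    return pi


def compress(pi):
    """Return (POS, LEN): 1-based positions i > 1 where pi is nonzero,
    and the corresponding values (assumption: position 1 is skipped)."""
    pos, length = [], []
    for i in range(1, len(pi)):
        if pi[i] > 0:
            pos.append(i + 1)
            length.append(pi[i])
    return pos, length


def border_array(pi):
    """Derive the border array (0-indexed list) of a regular string from pi.
    Positions are visited in increasing order, so the first value written
    to beta[j] is the longest border of the prefix ending at j."""
    n = len(pi)
    beta = [0] * n
    for i in range(1, n):
        for j in range(i + pi[i] - 1, i - 1, -1):
            if beta[j] > 0:
                break  # everything further left was filled by an earlier i
            beta[j] = j - i + 1
    return beta


pi = prefix_array("aabaab")
pos, length = compress(pi)
beta = border_array(pi)
```

For x = aabaab this gives the prefix array 6, 1, 0, 3, 1, 0, the compressed form POS = 2, 4, 5 with LEN = 1, 3, 1, and the border array 0, 1, 0, 1, 2, 3. The conversion in `border_array` is valid only for regular strings; for indeterminate strings matching is not transitive, which is why the abstract treats that case separately.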
This work was supported in part by the Natural Sciences & Engineering Research Council of Canada. The authors thank Maxime Crochemore for helpful discussions and anonymous referees for valuable suggestions.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Smyth, W.F., Wang, S. (2008). New Perspectives on the Prefix Array. In: Amir, A., Turpin, A., Moffat, A. (eds) String Processing and Information Retrieval. SPIRE 2008. Lecture Notes in Computer Science, vol 5280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89097-3_14
DOI: https://doi.org/10.1007/978-3-540-89097-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89096-6
Online ISBN: 978-3-540-89097-3
eBook Packages: Computer Science (R0)