New Perspectives on the Prefix Array

  • Conference paper
String Processing and Information Retrieval (SPIRE 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5280)

Abstract

In this paper we consider the prefix array \(\pi = \pi[1..n]\) of a string \(x = x[1..n]\), in which \(\pi[1] = n\) and, for \(i > 1\), \(\pi[i] = k\) iff k is the largest integer such that \(x[1..k] = x[i..i+k-1]\). The prefix array is closely related to the border array: an integer array \(\beta = \beta[1..n]\) such that \(\beta[i] = k\) iff the length of the longest border of \(x[1..i]\) is k. Border arrays or their variants are used in many string algorithms, and prefix arrays can be used directly for pattern-matching. It is well known that for regular strings \(\beta\) provides all the information that \(\pi\) does; we show, however, that for indeterminate strings (those containing entries that match a subset of the alphabet) \(\pi\) actually provides more information, in fact still enabling all the borders of every prefix of \(x\) to be specified. Since a lot of the entries of \(\pi\) are expected to be zeros, it is natural to represent \(\pi\) in compressed form using integer arrays \(\mbox{POS} = \mbox{POS}[1..m]\) and \(\mbox{LEN} = \mbox{LEN}[1..m]\), where m is the number of nonzero entries in \(\pi\) and \(\mbox{POS}[j] = i\) iff the \(j^{\mbox{th}}\) nonzero entry in \(\pi\) occurs in position \(i\) and takes the value \(\mbox{LEN}[j]\). The expected value of m is n/σ − 1, where σ is the alphabet size. The straightforward way of computing POS/LEN requires computing \(\pi\) first, and therefore requires O(n) extra space. We describe two Θ(n)-time algorithms, PL1 and PL2, to compute POS/LEN for regular strings using only 8m bytes of storage in addition to the n bytes required for \(x\). PL1 requires about one-third the time of the standard border array algorithm MP on English-language strings; PL2 executes faster than MP on both English-language and highly periodic strings on {a,b}. For indeterminate strings, we describe an extension IPL of PL1 that computes POS/LEN in \(O(n^2)\) worst-case time (though generally much faster), still using only 8m bytes of additional storage. For both regular and indeterminate strings, the compressed form of \(\pi\) can be used for efficient pattern-matching.
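The definitions in the abstract can be made concrete with a minimal Python sketch for regular strings. It is not the paper's PL1, PL2 or IPL, whose details are not given here. It follows what the abstract calls the straightforward route: compute the full prefix array first (here with the well-known linear-time Z-style scan), then compress it into POS/LEN. All function names are hypothetical. The lists are 0-based internally, while POS holds 1-based positions as in the abstract. Keeping the trivial first entry (position 1, value n) in POS/LEN is an assumption made by reading "number of nonzero entries" literally.

```python
from typing import List, Tuple


def prefix_array(x: str) -> List[int]:
    """pi[i] = length of the longest substring starting at i that equals a prefix of x.

    0-based indexing; pi[0] = n by convention.
    """
    n = len(x)
    pi = [0] * n
    if n == 0:
        return pi
    pi[0] = n
    left = right = 0  # x[left:right] is the rightmost window known to match a prefix
    for i in range(1, n):
        if i < right:
            # Reuse information from the window, capped at the window boundary.
            pi[i] = min(right - i, pi[i - left])
        while i + pi[i] < n and x[pi[i]] == x[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > right:
            left, right = i, i + pi[i]
    return pi


def compress_prefix_array(pi: List[int]) -> Tuple[List[int], List[int]]:
    """Return (POS, LEN): 1-based positions and values of the nonzero entries of pi."""
    pos, length = [], []
    for i, k in enumerate(pi, start=1):
        if k > 0:
            pos.append(i)
            length.append(k)
    return pos, length


def border_array(x: str) -> List[int]:
    """beta[i] = length of the longest border of x[0..i] (classic failure-function scan)."""
    n = len(x)
    beta = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and x[i] != x[k]:
            k = beta[k - 1]
        if x[i] == x[k]:
            k += 1
        beta[i] = k
    return beta


def borders_from_prefix(pi: List[int]) -> List[int]:
    """Recover the border array of a regular string from its prefix array."""
    n = len(pi)
    beta = [0] * n
    # A match of length pi[i] starting at i is a border of the prefix ending at i + pi[i] - 1.
    for i in range(1, n):
        if pi[i] > 0:
            end = i + pi[i] - 1
            beta[end] = max(beta[end], pi[i])
    # Shorter prefixes of that match are borders of the earlier end positions.
    for j in range(n - 2, -1, -1):
        beta[j] = max(beta[j], beta[j + 1] - 1)
    return beta


x = "abaababa"
pi = prefix_array(x)
print(pi)                         # [8, 0, 1, 3, 0, 3, 0, 1]
print(compress_prefix_array(pi))  # ([1, 3, 4, 6, 8], [8, 1, 3, 3, 1])
print(borders_from_prefix(pi))    # [0, 0, 1, 1, 2, 3, 2, 3]
```

On this example the border array recovered from the prefix array agrees with the one computed directly by `border_array`. Note that this sketch allocates the full length-n array before compressing; avoiding that extra O(n) space is the saving the abstract attributes to PL1 and PL2.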

This work was supported in part by the Natural Sciences & Engineering Research Council of Canada. The authors thank Maxime Crochemore for helpful discussions and anonymous referees for valuable suggestions.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Smyth, W.F., Wang, S. (2008). New Perspectives on the Prefix Array. In: Amir, A., Turpin, A., Moffat, A. (eds) String Processing and Information Retrieval. SPIRE 2008. Lecture Notes in Computer Science, vol 5280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89097-3_14

  • DOI: https://doi.org/10.1007/978-3-540-89097-3_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89096-6

  • Online ISBN: 978-3-540-89097-3

  • eBook Packages: Computer Science, Computer Science (R0)
