Abstract

We first give a representation of a suffix tree that uses \(n \lg n + O(n)\) bits of space and supports searching for a pattern in a given text (over a fixed-size alphabet) in \(O(m)\) time, where \(n\) is the size of the text and \(m\) is the size of the pattern. The structure is quite simple and answers a question raised by Muthukrishnan in [17]. Previous compact representations of suffix trees either had a larger lower-order term in space and relied on an expectation assumption [3], or required more time for searching [5]. Then, surprisingly, we show that we can do even better, by developing a structure that uses a suffix array (and so \(n \lceil \lg n \rceil\) bits) and an additional \(o(n)\) bits. String searching in this structure also takes \(O(m)\) time. Besides supporting string searching, we can also report the number of occurrences of the pattern in the same time using no additional space. In this case the space occupied by the structure is much less than that of many previously known structures for this task. When the alphabet size \(k\) is not a constant, our structures can be easily extended, using standard tricks, to ones that use the same space but take \(O(m \lg k)\) time for string searching, or to ones that use an additional \(O(m \lg k)\) bits but take the same \(O(m)\) time for searching.
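For orientation, the following is a minimal sketch of pattern counting on a plain suffix array, the \(n \lceil \lg n \rceil\)-bit component that the second structure builds on. It is not the structure described in the abstract: it uses the classical binary search, which takes \(O(m \lg n)\) time rather than \(O(m)\), and it omits the additional \(o(n)\)-bit index entirely. The function names `build_suffix_array` and `count_occurrences` are illustrative assumptions, and the construction is deliberately naive.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction by sorting suffixes directly; fine for small
    examples, not a linear-time algorithm.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    """Count occurrences of pattern in text using the suffix array sa.

    Two binary searches locate the contiguous range of suffixes that
    start with pattern. Each comparison inspects at most m characters,
    so the total cost is O(m lg n). This is the classical baseline,
    not the O(m) search claimed for the paper's structure.
    """
    m = len(pattern)

    # First suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # First suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return lo - first


text = "banana"
sa = build_suffix_array(text)
print(sa)
print(count_occurrences(text, sa, "ana"))
```

Because all suffixes that begin with the pattern are adjacent in sorted order, the number of occurrences is just the width of the matching range, which is why counting needs no space beyond the array itself.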

References

  1. Apostolico, A., Preparata, F.P.: Structural properties of the string statistics problem. Journal of Computer and System Sciences 31, 394–411 (1985)
  2. Cardenas, A.F.: Analysis and performance of inverted data base structures. Communications of the ACM 18(5), 253–263 (1975)
  3. Clark, D.R., Munro, J.I.: Efficient Suffix Trees on Secondary Storage. In: Proceedings of the 7th ACM-SIAM Symposium on Discrete Algorithms, pp. 383–391 (1996)
  4. Clift, B., Haussler, D., McConnel, R., Schneider, T.D., Stormo, G.D.: Sequence landscapes. Nucleic Acids Research 4(1), 141–158 (1986)
  5. Colussi, L., De Col, A.: A time and space efficient data structure for string searching on large texts. Information Processing Letters 58, 217–222 (1996)
  6. Fraser, C., Wendt, A., Myers, E.W.: Analysing and compressing assembly code. In: Proceedings of the SIGPLAN Symposium on Compiler Construction (1984)
  7. Gonnet, G.H., Baeza-Yates, R.A., Snider, T.: New indices for text: PAT trees and PAT arrays. In: Frakes, W.B., Baeza-Yates, R. (eds.) Information Retrieval: Data Structures and Algorithms, pp. 66–82. Prentice-Hall, Englewood Cliffs (1992)
  8. Jacobson, G.: Space-efficient Static Trees and Graphs. In: Proceedings of the IEEE Symposium on Foundations of Computer Science, pp. 549–554 (1989)
  9. Kärkkäinen, J., Ukkonen, E.: Sparse suffix trees. In: Cai, J.-Y., Wong, C.K. (eds.) COCOON 1996. LNCS, vol. 1090, pp. 219–230. Springer, Heidelberg (1996)
  10. Landau, G.M., Vishkin, U.: Introducing efficient parallelism into approximate string matching. In: Proc. 18th ACM Symposium on Theory of Computing, pp. 220–230 (1986)
  11. Manber, U., Myers, G.: Suffix Arrays: A New Method for On-line String Searches. SIAM Journal on Computing 22(5), 935–948 (1993)
  12. McCreight, M.E.: A space-economical suffix tree construction algorithm. Journal of the ACM 23, 262–272 (1976)
  13. Morrison, D.R.: PATRICIA: Practical Algorithm To Retrieve Information Coded In Alphanumeric. Journal of the ACM 15, 514–534 (1968)
  14. Munro, J.I., Benoit, D.: Succinct Representation of k-ary trees. Manuscript
  15. Munro, J.I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)
  16. Munro, J.I., Raman, V.: Succinct representation of balanced parentheses, static trees and planar graphs. In: Proceedings of the IEEE Symposium on Foundations of Computer Science, pp. 118–126 (1997)
  17. Muthukrishnan, S.: Randomization in Stringology. In: Proceedings of the Preconference Workshop on Randomization, Kharagpur, India (December 1997)
  18. Rodeh, M., Pratt, V.R., Even, S.: Linear algorithm for data compression via string matching. Journal of the ACM 28(1), 16–24 (1981)
  19. Shang, H.: Trie methods for text and spatial data structures on secondary storage. PhD Thesis, McGill University (1995)
  20. Weiner, P.: Linear pattern matching algorithm. In: Proc. 14th IEEE Symposium on Switching and Automata Theory, pp. 1–11 (1973)

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Ian Munro (1)
  • Venkatesh Raman (2)
  • S. Srinivasa Rao (2)

  1. Department of Computer Science, University of Waterloo, Canada
  2. The Institute of Mathematical Sciences, Chennai, India
