On Constructing Suffix Arrays in External Memory

Algorithms - ESA’ 99 (ESA 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1643)

Abstract

Constructing full-text indexes on very large text collections is a problem of great current interest. The suffix array [16] is one of the most attractive full-text indexing data structures because of its simplicity, its space efficiency, and the powerful and fast search operations it supports. In this paper we analyze, both theoretically and experimentally, the I/O complexity and the working space of six algorithms for constructing large suffix arrays. Additionally, we design a new external-memory algorithm that follows the basic philosophy underlying the algorithm in [13] but in a significantly different manner, thus combining its good practical qualities with efficient worst-case performance. To the best of our knowledge, this is the first study that provides a wide spectrum of possible approaches to the construction of suffix arrays in external memory, and thus it should be helpful to anyone interested in building full-text indexes on very large text collections.
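
For readers unfamiliar with the data structure, the following is a minimal in-memory sketch of what a suffix array is and how it supports substring search. It is not one of the six algorithms analyzed in the paper, nor the new external-memory algorithm; the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the naive sort-based construction is assumed only for illustration on small inputs.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    Naive construction: sorts suffixes by direct string comparison, so it is
    only suitable for small in-memory inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return all start positions of `pattern` in `text`, in increasing order.

    Uses two binary searches over the suffix array `sa`: suffixes that begin
    with `pattern` occupy one contiguous range of `sa`.
    """
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(sa)                                  # suffix start positions in sorted order
print(find_occurrences(text, sa, "ana"))   # positions where "ana" occurs
```

The array stores one integer per text position, which is the source of the space efficiency mentioned above; each search inspects only a logarithmic number of suffixes.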

Part of this work was done while the second author had a Post-Doctoral fellowship at the Max-Planck-Institut für Informatik, Saarbrücken, Germany. The work has been supported by EU ESPRIT LTR Project N. 20244 (ALCOM-IT).

References

  1. L. Arge, P. Ferragina, R. Grossi and J. S. Vitter. On Sorting Strings in External Memory. In ACM Symp. on Theory of Computing, pp. 540–548, 1997.

  2. R. Ahuja, K. Mehlhorn, J. B. Orlin and R. E. Tarjan. Faster Algorithms for the Shortest Path Problem. Journal of the ACM 37(2), pp. 213–223, 1990.

  3. A. Andersson and S. Nilsson. Efficient Implementation of Suffix Trees. Software Practice and Experience, 25(2): 129–141, 1995.

  4. S. Burkhard, A. Crauser, P. Ferragina, H. Lenhof, E. Rivals and M. Vingron. q-gram based database searching using a suffix array (QUASAR). International Conference on Computational Molecular Biology, 1999.

  5. D. R. Clark and J. I. Munro. Efficient Suffix Trees on Secondary Storage. In ACM-SIAM Symp. on Discrete Algorithms, pp. 383–391, 1996.

  6. A. Crauser, P. Ferragina and U. Meyer. Practical and Efficient Priority Queues for External Memory. Technical Report MPI, see WEB pages of the authors.

  7. A. Crauser and K. Mehlhorn. LEDA-SM: A Library Prototype for Computing in Secondary Memory. Technical Report MPI, see WEB pages of the authors.

  8. C. Faloutsos. Access Methods for Text. ACM Computing Surveys, 17, pp. 49–74, March 1985.

  9. M. Farach. Optimal suffix tree construction with large alphabets. In IEEE Foundations of Computer Science, pp. 137–143, 1997.

  10. M. Farach, P. Ferragina and S. Muthukrishnan. Overcoming the Memory Bottleneck in Suffix Tree Construction. In IEEE Foundations of Computer Science, 1998.

  11. C. L. Feng. PAT-Tree-Based Keyword Extraction for Chinese Information Retrieval. ACM SIGIR, pp. 50–58, 1997.

  12. P. Ferragina and R. Grossi. A Fully-Dynamic Data Structure for External Substring Search. In ACM Symp. on Theory of Computing, pp. 693–702, 1995. Also Journal of the ACM (to appear).

  13. G. H. Gonnet, R. A. Baeza-Yates and T. Snider. New indices for text: PAT trees and PAT arrays. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates Editors, pp. 66–82, Prentice-Hall, 1992.

  14. D. E. Knuth. The Art of Computer Programming: Sorting and Searching. Vol. 3, Addison-Wesley Publishing Co., 1973.

  15. S. Kurtz. Reducing the Space Requirement of Suffix Trees. Technical Report 98-03, University of Bielefeld, 1998.

  16. U. Manber and G. Myers. Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing 22, 5, pp. 935–948, 1993.

  17. E. M. McCreight. A space-economical suffix tree construction algorithm. Journal of the ACM 23, 2, pp. 262–272, 1976.

  18. G. Navarro, J. P. Kitajima, B. A. Ribeiro-Neto and N. Ziviani. Distributed Generation of Suffix Arrays. In Combinatorial Pattern Matching Conference, pp. 103–115, 1997.

  19. S. Näher and K. Mehlhorn. LEDA: A Platform for Combinatorial and Geometric Computing. Communications of the ACM (38), 1995.

  20. C. Ruemmler and J. Wilkes. An introduction to disk drive modeling. IEEE Computer, 27(3): 17–29, 1994.

  21. E. A. Shriver and J. S. Vitter. Algorithms for parallel memory I: two-level memories. Algorithmica, 12(2-3), pp. 110–147, 1994.

  22. D. E. Vengroff and J. S. Vitter. I/O-efficient scientific computing using TPIE. In IEEE Symposium on Parallel and Distributed Computing, 1995.

  23. J. Vitter. External memory algorithms. Invited Tutorial in 17th Ann. ACM Symp. on Principles of Database Systems (PODS ’98), 1998. Also Invited Paper in European Symposium on Algorithms (ESA ’98), 1998.

  24. J. Zobel, A. Moffat and K. Ramamohanarao. Guidelines for presentation and comparison of indexing techniques. SIGMOD Record 25, 3: 10–15, 1996.

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Crauser, A., Ferragina, P. (1999). On Constructing Suffix Arrays in External Memory. In: Nešetřil, J. (eds) Algorithms - ESA’ 99. ESA 1999. Lecture Notes in Computer Science, vol 1643. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48481-7_20

  • DOI: https://doi.org/10.1007/3-540-48481-7_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66251-8

  • Online ISBN: 978-3-540-48481-3

  • eBook Packages: Springer Book Archive
