A Cooperative Distributed Text Database Management Method Unifying Search and Compression Based on the Burrows-Wheeler Transformation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1552)

Abstract

A new text database management method for distributed cooperative environments is proposed. It can collect texts from distributed sites over a narrow-bandwidth network and enables full-text search in a unified, efficient manner. The method is based on two recent developments, one in full-text search data structures and one in data compression. Specifically, the Burrows-Wheeler transformation is used as the basis both for constructing the suffix array (or PAT array) for full-text search and for performing the block-sorting compression scheme. A cooperative environment makes it possible to employ these methods in a uniform fashion. This framework may also be used in the future for the Web text collection/search problem. The paper first describes the method and then provides preliminary computational results concerning the I/O implementation of suffix arrays and the performance of suffix sorting. These preliminary results indicate the practicality of the method.
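
The link between the two uses of suffix sorting mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the suffix array is built by naive sorting rather than by a fast or external-memory algorithm, and the text is assumed to end with a sentinel `$` that sorts before every other character. The sketch shows that one sorted list of suffixes yields both the Burrows-Wheeler transformed string (the input to block-sorting compression) and a binary-searchable full-text index.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction for illustration only; it compares whole suffixes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text: str, sa: list[int]) -> str:
    """Burrows-Wheeler transform: the character preceding each sorted suffix.

    For the suffix starting at position 0, text[-1] wraps to the sentinel.
    """
    return "".join(text[i - 1] for i in sa)


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of pattern, found by binary search over sa."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "banana$"  # '$' is an assumed end-of-text sentinel
sa = suffix_array(text)
bwt = bwt_from_suffix_array(text, sa)
hits = find_occurrences(text, sa, "ana")
```

In this sketch, a site could sort its text once, derive the transformed string from the suffix array for compression, and keep the same array for search. How the paper actually divides that work between sites is not stated in the abstract.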

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sadakane, K., Imai, H. (1999). A Cooperative Distributed Text Database Management Method Unifying Search and Compression Based on the Burrows-Wheeler Transformation. In: Kambayashi, Y., Lee, D.L., Lim, EP., Mohania, M.K., Masunaga, Y. (eds) Advances in Database Technologies. ER 1998. Lecture Notes in Computer Science, vol 1552. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49121-7_38

  • DOI: https://doi.org/10.1007/978-3-540-49121-7_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65690-6

  • Online ISBN: 978-3-540-49121-7

  • eBook Packages: Springer Book Archive
