
Implementing Efficient Updates in Compressed Big Text Databases

  • Stefan Böttcher
  • Alexander Bültmann
  • Rita Hartel
  • Jonathan Schlüßler
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8056)

Abstract

Text compression techniques such as bzip2 do not support inserting strings into, or deleting strings from, a compressed text at a given position without first decompressing it. We present a technique called DICIRT that supports fast insertion into and deletion from compressed texts without fully decompressing them. For inserted fragments of up to 8% of the original text size, and for deleted fragments of up to 15% of the original text size, DICIRT is faster than decompressing the text, modifying the uncompressed text, and then recompressing it.
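The baseline that DICIRT is compared against (decompress, edit, recompress) can be sketched with Python's standard `bz2` module. This is a minimal illustration under stated assumptions, not DICIRT itself: the function name `update_by_recompression` and its parameters are hypothetical. It shows that every edit, however small, pays for a full decompression and a full recompression of the whole text. That whole-text cost is what an in-place update technique aims to avoid.

```python
import bz2


def update_by_recompression(compressed: bytes, pos: int,
                            insert: bytes = b"", delete_len: int = 0) -> bytes:
    """Hypothetical baseline: decompress fully, edit the plain text, recompress fully.

    Deletes `delete_len` bytes starting at `pos`, then inserts `insert` at `pos`.
    """
    text = bz2.decompress(compressed)  # full decompression step
    if delete_len < 0 or not 0 <= pos <= len(text) or pos + delete_len > len(text):
        raise IndexError("edit range lies outside the text")
    # Modify the uncompressed text.
    edited = text[:pos] + insert + text[pos + delete_len:]
    return bz2.compress(edited)  # full compression step


c = bz2.compress(b"updates in compressed text")
# Delete the 11 bytes "compressed " starting at position 11.
print(bz2.decompress(update_by_recompression(c, 11, delete_len=11)))  # b'updates in text'
# Insert b"big " at position 11.
print(bz2.decompress(update_by_recompression(c, 11, insert=b"big ")))  # b'updates in big compressed text'
```

Both calls touch the entire compressed text even though each edit changes only a few bytes. That is the cost profile that makes an update-in-place approach attractive for small fragments.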

Keywords

compression, BWT, wavelet trees, modification of compressed text



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Stefan Böttcher (1)
  • Alexander Bültmann (1)
  • Rita Hartel (1)
  • Jonathan Schlüßler (1)

  1. Computer Science, University of Paderborn, Paderborn, Germany
