Skip to main content

Optimizing XML Compression

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5679))

Abstract

The eXtensible Markup Language (XML) provides a powerful and flexible means of encoding and exchanging data. As it turns out, its main advantage as an encoding format (namely, its requirement that all open and close markup tags are present and properly balanced) yields also one of its main disadvantages: verbosity. XML-conscious compression techniques seek to overcome this drawback. Many of these techniques first separate XML structure from the document content, and then compress each independently. Further compression gains can be realized by identifying and compressing together document content that is highly similar, thereby amortizing the storage costs of auxiliary information required by the chosen compression algorithm. Additionally, the proper choice of compression algorithm is an important factor not only for the achievable compression gain, but also for access performance. Hence, choosing a compression configuration that optimizes compression gain requires one to determine (1) a partitioning strategy for document content, and (2) the best available compression algorithm to apply to each set within this partition. In this paper, we show that finding an optimal compression configuration with respect to compression gain is an NP-hard optimization problem. This problem remains intractable even if one considers a single compression algorithm for all content. We also describe an approximation algorithm for selecting a partitioning strategy for document content based on the branch-and-bound paradigm.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Adiego, J., la Fuente, P.D., Navarro, G.: Combining structural and textual contexts for compressing semistructured databases. In: ENC, pp. 68–73 (2005)

    Google Scholar 

  2. Cheney, J.: Compressing XML with multiplexed hierarchical PPM models. In: DCC, pp. 163–172 (2001)

    Google Scholar 

  3. Cheney, J.: An empirical evaluation of simple DTD-conscious compression techniques. In: WebDB, pp. 43–48 (2005)

    Google Scholar 

  4. Leighton, G., Müldner, T., Diamond, J.: TREECHOP: a tree-based query-able compressor for XML. In: CWIT, pp. 115–118 (2005)

    Google Scholar 

  5. Min, J.K., Park, M.J., Chung, C.W.: XPRESS: A queriable compression for XML data. In: SIGMOD, pp. 122–133 (2003)

    Google Scholar 

  6. Tolani, P.M., Haritsa, J.R.: XGRIND: A query-friendly XML compressor. In: ICDE, pp. 225–234 (2002)

    Google Scholar 

  7. Arion, A., Bonifati, A., Manolescu, I., Pugliese, A.: XQueC: A query-conscious compressed XML database. ACM TOIT 7(2), Article 10 (May 2007)

    Google Scholar 

  8. Leighton, G., Diamond, J., Müldner, T.: AXECHOP: a grammar-based compressor for XML. In: DCC, p. 467 (2005)

    Google Scholar 

  9. Liefke, H., Suciu, D.: XMill: An efficient compressor for XML data. In: SIGMOD, pp. 153–164 (2000)

    Google Scholar 

  10. Maneth, S., Mihaylov, N., Sakr, S.: XML tree structure compression. In: XANTEC, pp. 243–247 (2008)

    Google Scholar 

  11. Skibinski, P., Grabowski, S., Swacha, J.: Effective asymmetric XML compression. Software: Practice and Experience 38(10), 1027–1047 (2008)

    Google Scholar 

  12. Berglund, A., Boag, S., Chamberlin, D., Fernández, M.F., Kay, M., Robie, J., Siméon, J. (eds.): XML path language (XPath) 2.0, W3C Recommendation (January 2007), http://www.w3.org/TR/xpath20/

  13. Boag, S., Chamberlin, D., Fernández, M.F., Florescu, D., Robie, J., Siméon, J. (eds.): XQuery 1.0: An XML query language, W3C Recommendation (January 2007), http://www.w3.org/TR/xquery/

  14. Leighton, G., Barbosa, D.: Optimizing XML compression (extended version). CoRR abs/0905.4761 (2009), http://arxiv.org/abs/0905.4761

  15. Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Inf. Theory 22(1), 75–81 (1976)

    Article  MathSciNet  MATH  Google Scholar 

  16. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Leighton, G., Barbosa, D. (2009). Optimizing XML Compression. In: Bellahsène, Z., Hunt, E., Rys, M., Unland, R. (eds) Database and XML Technologies. XSym 2009. Lecture Notes in Computer Science, vol 5679. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03555-5_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-03555-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03554-8

  • Online ISBN: 978-3-642-03555-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics