Compressing XML Documents Using Recursive Finite State Automata

  • Conference paper
Implementation and Application of Automata (CIAA 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3845)

Abstract

We propose a scheme for automatically generating compressors for XML documents from Document Type Definition (DTD) specifications. The algorithm is lossless and adaptive: the model used for compression and decompression is generated automatically from the DTD and is used in conjunction with an arithmetic compressor to produce a compressed version of the document. The structure of the model mirrors the syntactic specification of the document. The compression scheme is on-line, that is, it can compress the document as it is being read. We have implemented the compressor generator and report the results of experiments on some large XML databases whose DTDs are specified. The average compression is better than that of XMLPPM, the only other on-line tool we are aware of. The tool is also able to compress massive documents on which XMLPPM failed because it ran out of memory. We believe the main appeal of this technique is that the underlying model is so simple and yet so effective.
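
The idea of a DTD-derived adaptive model can be illustrated with a minimal sketch. It is not the authors' implementation: the DTD fragment, the hand-built automata, the `AdaptiveModel` class, the `"/"` end-of-element symbol and the count-based probability estimate are all assumptions made for illustration. Each element's content model becomes a small automaton, nested elements are handled with a stack (the "recursive" part), and every state keeps adaptive counts for its outgoing transitions. The function returns the ideal code length that an arithmetic coder would approach, rather than producing actual compressed bits; text content is ignored.

```python
import math

# Hypothetical hand-built automata for the illustrative DTD fragment
#   <!ELEMENT library (book*)>
#   <!ELEMENT book (title, author+, year?)>
# Each automaton maps state -> {symbol: next_state}.
# The symbol "/" stands for "close the current element".
AUTOMATA = {
    "library": {0: {"book": 0, "/": None}},
    "book": {
        0: {"title": 1},
        1: {"author": 2},
        2: {"author": 2, "year": 3, "/": None},
        3: {"/": None},
    },
}


class AdaptiveModel:
    """Per-state adaptive transition counts over a set of element automata."""

    def __init__(self, automata):
        self.automata = automata
        # Every transition starts with count 1, so all legal moves are codable.
        self.counts = {
            (elem, state): {sym: 1 for sym in edges}
            for elem, states in automata.items()
            for state, edges in states.items()
        }

    def cost_bits(self, events):
        """Return the ideal code length in bits for a stream of element events.

        events[0] is the root element; later entries are child element names
        or "/" for an end tag of an element that has its own automaton.
        """
        stack = [[events[0], 0]]  # [element, current state] frames
        bits = 0.0
        for sym in events[1:]:
            if not stack:
                raise ValueError("event after the root element was closed")
            elem, state = stack[-1]
            edges = self.automata[elem][state]
            if sym not in edges:
                raise ValueError(f"{sym!r} not allowed in <{elem}> state {state}")
            counts = self.counts[(elem, state)]
            # Cost of this choice under the current estimate, then adapt.
            bits += -math.log2(counts[sym] / sum(counts.values()))
            counts[sym] += 1
            if sym == "/":
                stack.pop()
            else:
                stack[-1][1] = edges[sym]
                if sym in self.automata:  # descend into the nested element
                    stack.append([sym, 0])
        return bits


if __name__ == "__main__":
    doc = ["library", "book", "title", "author", "author", "year", "/",
           "book", "title", "author", "/", "/"]
    print(f"{AdaptiveModel(AUTOMATA).cost_bits(doc):.3f} bits for structure")
```

States with a single outgoing transition, such as the mandatory `title` that opens a `book`, cost zero bits in this sketch, which is one way a model that mirrors the syntactic specification can avoid spending bits on markup the DTD already determines. Because the counts are updated only after each symbol is costed, a decoder holding the same automata could replay the same updates, and the stream can be processed as it is read.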


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Subramanian, H., Shankar, P. (2006). Compressing XML Documents Using Recursive Finite State Automata. In: Farré, J., Litovsky, I., Schmitz, S. (eds) Implementation and Application of Automata. CIAA 2005. Lecture Notes in Computer Science, vol 3845. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11605157_24

  • DOI: https://doi.org/10.1007/11605157_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31023-5

  • Online ISBN: 978-3-540-33097-4

  • eBook Packages: Computer Science, Computer Science (R0)
