Beyond Lazy XML Parsing

  • Fernando Farfán
  • Vagelis Hristidis
  • Raju Rangaswami
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4653)


XML has become the standard format for data representation and exchange in domains ranging from Web to desktop applications. However, wide adoption of XML is hindered by inefficient document-parsing methods. Recent work on lazy parsing is a major step towards alleviating this problem. Even so, lazy parsers must still read the entire XML document in order to extract the overall document structure, because XML documents lack internal navigation pointers. Further, these parsers must load and parse the entire virtual document tree into memory during XML query processing. These overheads significantly degrade the performance of navigation operations. We have developed a framework for efficient XML parsing based on the idea of placing internal physical pointers within the document, which allows skipping large portions of the document during parsing. The internal pointers are generated in a way that optimizes parsing for common navigation patterns. A double-Lazy Parser (2LP) then exploits the internal pointers to parse the document. To create the internal pointers, we use constructs supported by the current W3C XML standard. We study our pointer generation and parsing algorithms both theoretically and experimentally, and show that they perform considerably better than existing approaches.
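
As a rough illustration of how internal pointers can let a parser skip parts of a document, the following Python sketch splits a small document into fragments and parses a fragment only when navigation reaches it. It is not the 2LP algorithm from the paper: the `ptr` element, its `href` and `tag` attributes, the fragment names, and the `LazyDocument` class are all hypothetical stand-ins, and the abstract does not say which W3C construct the authors actually use.

```python
import xml.etree.ElementTree as ET

# Hypothetical partitioned document. The "ptr" element, its "href" and "tag"
# attributes, and the fragment names are illustrative assumptions, not the
# paper's actual encoding.
FRAGMENTS = {
    "root": '<site><ptr href="regions" tag="regions"/>'
            '<ptr href="people" tag="people"/></site>',
    "regions": '<regions><item id="i1"/><item id="i2"/></regions>',
    "people": '<people><person id="p1"><name>Ann</name></person></people>',
}


class LazyDocument:
    """Parses a fragment only when navigation first reaches it."""

    def __init__(self, fragments):
        self.fragments = fragments
        self.parsed = []  # fragment names, in the order they were parsed

    def load(self, name):
        self.parsed.append(name)
        return ET.fromstring(self.fragments[name])

    def child(self, node, tag):
        """Return the first child of node named tag, following a pointer if needed."""
        for c in node:
            if c.tag == "ptr" and c.get("tag") == tag:
                return self.load(c.get("href"))  # expand only this pointer
            if c.tag == tag:
                return c
        return None

    def navigate(self, path):
        """Follow a simple child-axis path such as 'site/people/person/name'."""
        steps = path.split("/")
        node = self.load("root")
        if node.tag != steps[0]:
            return None
        for step in steps[1:]:
            node = self.child(node, step)
            if node is None:
                return None
        return node


doc = LazyDocument(FRAGMENTS)
name = doc.navigate("site/people/person/name")
print(name.text)
print(doc.parsed)
```

In this toy setup the navigation touches only the `root` and `people` fragments, so the `regions` fragment is never parsed. How the document is partitioned for common navigation patterns, which is the subject of the paper's pointer generation algorithms, is not modeled here.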


Keywords: XML, Document Object Model, Double Lazy Parsing, Deferred Expansion, XPath





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Fernando Farfán (1)
  • Vagelis Hristidis (1)
  • Raju Rangaswami (1)
  1. School of Computer and Information Sciences, Florida International University
