Optimizing Differential XML Processing by Leveraging Schema and Statistics

  • Toyotaro Suzumura
  • Satoshi Makino
  • Naohiko Uramoto
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4294)


XML fills a critical role in many software infrastructures, such as SOA (Service-Oriented Architecture), Web Services, and Grid Computing. In this paper, we propose a high-performance XML parser that serves as a fundamental component for making such infrastructures viable even for mission-critical business applications. We previously proposed an XML parser based on differential processing, under the hypothesis that the XML documents being processed are similar to each other. In this paper, we extend that approach to achieve higher performance by leveraging static information as well as dynamic information. XML schema languages can represent static information, which we use to optimize the parser's internal state transitions; meanwhile, statistics gathered over a set of instance documents serve as dynamic information. The two approaches complement each other. Our experimental results show that each proposed optimization technique is effective and that combining multiple optimizations is especially effective, yielding a 73.2% performance improvement over our earlier work.
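The differential idea summarized above can be illustrated with a minimal sketch, assuming a toy tokenizer in place of a real XML parser. Byte sequences seen in earlier documents are memorized as transitions of an automaton, each carrying its parse event. A new document that matches a memorized path reuses those events, and only on a mismatch does the parser fall back to ordinary parsing. The names `DifferentialParser`, `State`, `parse_token`, and `TOKEN` are hypothetical. Trying the most frequently taken transitions first is only one assumed way dynamic statistics could be used, and the schema-based optimizations are not shown; this is not the authors' implementation.

```python
import re
from dataclasses import dataclass, field

# Toy tokenizer standing in for a real XML parser (hypothetical, not conforming):
# a token is either markup "<...>" or a run of character data.
TOKEN = re.compile(rb"<[^>]*>|[^<]+")


def parse_token(tok: bytes) -> tuple:
    """Slow path: derive a parse event from one raw token."""
    if tok.startswith(b"</"):
        return ("end", tok[2:-1].decode())
    if tok.startswith(b"<"):
        return ("start", tok[1:-1].split()[0].decode())
    return ("text", tok.decode())


@dataclass
class State:
    # Memorized byte sequence -> [next state, parse event, hit count]
    edges: dict = field(default_factory=dict)


class DifferentialParser:
    """Hypothetical sketch of byte-sequence memorization, not the paper's code."""

    def __init__(self) -> None:
        self.root = State()
        self.reused = 0  # tokens answered from memorized transitions
        self.parsed = 0  # tokens that needed the slow path

    def parse(self, doc: bytes) -> list:
        events, state, pos = [], self.root, 0
        while pos < len(doc):
            # Assumed use of dynamic statistics: try the most frequently
            # taken transitions first.
            ranked = sorted(state.edges.items(), key=lambda kv: -kv[1][2])
            for seq, edge in ranked:
                end = pos + len(seq)
                # Reuse only on an exact byte match that ends on a token
                # boundary: markup always ends at '>', while memorized text
                # must be followed by '<' or end of input (so "1" != "10").
                if doc.startswith(seq, pos) and (
                    seq.startswith(b"<") or end == len(doc) or doc.startswith(b"<", end)
                ):
                    edge[2] += 1
                    events.append(edge[1])
                    state, pos = edge[0], end
                    self.reused += 1
                    break
            else:
                # Mismatch: parse one token normally and memorize it.
                tok = TOKEN.match(doc, pos).group()
                event = parse_token(tok)
                nxt = State()
                state.edges[tok] = [nxt, event, 1]
                events.append(event)
                state, pos = nxt, pos + len(tok)
                self.parsed += 1
        return events
```

Parsing the same message twice serves every token of the second pass from memorized transitions, while a message that differs in a single value re-parses from the point of divergence onward. Because this toy automaton is a tree, it never rejoins the memorized path after a mismatch; a production design would likely want some way to resynchronize.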


Keywords: XML, Web Services, XML Schema, Statistics



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Toyotaro Suzumura (1)
  • Satoshi Makino (1)
  • Naohiko Uramoto (1)
  (1) Tokyo Research Laboratory, IBM Research, Kanagawa-ken, Japan
