Using XQuery for Flat-File Based Scientific Datasets

  • Xiaogang Li
  • Gagan Agrawal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2921)


XQuery is a recently developed query language for XML datasets. In this paper, we focus on the use of XQuery and other XML technologies for flat-file based scientific datasets. Traditionally, complex and domain-specific data layouts have complicated the processing of large datasets arising from scientific applications. The use of XML schemas and XQuery’s high-level structure can simplify the analysis of these datasets.

Though scientific data processing applications can be conveniently represented in XQuery, compiling them to achieve efficient execution involves a number of challenges: 1) analysis of recursive functions to identify reduction computations involving only associative and commutative operations, 2) replacement of recursive functions with iterative constructs, 3) application of data-centric transformations on the structure of XQuery, and 4) translation of XQuery processing to an imperative language like C/C++, which is required for using a middleware that offers low-level data access functionality. This paper describes our solutions to these problems and demonstrates significant benefits from the transformations we have developed.
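To make challenges 1) and 2) concrete, the following is a minimal Python sketch, not the authors' compiler or its output. It shows a reduction written recursively, in the style of a recursive XQuery function, alongside an iterative equivalent. Because addition is associative and commutative, the elements can also be processed in a different order, for example chunk by chunk as they are laid out on disk. All function names and the sample data are hypothetical and chosen only for illustration.

```python
def recursive_sum(values):
    """Reduction written recursively, as a recursive XQuery function might be."""
    if not values:
        return 0
    # Combine the first element with the reduction of the rest.
    return values[0] + recursive_sum(values[1:])


def iterative_sum(values):
    """The same reduction rewritten as a loop with an accumulator."""
    total = 0
    for v in values:
        total = total + v
    return total


def chunked_sum(values, chunk_size):
    """Reduce chunk by chunk, visiting the chunks in reverse order.

    Reordering like this is only valid because + is associative and
    commutative; it stands in for processing data in storage order.
    """
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]
    partials = [iterative_sum(chunk) for chunk in reversed(chunks)]
    return iterative_sum(partials)


# Hypothetical sample data standing in for values read from a flat file.
samples = [3, 1, 4, 1, 5, 9, 2, 6]

if __name__ == "__main__":
    print(recursive_sum(samples), iterative_sum(samples), chunked_sum(samples, 3))
```

All three functions return the same value for the same input. An operation that is not commutative, such as string concatenation, would not survive the reordering in `chunked_sum`, which is why the analysis has to establish associativity and commutativity before any such transformation is applied.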


Keywords: Recursive Function, Recursive Call, Abstract Syntax Tree, Imperative Language, Commutative Operation





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Xiaogang Li (1)
  • Gagan Agrawal (1)
  1. Department of Computer and Information Sciences, Ohio State University, Columbus, USA
