Distributed Processing of XPath Queries Using MapReduce

  • Matthew Damigos
  • Manolis Gergatsoulis
  • Stathis Plitsos
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 241)

Abstract

In this paper we investigate the problem of efficiently evaluating XPath queries over large XML data stored in a distributed manner. We propose a MapReduce algorithm based on a query decomposition which computes all expected answers in one MapReduce step. The algorithm can be applied over large XML data which is given either as a single distributed document or as a collection of small XML documents.
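
The abstract describes a single-round MapReduce evaluation driven by a query decomposition. The sketch below is a minimal in-memory simulation of that idea under stated assumptions; it is not the authors' algorithm. It assumes a hypothetical decomposition of the query `/site/people/person[profile]/name` into an anchor path (`/site/people/person`), a selection path (`name`), and a predicate path (`profile`). The input is treated as a collection of small XML documents. The map phase evaluates each sub-query independently and keys every match by the anchor node it hangs from (document id plus a Dewey-style position, also an assumption). The reduce phase keeps selection results only for anchors whose predicate matched. The names `ANCHOR`, `SUBQUERIES`, `map_fn`, `reduce_fn` and `run_one_round` are illustrative, and only child-axis steps are supported.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical decomposition of /site/people/person[profile]/name.
ANCHOR = ["site", "people", "person"]  # shared prefix ending at the predicate node
SUBQUERIES = {
    "sel": ["name"],      # selection path below the anchor
    "pred": ["profile"],  # predicate path below the anchor
}


def match_child_steps(start, start_id, steps):
    """Follow child-axis steps from `start`; return (dewey_id, element) pairs."""
    frontier = [(start_id, start)]
    for tag in steps:
        frontier = [
            (dewey + (i,), child)
            for dewey, el in frontier
            for i, child in enumerate(el)
            if child.tag == tag
        ]
    return frontier


def map_fn(doc_id, xml_text):
    """Emit (anchor key, (sub-query name, text)) for every sub-query match."""
    root = ET.fromstring(xml_text)
    if root.tag != ANCHOR[0]:
        return
    for anchor_id, anchor in match_child_steps(root, (0,), ANCHOR[1:]):
        key = (doc_id, anchor_id)
        for name, steps in SUBQUERIES.items():
            for _, el in match_child_steps(anchor, anchor_id, steps):
                yield key, (name, (el.text or "").strip())


def reduce_fn(key, values):
    """Keep selection results only for anchors whose predicate matched."""
    values = list(values)
    if any(name == "pred" for name, _ in values):
        for name, text in values:
            if name == "sel":
                yield text


def run_one_round(records):
    """Simulate one MapReduce step: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for doc_id, xml_text in records:
        for key, value in map_fn(doc_id, xml_text):
            groups[key].append(value)
    answers = []
    for key in sorted(groups):
        answers.extend(reduce_fn(key, groups[key]))
    return answers


if __name__ == "__main__":
    docs = [
        ("d1", "<site><people><person><name>Ann</name><profile/></person>"
               "<person><name>Bob</name></person></people></site>"),
        ("d2", "<site><people><person><profile/><name>Cy</name></person></people></site>"),
    ]
    print(run_one_round(docs))  # ['Ann', 'Cy']
```

Because every sub-query result is keyed by its anchor node, the predicate and selection parts could be matched by different mappers and still be joined in a single reduce. That is the property that makes a one-step evaluation plausible when the data is split across machines.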

Keywords

Selection Path, Hadoop Distributed File System, MapReduce Framework, Distributed File System, XPath Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Matthew Damigos (1)
  • Manolis Gergatsoulis (1)
  • Stathis Plitsos (2)
  1. Database and Information Systems Group (DBIS), Department of Archives and Library Science, Ionian University, Corfu, Greece
  2. Department of Management Science and Technology, Athens University of Economics and Business, Athens, Greece

Personalised recommendations