Distributed XML Twig Query Processing Using MapReduce

  • Conference paper
Web Technologies and Applications (APWeb 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9313)

Abstract

Twig query processing is one of the core operations of XML queries. Centralized holistic twig algorithms suffer great efficiency losses when large-scale XML documents are partitioned and stored in the cloud. Previous work on distributed twig query processing has some limitations, e.g., complete dependence on a priori knowledge of query patterns and iteration of MapReduce jobs. In this paper, our arbitrary XML partitioning and storage strategy requires no knowledge of query patterns; twig queries can be efficiently processed in a single-round MapReduce job with good scalability. Extensive experiments are conducted to verify the efficiency and scalability of our algorithms.

Author information

Corresponding author

Correspondence to Xin Bi.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bi, X., Wang, G., Zhao, X., Zhang, Z., Chen, S. (2015). Distributed XML Twig Query Processing Using MapReduce. In: Cheng, R., Cui, B., Zhang, Z., Cai, R., Xu, J. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9313. Springer, Cham. https://doi.org/10.1007/978-3-319-25255-1_17

  • DOI: https://doi.org/10.1007/978-3-319-25255-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25254-4

  • Online ISBN: 978-3-319-25255-1

  • eBook Packages: Computer Science, Computer Science (R0)
