
Distributed XML Filtering Using HADOOP Framework

Conference paper in Algorithmic Aspects of Cloud Computing (ALGOCLOUD 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9511)

Abstract

Publish-subscribe systems represent the state of the art in disseminating information to multiple users. Current XML-based pub-sub systems give users considerable flexibility, allowing them to formulate complex queries on both the content and the structure of streaming messages. Messages that contain one or more matches for a given user profile (query) are forwarded to that user. Typically, XML-based systems express user profiles in the XPath query language and employ efficient heuristic techniques to constrain the complexity of the filtering mechanism. However, as the number of XML documents exchanged daily grows rapidly, distributed management is becoming crucial. In this paper we propose three different approaches to distributed XML filtering using the Hadoop framework. The experimental results demonstrate that, compared to traditional XML filtering, the proposed techniques provide good scalability and effectiveness for very large numbers of documents and user queries.
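As a rough illustration of the filtering model described above (profiles expressed as XPath queries, a document forwarded to a user when it contains at least one match), the following Python sketch simulates a single MapReduce pass in one process. It is not one of the paper's three approaches: the profile names, the sample documents, and the strategy of giving every mapper the full profile set while partitioning the documents are all assumptions made for illustration. It relies on `xml.etree.ElementTree`, whose `findall` supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical user profiles: profile id -> XPath expression.
# ElementTree's findall() understands only a subset of XPath, enough for these.
PROFILES: Dict[str, str] = {
    "alice": ".//book[@lang='en']",
    "bob": ".//book/author",
    "carol": ".//magazine",
}


def map_phase(doc_id: str, xml_text: str) -> Iterable[Tuple[str, str]]:
    """Mapper: parse one document and emit (profile_id, doc_id) per matching profile.

    Assumed strategy: documents are partitioned across mappers and every
    mapper evaluates the full (replicated) profile set.
    """
    root = ET.fromstring(xml_text)
    for profile_id, xpath in PROFILES.items():
        # One or more matches is enough to forward the document to this user.
        if root.findall(xpath):
            yield profile_id, doc_id


def reduce_phase(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Reducer: group matching document ids by profile (the dict stands in for the shuffle)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for profile_id, doc_id in pairs:
        grouped[profile_id].append(doc_id)
    return {pid: sorted(doc_ids) for pid, doc_ids in grouped.items()}


def filter_documents(docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Run the map phase over every document, then the reduce phase over all emitted pairs."""
    pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
    return reduce_phase(pairs)


if __name__ == "__main__":
    sample = {
        "d1": "<library><book lang='en'><author>X</author></book></library>",
        "d2": "<library><magazine/><book lang='fr'/></library>",
        "d3": "<library><book lang='en'/></library>",
    }
    print(filter_documents(sample))
```

Replicating all profiles to every mapper keeps the sketch simple; with very large profile sets one might instead partition the profiles across workers, which trades parsing cost for extra document replication. In a real cluster, functions like `map_phase` and `reduce_phase` could be wrapped as Hadoop Streaming mapper and reducer scripts that read records from standard input.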



Author information

Corresponding author: Georgios Pispirigos.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Antonellis, P., Makris, C., Pispirigos, G. (2016). Distributed XML Filtering Using HADOOP Framework. In: Karydis, I., Sioutas, S., Triantafillou, P., Tsoumakos, D. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2015. Lecture Notes in Computer Science, vol 9511. Springer, Cham. https://doi.org/10.1007/978-3-319-29919-8_6

  • DOI: https://doi.org/10.1007/978-3-319-29919-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29918-1

  • Online ISBN: 978-3-319-29919-8

  • eBook Packages: Computer Science, Computer Science (R0)
