Abstract
Publish-subscribe systems represent the state of the art in disseminating information to multiple users. Current XML-based pub-sub systems give users considerable flexibility, allowing them to formulate complex queries on both the content and the structure of streaming messages. Messages that contain one or more matches for a given user profile (query) are forwarded to that user. Typically, the use of XML entails representing profiles in the XPath query language and employing efficient heuristic techniques to constrain the complexity of the filtering mechanism. However, as the number of XML documents exchanged daily grows rapidly, the need for distributed management is becoming crucial. In this paper we propose three different approaches for distributed XML filtering using the Hadoop framework. The experimental results demonstrate that the proposed techniques provide good scalability and effectiveness for very large numbers of documents and user queries, compared to traditional XML filtering.
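The filtering task described in the abstract can be illustrated with a minimal sketch: each user profile is an XPath expression, and a document is forwarded to every user whose profile has at least one match. The sketch below is not one of the three approaches proposed in the paper and does not use Hadoop. It only arranges the matching as a map step and a reduce step in plain Python, using the limited XPath subset of the standard library's `xml.etree.ElementTree`. The profiles, documents, and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical user profiles: user id -> XPath expression
# (restricted to the XPath subset that ElementTree supports).
PROFILES = {
    "alice": ".//book[@genre='db']/title",
    "bob": ".//article/author",
}

# Hypothetical message stream: document id -> XML text.
DOCUMENTS = {
    "d1": "<lib><book genre='db'><title>XML Filtering</title></book></lib>",
    "d2": "<lib><article><author>Ann</author></article></lib>",
    "d3": "<lib><book genre='ai'><title>Search</title></book></lib>",
}


def map_document(doc_id, xml_text, profiles):
    """Map step: emit (user, doc_id) for every profile with at least one match."""
    root = ET.fromstring(xml_text)
    for user, xpath in profiles.items():
        if root.find(xpath) is not None:
            yield user, doc_id


def reduce_matches(pairs):
    """Reduce step: group the matching document ids by user."""
    delivered = defaultdict(list)
    for user, doc_id in pairs:
        delivered[user].append(doc_id)
    return {user: sorted(ids) for user, ids in delivered.items()}


def filter_stream(documents, profiles):
    """Run the map step over every document, then reduce the emitted pairs."""
    pairs = []
    for doc_id, xml_text in documents.items():
        pairs.extend(map_document(doc_id, xml_text, profiles))
    return reduce_matches(pairs)


RESULT = filter_stream(DOCUMENTS, PROFILES)
```

Here each document is matched independently of the others, which is what makes the map step easy to spread across workers. A real system would also have to decide how to partition the profiles, which this sketch leaves out.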
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Antonellis, P., Makris, C., Pispirigos, G. (2016). Distributed XML Filtering Using HADOOP Framework. In: Karydis, I., Sioutas, S., Triantafillou, P., Tsoumakos, D. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2015. Lecture Notes in Computer Science, vol 9511. Springer, Cham. https://doi.org/10.1007/978-3-319-29919-8_6
DOI: https://doi.org/10.1007/978-3-319-29919-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29918-1
Online ISBN: 978-3-319-29919-8
eBook Packages: Computer Science, Computer Science (R0)