Abstract
Currently, users' interests in XML dissemination applications are expressed as XPath or XQuery queries. Writing these queries requires a good knowledge of the structure and contents of the documents that will arrive, as well as a knowledge of XQuery, which few consumers have. In some cases, where distinguishing relevant from irrelevant documents requires considering a large number of features, formulating such a query may be impossible. This paper introduces a data mining approach to XML dissemination that uses a document collection supplied by the user to automatically learn a classifier that models his or her information needs. It also discusses the corresponding optimization methods that allow a dissemination server to execute a massive number of classifiers simultaneously. An experimental evaluation on several real XML document sets demonstrates the accuracy and efficiency of the proposed approach.
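To make the idea of learning a classifier from a user's document collection concrete, the following is a minimal sketch and not the method of the paper. It assumes that root-to-node tag paths are the features and that a Bernoulli naive Bayes model is the classifier; both choices, and the names `tag_paths`, `train`, and `is_relevant`, are illustrative assumptions that the abstract does not specify.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def train(relevant_docs, irrelevant_docs):
    """Fit a Bernoulli naive Bayes model over tag-path features.

    Assumption: both example lists are non-empty.
    """
    docs = {
        True: [tag_paths(d) for d in relevant_docs],
        False: [tag_paths(d) for d in irrelevant_docs],
    }
    vocab = set().union(*docs[True], *docs[False])
    total = len(docs[True]) + len(docs[False])
    model = {"vocab": vocab, "prior": {}, "prob": {}}
    for label, feature_sets in docs.items():
        counts = Counter(p for fs in feature_sets for p in fs)
        model["prior"][label] = len(feature_sets) / total
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        model["prob"][label] = {
            p: (counts[p] + 1) / (len(feature_sets) + 2) for p in vocab
        }
    return model


def is_relevant(model, xml_text):
    """Classify an incoming document by comparing the two class log-scores."""
    present = tag_paths(xml_text)
    score = {}
    for label in (True, False):
        s = math.log(model["prior"][label])
        for p in model["vocab"]:
            q = model["prob"][label][p]
            s += math.log(q if p in present else 1.0 - q)
        score[label] = s
    return score[True] > score[False]


# Hypothetical training data: documents the user kept versus ones they ignored.
relevant = [
    "<paper><title>a</title><abstract>b</abstract></paper>",
    "<paper><title>c</title><authors><author>x</author></authors></paper>",
]
irrelevant = [
    "<ad><price>1</price></ad>",
    "<ad><vendor>v</vendor><price>2</price></ad>",
]
model = train(relevant, irrelevant)
```

In this sketch the user supplies example documents instead of writing an XPath or XQuery expression, and a dissemination server would call `is_relevant` with that user's model for each arriving document. Evaluating many such models at once is the optimization problem the abstract refers to; the sketch does not address it.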
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, X., Ester, M., Qian, W., Zhou, A. (2010). A Data Mining Approach to XML Dissemination. In: Chen, L., Triantafillou, P., Suel, T. (eds) Web Information Systems Engineering – WISE 2010. WISE 2010. Lecture Notes in Computer Science, vol 6488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17616-6_40
DOI: https://doi.org/10.1007/978-3-642-17616-6_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17615-9
Online ISBN: 978-3-642-17616-6
eBook Packages: Computer Science, Computer Science (R0)