
A Data Mining Approach to XML Dissemination

  • Conference paper
Web Information Systems Engineering – WISE 2010 (WISE 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6488)


Abstract

Currently, users' interests in XML dissemination applications are expressed as XPath or XQuery queries. Writing these queries requires a good knowledge of the structure and contents of the documents that will arrive, as well as knowledge of XQuery, which few consumers have. In some cases, where distinguishing relevant from irrelevant documents requires considering a large number of features, formulating the query may be impossible. This paper introduces a data mining approach to XML dissemination that uses a document collection provided by the user to automatically learn a classifier that models his or her information needs. It also discusses optimization methods that allow a dissemination server to execute a massive number of classifiers simultaneously. An experimental evaluation on several real XML document sets demonstrates the accuracy and efficiency of the proposed approach.
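The idea of learning a filter from example documents, rather than writing a query by hand, can be illustrated with a small sketch. It is not the method of the paper, whose features and classifier are not described in the abstract. It assumes, purely for illustration, that each document is represented by its set of root-to-element tag paths and that a Bernoulli naive Bayes model with add-one smoothing decides relevance. The function names `path_features`, `train`, and `is_relevant` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def path_features(xml_text):
    """Return the set of root-to-element tag paths in one XML document.

    Assumption: structural paths are the features; the paper may use others.
    """
    root = ET.fromstring(xml_text)
    feats = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        feats.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return feats


def train(labelled_docs):
    """Count path occurrences per class from (xml_text, is_relevant) pairs."""
    doc_counts = Counter()
    feat_counts = {True: Counter(), False: Counter()}
    vocab = set()
    for xml_text, label in labelled_docs:
        feats = path_features(xml_text)
        doc_counts[label] += 1
        feat_counts[label].update(feats)
        vocab |= feats
    return {"docs": doc_counts, "feats": feat_counts, "vocab": vocab}


def is_relevant(model, xml_text):
    """Bernoulli naive Bayes decision with add-one smoothing (illustrative)."""
    feats = path_features(xml_text)
    total = sum(model["docs"].values())
    scores = {}
    for label in (True, False):
        n = model["docs"][label]
        score = math.log((n + 1) / (total + 2))  # smoothed class prior
        for f in model["vocab"]:
            p = (model["feats"][label][f] + 1) / (n + 2)
            score += math.log(p if f in feats else 1 - p)
        scores[label] = score
    return scores[True] > scores[False]


# Toy collection standing in for a user's example documents.
examples = [
    ("<paper><title>a</title><author>b</author></paper>", True),
    ("<paper><title>c</title></paper>", True),
    ("<ad><price>1</price></ad>", False),
    ("<ad><price>2</price><vendor>x</vendor></ad>", False),
]
model = train(examples)
```

With this toy model, `is_relevant(model, "<paper><title>z</title></paper>")` evaluates to true and `is_relevant(model, "<ad><price>9</price></ad>")` to false. The user supplies only example documents and never writes an XPath or XQuery expression.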




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, X., Ester, M., Qian, W., Zhou, A. (2010). A Data Mining Approach to XML Dissemination. In: Chen, L., Triantafillou, P., Suel, T. (eds) Web Information Systems Engineering – WISE 2010. WISE 2010. Lecture Notes in Computer Science, vol 6488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17616-6_40


  • DOI: https://doi.org/10.1007/978-3-642-17616-6_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17615-9

  • Online ISBN: 978-3-642-17616-6

  • eBook Packages: Computer Science, Computer Science (R0)
