XML Schema and Data Summarization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6114)

Abstract

As XML repositories grow more complex, methods and tools are needed to help users understand and explore the schemas and contents of these repositories. Proper summarization of XML documents can provide a solution. In this paper we propose a summarization that covers both the schema and the contents of XML documents. Our approach has three general steps: (1) the schema is extracted from a given XML document; (2) a summary of the schema is derived, and correspondences between the summary and the underlying source schema are established; (3) the summarization information is used to summarize (aggregate) the contents (text values) of the instance document. We show how the user can be involved in this process, and we develop new algorithms for the summarization process. We show that our approach is useful and effective in practice.
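
The three steps can be illustrated with a minimal Python sketch. It is not the paper's algorithm, which the abstract does not specify. It assumes that a schema is the set of distinct label paths in the document, that the summary keeps only paths up to a chosen depth, and that aggregation means collecting text values under the summary path each element corresponds to. The function names, the `max_depth` parameter, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def extract_schema(root):
    """Step 1 (assumed form): collect each distinct label path and its count."""
    counts = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


def summarize_schema(paths, max_depth):
    """Step 2 (assumed form): map every schema path to its ancestor-or-self
    at depth <= max_depth. The mapping is the schema-to-summary correspondence."""
    mapping = {}
    for path in paths:
        labels = path.strip("/").split("/")
        mapping[path] = "/" + "/".join(labels[:max_depth])
    return mapping


def aggregate_contents(root, mapping):
    """Step 3 (assumed form): group text values by corresponding summary path."""
    groups = defaultdict(list)

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        text = (node.text or "").strip()
        if text:
            groups[mapping[path]].append(text)
        for child in node:
            walk(child, path)

    walk(root, "")
    return dict(groups)


# Hypothetical instance document, used only for illustration.
DOC = """<bib>
  <book><title>A</title><author><name>X</name></author></book>
  <book><title>B</title><author><name>Y</name></author></book>
</bib>"""

root = ET.fromstring(DOC)
schema = extract_schema(root)
mapping = summarize_schema(schema, max_depth=2)
summary = aggregate_contents(root, mapping)
print(summary)
```

In this sketch the depth cutoff stands in for whatever importance criterion a real summarizer would apply. A user could be involved, for example, by choosing `max_depth` or by editing `mapping` before the aggregation step.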

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Marciniak, J. (2010). XML Schema and Data Summarization. In: Rutkowski, L., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2010. Lecture Notes in Computer Science, vol 6114. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13232-2_68

  • DOI: https://doi.org/10.1007/978-3-642-13232-2_68

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13231-5

  • Online ISBN: 978-3-642-13232-2

  • eBook Packages: Computer Science, Computer Science (R0)
