Abstract
As XML repositories become increasingly complex, there is a need for methods and tools that make it easier to understand and explore their schemas and contents. One solution is a suitable summarization of XML documents. In this paper we propose a summarization that covers both the schema and the contents of XML documents. Our approach consists of three general steps: (1) the schema is extracted from a given XML document; (2) a summary of the schema is derived, and correspondences between the summary and the underlying source schema are established; (3) this summarization information is used to summarize (aggregate) the contents (text values) of the instance document. We show how the user can be involved in this process, develop new algorithms for the summarization, and show that our approach is useful and effective in practice.
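The three steps can be illustrated with a minimal sketch. It is not the algorithm developed in the paper. It assumes a deliberately simple setting: the schema is taken to be the set of root-to-element label paths with their occurrence counts; the summary keeps the root plus the `k` most frequent other paths, a hypothetical frequency-based importance measure; each source path corresponds to its nearest kept ancestor; and contents are aggregated as (number of values, number of distinct values) per summary node. The function names and the parameter `k` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def extract_schema(root):
    """Step 1 (simplified): the schema is the multiset of root-to-element label paths."""
    counts = Counter()

    def walk(elem, prefix):
        path = prefix + (elem.tag,)
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(root, ())
    return counts


def summarize_schema(schema, k):
    """Step 2 (simplified): keep the root plus the k most frequent other paths
    (hypothetical importance measure) and map every source path to its
    nearest kept ancestor, or to itself if it is kept."""
    root_path = min(schema, key=len)  # the single path of length 1
    ranked = sorted((p for p in schema if p != root_path),
                    key=lambda p: (-schema[p], p))  # frequency desc, then path
    kept = {root_path, *ranked[:k]}
    correspondence = {}
    for path in schema:
        node = path
        while node not in kept:  # terminates: the root path prefixes every path
            node = node[:-1]
        correspondence[path] = node
    return kept, correspondence


def summarize_contents(root, correspondence):
    """Step 3 (simplified): group non-empty text values by summary node and
    aggregate each group to (number of values, number of distinct values)."""
    groups = defaultdict(list)

    def walk(elem, prefix):
        path = prefix + (elem.tag,)
        text = (elem.text or "").strip()
        if text:
            groups[correspondence[path]].append(text)
        for child in elem:
            walk(child, path)

    walk(root, ())
    return {"/".join(node): (len(vals), len(set(vals)))
            for node, vals in groups.items()}


doc = """<bib>
  <book><title>XML</title><author>Ann</author><author>Bob</author></book>
  <book><title>DB</title><author>Ann</author></book>
</bib>"""
root = ET.fromstring(doc)
schema = extract_schema(root)
kept, corr = summarize_schema(schema, k=2)
print(summarize_contents(root, corr))
# prints {'bib/book': (2, 2), 'bib/book/author': (3, 2)}
```

With `k=2` the summary keeps `bib`, `bib/book/author` (3 occurrences) and `bib/book` (2 occurrences). The `title` path is not kept, so its values are aggregated under `bib/book`. A user-driven variant could simply let the user choose the kept set instead of ranking by frequency.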
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marciniak, J. (2010). XML Schema and Data Summarization. In: Rutkowski, L., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2010. Lecture Notes in Computer Science, vol. 6114. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13232-2_68
DOI: https://doi.org/10.1007/978-3-642-13232-2_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13231-5
Online ISBN: 978-3-642-13232-2
eBook Packages: Computer Science, Computer Science (R0)