XML Structure Mapping

Maes, Francis; Denoyer, Ludovic; Gallinari, Patrick

doi:10.1007/978-3-540-73888-6_49

Application to the PASCAL/INEX 2006 XML Document Mining Track

Francis Maes¹,
Ludovic Denoyer¹ &
Patrick Gallinari¹

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4518)

Abstract

We address the problem of learning to automatically map flat and semi-structured documents onto a mediated target XML schema. We propose a machine learning approach in which the mapping between input and target documents is learned from examples. Complex transformations can be learned using only pairs of input documents and their corresponding target documents. From a machine learning point of view, the structure mapping task raises significant complexity challenges. We therefore propose an original model that scales well to real-world applications, together with learning and inference procedures of low complexity. The model builds the target XML document sequentially by processing the input document node by node. We demonstrate the efficiency of our model on two structure mapping tasks. To our knowledge, no other model is yet able to solve these tasks.
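To make the idea of sequential, node-by-node construction concrete, here is a minimal Python sketch. It is not the authors' model: the abstract does not specify the action space, the features, or the learning procedure. The names `Action`, `toy_policy`, `apply_action`, and `map_document`, as well as the target tags `article`, `title`, `section`, `heading`, and `paragraph`, are hypothetical, and the hand-written rule in `toy_policy` only stands in for a policy that would be learned from pairs of input and target documents.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Action:
    """Hypothetical action: where to place one input node in the target tree."""
    path: tuple                  # tags from just below the root down to the new leaf
    new_container: bool = False  # open a fresh innermost container instead of reusing the last one


def toy_policy(text, partial_target):
    """Hand-written stand-in for a learned policy (assumption, not from the paper).

    It looks at the current input node and at the partially built target
    document, then returns the action to apply.
    """
    if partial_target.find("title") is None:
        return Action(("title",))
    if text.endswith(":"):
        return Action(("section", "heading"), new_container=True)
    return Action(("section", "paragraph"))


def apply_action(root, action, text):
    """Attach `text` as a new leaf under `action.path`, creating containers as needed."""
    parent = root
    containers = action.path[:-1]
    for depth, tag in enumerate(containers):
        existing = parent.findall(tag)
        innermost = depth == len(containers) - 1
        if existing and not (innermost and action.new_container):
            parent = existing[-1]                # keep filling the most recent container
        else:
            parent = ET.SubElement(parent, tag)  # open a new container
    leaf = ET.SubElement(parent, action.path[-1])
    leaf.text = text


def map_document(input_nodes, policy, root_tag="article"):
    """Build the target document by processing the input nodes one at a time, in order."""
    root = ET.Element(root_tag)
    for text in input_nodes:
        apply_action(root, policy(text, root), text)
    return root


# A flat input document, given as an ordered list of text nodes.
flat = [
    "XML Structure Mapping",
    "Introduction:",
    "We map flat documents.",
    "Model:",
    "Nodes are processed in order.",
]
target = map_document(flat, toy_policy)
print(ET.tostring(target, encoding="unicode"))
```

Each input node is visited exactly once and triggers a single local decision, so inference in this sketch is linear in the number of input nodes. That is one plausible reading of the low-complexity inference the abstract mentions, not a statement of the paper's actual complexity analysis.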

This work was supported in part by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This publication only reflects the authors’ views.

Author information

Authors and Affiliations

LIP6 - University of Paris 6
Francis Maes, Ludovic Denoyer & Patrick Gallinari

Editor information

Norbert Fuhr, Mounia Lalmas, Andrew Trotman

About this paper

Cite this paper

Maes, F., Denoyer, L., Gallinari, P. (2007). XML Structure Mapping. In: Fuhr, N., Lalmas, M., Trotman, A. (eds) Comparative Evaluation of XML Information Retrieval Systems. INEX 2006. Lecture Notes in Computer Science, vol 4518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73888-6_49

DOI: https://doi.org/10.1007/978-3-540-73888-6_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73887-9
Online ISBN: 978-3-540-73888-6
eBook Packages: Computer Science, Computer Science (R0)
