Abstract
We describe here the XML Mining Track at INEX 2008. The track was launched to explore two main ideas: first, identifying the key problems and new challenges of mining semi-structured documents, an emerging field; and second, studying and assessing the potential of machine learning techniques for generic Machine Learning (ML) tasks in the structured domain, i.e. the classification and clustering of semi-structured documents. This year, the track focuses on the supervised classification and the unsupervised clustering of XML documents using link information. We consider a corpus of about 100,000 Wikipedia pages with their associated hyperlinks. Participants developed models using the content information, the internal structure of the XML documents, and the link information between documents.
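To make "using link information" concrete, the sketch below shows one very simple way hyperlinks can inform classification. It is not the method of the track or of any participant. It assumes a hypothetical in-memory representation: a dict `links` mapping each page id to the ids it links to, and a dict `labels` holding the known categories of training pages. The hypothetical function `neighbor_vote` predicts a page's category by majority vote over the labels of its linked neighbors, following both outgoing and incoming hyperlinks.

```python
from collections import Counter


def neighbor_vote(links, labels, doc):
    """Hypothetical link-based baseline: predict a category for `doc` by
    majority vote over the known labels of its hyperlink neighbors.

    links  -- dict: page id -> iterable of page ids it links to (assumed format)
    labels -- dict: page id -> category, for labeled (training) pages only
    """
    # Outgoing links: pages that `doc` points to.
    neighbors = set(links.get(doc, ()))
    # Incoming links: pages that point to `doc`.
    neighbors |= {src for src, dsts in links.items() if doc in dsts}
    # Count the categories of neighbors whose label is known.
    votes = Counter(labels[n] for n in neighbors if n in labels)
    if not votes:
        # No labeled neighbor: a real system would fall back to content or structure.
        return None
    # Highest count wins; ties are broken alphabetically so the result is deterministic.
    return min(votes.items(), key=lambda kv: (-kv[1], kv[0]))[0]
```

For example, with `links = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["A"]}` and `labels = {"B": "Physics", "C": "Physics", "D": "History"}`, page `"A"` has neighbors B, C (outgoing) and D (incoming), so it receives two votes for `"Physics"` and one for `"History"`. A vote over direct neighbors only captures the intuition that linked pages tend to share topics; a model that also uses content and structure, as the participants' models did, would be needed for pages with few or no labeled neighbors.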
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Denoyer, L., Gallinari, P. (2009). Overview of the INEX 2008 XML Mining Track. In: Geva, S., Kamps, J., Trotman, A. (eds) Advances in Focused Retrieval. INEX 2008. Lecture Notes in Computer Science, vol 5631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03761-0_41
DOI: https://doi.org/10.1007/978-3-642-03761-0_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03760-3
Online ISBN: 978-3-642-03761-0
eBook Packages: Computer Science, Computer Science (R0)