An Inductive Learning System for XML Documents

Wu, Xiaobing

doi:10.1007/978-3-540-78469-2_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4894)

Included in the following conference series:

International Conference on Inductive Logic Programming

Abstract

This paper presents a complete inductive learning system that aims to produce comprehensible theories for XML document classification. The knowledge representation method is based on a higher-order logic formalism that is particularly suitable for structured-data learning systems. A systematic way of generating predicates is also given. The learning algorithm is a modified standard decision-tree learning algorithm driven by the precision/recall breakeven point. Experimental results on an XML version of the Reuters dataset show that the system is able to produce comprehensible theories with high precision/recall breakeven point values.
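
As a rough illustration of the evaluation measure named in the abstract, the sketch below computes a precision/recall breakeven point for a ranked binary classifier. It is not taken from the paper: the function name `breakeven_point`, the score-based ranking, and the toy data are all assumptions made for this example. It relies on the fact that when exactly as many documents are predicted positive as there are true positives in the data, precision and recall share the same denominator and are therefore equal.

```python
from typing import Sequence


def breakeven_point(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Precision/recall breakeven point for a ranked binary classifier.

    Hypothetical helper, not the paper's implementation. The top n_pos
    documents by score are predicted positive, where n_pos is the number
    of truly positive documents. At that cutoff precision == recall.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank by descending score; ties keep their input order (stable sort).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = sum(1 for _, y in ranked[:n_pos] if y)
    # precision = tp / predicted positives, recall = tp / actual positives;
    # both denominators equal n_pos here.
    return true_positives / n_pos


# Toy data (assumed): five documents, three of which belong to the class.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [True, False, True, True, False]
bep = breakeven_point(scores, labels)
```

With this toy ranking, two of the top three documents are true positives, so precision and recall both equal 2/3 at the cutoff. A decision-tree learner that outputs hard class labels rather than scores would need a different way of locating the breakeven point, which this sketch does not cover.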

Author information

Authors and Affiliations

CSIRO ICT Centre, Australia
Xiaobing Wu

Editor information

Hendrik Blockeel, Jan Ramon, Jude Shavlik, Prasad Tadepalli

About this paper

Cite this paper

Wu, X. (2008). An Inductive Learning System for XML Documents. In: Blockeel, H., Ramon, J., Shavlik, J., Tadepalli, P. (eds) Inductive Logic Programming. ILP 2007. Lecture Notes in Computer Science, vol 4894. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78469-2_28

DOI: https://doi.org/10.1007/978-3-540-78469-2_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78468-5
Online ISBN: 978-3-540-78469-2
eBook Packages: Computer Science, Computer Science (R0)
