Relational Learning: Statistical Approach Versus Logical Approach in Document Image Understanding

Ceci, Michelangelo; Berardi, Margherita; Malerba, Donato

doi:10.1007/11558590_42

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3673)

Included in the following conference series:

Congress of the Italian Association for Artificial Intelligence

Abstract

Document image understanding denotes the recognition of semantically relevant components in the layout extracted from a document image. This recognition process is based on visual models that can be acquired automatically by applying machine learning techniques. In particular, because the layout of a document image is inherently spatial, spatial relations among the logical components of interest can play a key role in the learned models. For this reason, we are investigating the application of (multi-)relational learning techniques, which allow relations between components to be represented effectively and naturally. The goal of this paper is to evaluate and systematically compare two approaches to relational learning, a statistical approach and a logical approach, in the task of document image understanding. For a fair comparison, both methods are tested on the same dataset, which consists of multi-page articles published in an international journal. An analysis of the pros and cons of both approaches is reported.

Author information

Authors and Affiliations

Dipartimento di Informatica, Università degli Studi di Bari, Via Orabona 4, 70126, Bari
Michelangelo Ceci, Margherita Berardi & Donato Malerba

Editor information

Editors and Affiliations

Research Center on Complex Systems and Artificial Intelligence (CSAI), Department of Computer Science, Systems and Communication (DISCo), University of Milano–Bicocca, Viale Sarca 336, 20126 Milan, Italy
Stefania Bandini
CSAI - Complex Systems & Artificial Intelligence Research Centre, University of Milano–Bicocca
Sara Manzoni

About this paper

Cite this paper

Ceci, M., Berardi, M., Malerba, D. (2005). Relational Learning: Statistical Approach Versus Logical Approach in Document Image Understanding. In: Bandini, S., Manzoni, S. (eds) AI*IA 2005: Advances in Artificial Intelligence. AI*IA 2005. Lecture Notes in Computer Science, vol 3673. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558590_42

DOI: https://doi.org/10.1007/11558590_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29041-4
Online ISBN: 978-3-540-31733-3
eBook Packages: Computer Science, Computer Science (R0)