Relational Learning: Statistical Approach Versus Logical Approach in Document Image Understanding

  • Conference paper
AI*IA 2005: Advances in Artificial Intelligence (AI*IA 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3673)

Abstract

Document image understanding denotes the recognition of semantically relevant components in the layout extracted from a document image. This recognition process is based on visual models that can be automatically acquired by applying machine learning techniques. In particular, because they encapsulate knowledge of the inherently spatial nature of a document's layout, spatial relations among the logical components of interest can play a key role in the learned models. For this reason, we are investigating the application of (multi-)relational learning techniques, which allow relations between components to be represented effectively and naturally. The goal of this paper is to evaluate and systematically compare two different approaches to relational learning, namely a statistical approach and a logical approach, in the task of document image understanding. For a fair comparison, both methods are tested on the same dataset, which consists of multi-page articles published in an international journal. An analysis of the pros and cons of both approaches is reported.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ceci, M., Berardi, M., Malerba, D. (2005). Relational Learning: Statistical Approach Versus Logical Approach in Document Image Understanding. In: Bandini, S., Manzoni, S. (eds) AI*IA 2005: Advances in Artificial Intelligence. AI*IA 2005. Lecture Notes in Computer Science, vol 3673. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558590_42

  • DOI: https://doi.org/10.1007/11558590_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29041-4

  • Online ISBN: 978-3-540-31733-3

  • eBook Packages: Computer Science, Computer Science (R0)
