Skip to main content

Semantic-Based Access to Digital Document Databases

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2005)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3488))

Included in the following conference series:

Abstract

Discovering significant meta-information from document collections is a critical factor for knowledge distribution and preservation. This paper presents a system that implements intelligent document processing techniques, by combining strategies for the layout analysis of electronic documents with incremental first-order learning in order to automatically classify the documents and their layout components according to their semantics. Indeed, an in-deep analysis of specific layout components can allow the extraction of useful information to improve the semantic-based document storage and retrieval tasks. The viability of the proposed approach is confirmed by experiments run in the real-world application domain of scientific papers.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Adobe Systems Incorporated. PostScript language reference manual, 2nd edn. Addison Wesley, Reading (1990)

    Google Scholar 

  2. Besagni, D., Belaid, A.: Citation recognition for scientific publications in digital libraries. In: Proceedings of the 1st International Workshop on Document Image Analysis for Libraries (DIAL 2004), Palo Alto, CA, USA, January 23-24, pp. 244–252. IEEE Computer Society, Los Alamitos (2004)

    Chapter  Google Scholar 

  3. Breuel, T.M.: Two geometric algorithms for layout analysis. In: Workshop on Document Analysis Systems (2002)

    Google Scholar 

  4. Egenhofer, M.: Reasoning about binary topological relations. In: Günther, O., Schek, H.-J. (eds.) SSD 1991. LNCS, vol. 525, pp. 143–160. Springer, Heidelberg (1991)

    Google Scholar 

  5. Esposito, F., Ferilli, S., Fanizzi, N., Basile, T.M.A., Di Mauro, N.: Incremental multistrategy learning for document processing. Applied Artificial Intelligence: An Internationa Journal 17(8/9), 859–883 (2003)

    Article  Google Scholar 

  6. Glunz, W.: Pstoedit - a tool converting postscript and pdf files into various vector graphic formats, http://www.pstoedit.net

  7. Muggleton, S., De Raedt, L.: Inductive logic programming: Theory and methods. Journal of Logic Programming 19/20, 629–679 (1994)

    Article  Google Scholar 

  8. Nagy, G.: Twenty years of document image analysis in PAMI. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(1), 38–62 (2000)

    Article  Google Scholar 

  9. Papadias, D., Theodoridis, Y.: Spatial relations, minimum bounding rectangles, and spatial data structures. International Journal of Geographical Information Science 11(2), 111–138 (1997)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Esposito, F., Ferilli, S., Basile, T.M.A., Di Mauro, N. (2005). Semantic-Based Access to Digital Document Databases. In: Hacid, MS., Murray, N.V., RaÅ›, Z.W., Tsumoto, S. (eds) Foundations of Intelligent Systems. ISMIS 2005. Lecture Notes in Computer Science(), vol 3488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11425274_39

Download citation

  • DOI: https://doi.org/10.1007/11425274_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25878-0

  • Online ISBN: 978-3-540-31949-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics