Abstract
The decreasing cost and increasing availability of new technologies are enabling people to create their own digital libraries. One of the main topics in personal digital libraries is allowing people to select interesting information among all the digital formats available today (PDF, HTML, TIFF, etc.). Moreover, the increasing availability of these online libraries, as well as the advent of the so-called Semantic Web [1], is raising the demand for converting paper documents into digital, possibly semantically annotated, documents. These motivations drove us to design a new system that enables the user to interact with and query documents independently of the digital formats in which they are represented. To achieve this format independence, we treat all the digital documents contained in a digital library as images. Our system tries to automatically detect the layout of the digital documents and recognize the geometric regions of interest. All the extracted information is then encoded with respect to a reference ontology, so that users can query their digital library by typing free text or browsing the ontology.
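The data flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Region` structure, the toy `ONTOLOGY` and `LEXICON`, and all function names are hypothetical, and layout detection itself is assumed to have already produced the regions. The sketch only shows how extracted regions might be tied to ontology concepts and then queried by free text or by browsing a concept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A region of interest on a page image (hypothetical structure)."""
    page: int
    bbox: tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
    kind: str                        # e.g. "title", "text"
    text: str                        # text recovered from the region


# Toy ontology (assumption): concept -> parent concept, None marks the root.
ONTOLOGY = {
    "document": None,
    "publication": "document",
    "invoice": "document",
    "paper": "publication",
}

# Toy lexicon (assumption): surface word -> ontology concept.
LEXICON = {"article": "paper", "paper": "paper",
           "invoice": "invoice", "bill": "invoice"}


def concepts_in(text: str) -> set[str]:
    """Map the words of a text to the ontology concepts they denote."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {LEXICON[w] for w in words if w in LEXICON}


def is_a(concept: str, ancestor: str) -> bool:
    """True if `concept` equals `ancestor` or lies below it in the ontology."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY[concept]
    return False


def build_index(library: dict[str, list[Region]]) -> dict[str, set[str]]:
    """Annotate each document with the concepts found in its regions."""
    return {doc_id: set().union(*(concepts_in(r.text) for r in regions))
            for doc_id, regions in library.items()}


def browse(index: dict[str, set[str]], concept: str) -> list[str]:
    """Documents annotated with `concept` or any of its descendants."""
    return sorted(d for d, cs in index.items()
                  if any(is_a(c, concept) for c in cs))


def query_text(index: dict[str, set[str]], text: str) -> list[str]:
    """Free-text query: resolve words to concepts, then browse each one."""
    hits = set()
    for concept in concepts_in(text):
        hits.update(browse(index, concept))
    return sorted(hits)


# Regions as a layout-analysis step might deliver them (made-up values).
library = {
    "scan_001.tiff": [Region(0, (40, 30, 560, 80), "title",
                             "A paper on layout analysis")],
    "bill.pdf": [Region(0, (40, 100, 560, 140), "text", "Invoice no. 17")],
}
index = build_index(library)
```

With this index, `query_text(index, "find my article")` resolves "article" to the concept `paper`, while `browse(index, "document")` returns every document, because both annotated concepts descend from the root. The source file format plays no role once the regions have been extracted.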
References
Berners-Lee, T.: Weaving the Web. Harper, San Francisco (1999)
Smith, B., Welty, C.: Ontology: towards a new synthesis. In: Proc. of Formal Ontology in Information Systems FOIS-2001, October 2001, ACM Press, New York (2001)
Cinque, L., Levialdi, S., Malizia, A.: An Integrated System for the Automatic Segmentation and Classification of Documents. In: Proceedings of the International Conference on Signal Processing, Pattern Recognition, and Applications (SPPRA 2002), Crete, Greece, June 2002, pp. 491–496 (2002)
Pavlidis, T.: Algorithms for Graphics and Image Processing. Computer Science Press, Rockville (1982)
Miller, A.: WordNet: An On-line Lexical Resource. Journal of Lexicography 3(4) (1990)
Pianta, E., Bentivogli, L., Girardi, C.: MultiWordNet: developing an aligned multilingual database. In: Proceedings of the First International Conference on Global WordNet, Mysore, India, January 21-25 (2002)
Nagy, G.: Twenty years of document image analysis in PAMI. IEEE Trans. Pattern Analysis and Machine Intelligence 22(1), 38–62 (2000)
Navigli, R., Velardi, P.: Semantic Interpretation of Terminological Strings. In: Proc. 6th Int’l Conf. on Terminology and Knowledge Engineering (TKE 2002), INIST-CNRS, Vandoeuvre-lès-Nancy, France, pp. 95–100 (2002)
Missikoff, M., Navigli, R., Velardi, P.: An Integrated Approach for Web Ontology Learning and Engineering. IEEE Computer, 60–63 (November 2002)
International Conference on Document Analysis and Recognition (ICDAR 2003), Edinburgh, Scotland, UK, August 3-6 (2003), http://www.essex.ac.uk/ese/icdar2003/
Spitz, L., Tombre, K.: Special issue-selected papers from the ICDAR 2001 conference. IJDAR 5(2-3), 87 (2003)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cinque, L., Malizia, A., Navigli, R. (2004). A Semantic-Based System for Querying Personal Digital Libraries. In: Marinai, S., Dengel, A.R. (eds) Document Analysis Systems VI. DAS 2004. Lecture Notes in Computer Science, vol 3163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28640-0_4
DOI: https://doi.org/10.1007/978-3-540-28640-0_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23060-1
Online ISBN: 978-3-540-28640-0
eBook Packages: Springer Book Archive