XDOCS: An Application to Index Historical Documents

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 806)

Abstract

Dematerialization and digitization of historical documents are key to their availability, preservation and dissemination. Unfortunately, converting handwritten documents into digital form presents several technical challenges.

The XDOCS project was created with the main goal of making historical documents available and more usable to a wide range of audiences, such as scholars, institutions and libraries. This paper describes the core elements of XDOCS, namely the page dewarping and word spotting techniques, and presents two new applications: an annotation/indexing tool and a search tool.

Keywords

Indexing, Page dewarping, Word spotting, Word annotation, Handwriting recognition

Notes

Acknowledgement

The XDOCS project is currently underway at SATA s.r.l. in collaboration with the University of Modena and Reggio-Emilia, and co-funded by the Emilia-Romagna regional administration.

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Federico Bolelli (1)
  • Guido Borghi (1)
  • Costantino Grana (1)
  1. Dipartimento di Ingegneria “Enzo Ferrari”, Università degli Studi di Modena e Reggio Emilia, Modena, Italy
