
A Historical Document Handwriting Transcription End-to-end System

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2017)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10255)

Abstract

Providing access to the contents of document collections that are being digitized requires transcription. Unfortunately, manual transcription is generally too expensive and, in most cases, current automatic techniques fail to provide the required level of accuracy. An alternative that can speed up this process and lower its cost is the use of computer-assisted, interactive techniques. These techniques work at line level, so the transcription task assumes that the page images have been correctly decomposed into the relevant text line images. In this paper we present an end-to-end system that takes a page image as input and provides a fully correct transcript with the help of user interaction. The system automatically detects the text blocks and text lines, which are then fed into the interactive computer-assisted transcription. The experiments show that the proposed end-to-end system can reduce the expected amount of user effort needed to produce perfect transcripts.


Notes

  1. https://riunet.upv.es/handle/10251/18484


Acknowledgment

This work has been partially supported through the European Union’s H2020 grant READ (Recognition and Enrichment of Archival Documents) (Ref: 674943), the MINECO/FEDER-UE project TIN2015-70924-C2-1-R, and the HIMANIS EU project, JPICH programme (Spanish grant Ref. PCIN-2015-068).

Corresponding author

Correspondence to Verónica Romero.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Romero, V., Bosch, V., Hernández, C., Vidal, E., Sánchez, J.A. (2017). A Historical Document Handwriting Transcription End-to-end System. In: Alexandre, L., Salvador Sánchez, J., Rodrigues, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2017. Lecture Notes in Computer Science, vol 10255. Springer, Cham. https://doi.org/10.1007/978-3-319-58838-4_17


  • DOI: https://doi.org/10.1007/978-3-319-58838-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58837-7

  • Online ISBN: 978-3-319-58838-4

  • eBook Packages: Computer Science (R0)
