Confidence Measures for Error Correction in Interactive Transcription Handwritten Text

  • Lionel Tarazón
  • Daniel Pérez
  • Nicolás Serrano
  • Vicent Alabau
  • Oriol Ramos Terrades
  • Alberto Sanchis
  • Alfons Juan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5716)

Abstract

An effective approach to transcribing old text documents is to follow an interactive-predictive paradigm in which the system is guided by the human supervisor and, in turn, the supervisor is assisted by the system, so that the transcription task is completed as efficiently as possible. In this paper, we focus on a particular system prototype called GIDOC, which can be seen as a first attempt to provide user-friendly, integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. More specifically, we focus on the handwriting recognition part of GIDOC, for which we propose the use of confidence measures to guide the human supervisor in locating possible system errors and deciding how to proceed. Empirical results on two datasets show that a word error rate of no more than 10% can be achieved by checking only the 32% of words recognised with the lowest confidence.
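The headline figure above rests on a simple procedure: rank the recognised words by confidence, let the supervisor check the least confident ones, and measure the word error rate that remains. The Python sketch below illustrates that bookkeeping under stated assumptions, and is not the authors' implementation. The `RecognisedWord` type, the helper names `residual_wer` and `min_fraction_for_target`, and the toy data are all hypothetical. Confidence scores are taken as given (the abstract does not describe the estimator; word posterior probabilities are one common choice). Every error is treated as a one-word substitution, so insertions and deletions, which a full WER computation would count, are ignored. Finally, the supervisor is assumed to correct every error among the words checked.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognisedWord:
    text: str
    confidence: float  # hypothetical score; higher means more confident
    is_error: bool     # ground-truth label: True if the recogniser got it wrong


def residual_wer(words: List[RecognisedWord], fraction_checked: float) -> float:
    """WER left after checking the given fraction of least-confident words.

    Assumption: all errors are one-word substitutions, so WER is simply
    (#wrong words) / (#words), and every checked error gets corrected.
    """
    if not words:
        return 0.0
    ranked = sorted(words, key=lambda w: w.confidence)
    # round() rather than int() so that k/n * n maps back to k despite float error
    n_checked = round(fraction_checked * len(ranked))
    remaining_errors = sum(w.is_error for w in ranked[n_checked:])
    return remaining_errors / len(ranked)


def min_fraction_for_target(words: List[RecognisedWord], target_wer: float) -> float:
    """Smallest fraction of least-confident words to check so that the
    residual WER is at most target_wer."""
    n = len(words)
    if n == 0:
        return 0.0
    ranked = sorted(words, key=lambda w: w.confidence)
    remaining = sum(w.is_error for w in ranked)  # errors in ranked[k:] for k = 0
    for k in range(n + 1):
        if remaining / n <= target_wer:
            return k / n
        if k < n and ranked[k].is_error:
            remaining -= 1  # checking word k removes its error
    return 1.0  # only reached for a negative target


# Invented toy data for illustration only (not results from the paper).
example_words = [
    RecognisedWord("the", 0.98, False),
    RecognisedWord("qvick", 0.41, True),
    RecognisedWord("brown", 0.90, False),
    RecognisedWord("fox", 0.75, False),
    RecognisedWord("jumqs", 0.35, True),
    RecognisedWord("over", 0.88, False),
    RecognisedWord("tha", 0.62, True),
    RecognisedWord("lazy", 0.93, False),
    RecognisedWord("dog", 0.80, False),
    RecognisedWord("today", 0.55, False),
]

if __name__ == "__main__":
    for frac in (0.0, 0.2, 0.4):
        print(f"check {frac:.0%}: residual WER = {residual_wer(example_words, frac):.0%}")
    print("fraction needed for WER <= 10%:", min_fraction_for_target(example_words, 0.10))
```

In this toy example the initial WER is 30%; checking the two least confident words (20%) brings it down to 10%, and checking four (40%) removes every error, because the erroneous words happen to carry low confidence. How closely real confidence scores approximate this ordering is exactly what determines the trade-off reported in the abstract.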

Keywords

Computer-assisted Transcription of Handwritten Text · User Interfaces · Confidence Measures

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • All authors: DSIC/ITI, Universitat Politècnica de València, València, Spain