Confidence Measures for Error Correction in Interactive Transcription Handwritten Text
An effective approach to transcribing old text documents is to follow an interactive-predictive paradigm in which the system is guided by the human supervisor and, in turn, the supervisor is assisted by the system, so that the transcription task is completed as efficiently as possible. In this paper, we focus on a particular system prototype called GIDOC, which can be seen as a first attempt to provide user-friendly, integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. More specifically, we focus on the handwriting recognition part of GIDOC, for which we propose the use of confidence measures to guide the human supervisor in locating possible system errors and deciding how to proceed. Empirical results on two datasets show that a word error rate no larger than 10% can be achieved by checking only the 32% of words that are recognised with the least confidence.
Keywords: Computer-assisted Transcription of Handwritten Text, User Interfaces, Confidence Measures
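
The supervision strategy summarised in the abstract (check only the least confident words, then measure what error remains) can be illustrated with a minimal Python sketch. It is not GIDOC's implementation: the `Word` class, the function names, the confidence scores and the toy data are all hypothetical, and the error measure counts only wrongly recognised words, which is a rough proxy for word error rate rather than the edit-distance-based figure a real evaluation would use.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str          # recognised word (system hypothesis)
    confidence: float  # hypothetical confidence score in [0, 1]
    correct: bool      # whether the hypothesis matches the reference


def words_to_check(words, fraction):
    """Return the indices of the `fraction` of words with the lowest confidence."""
    n_check = round(fraction * len(words))
    ranked = sorted(range(len(words)), key=lambda i: words[i].confidence)
    return set(ranked[:n_check])


def residual_error_rate(words, fraction):
    """Share of words still wrong after the supervisor fixes every checked word.

    Assumption: the supervisor corrects each checked word perfectly.
    """
    if not words:
        return 0.0
    checked = words_to_check(words, fraction)
    remaining = sum(
        1 for i, w in enumerate(words) if not w.correct and i not in checked
    )
    return remaining / len(words)


# Toy data (invented for illustration only).
words = [
    Word("the", 0.95, True),
    Word("quick", 0.40, False),
    Word("brown", 0.80, True),
    Word("fax", 0.30, False),
    Word("jumps", 0.85, False),
]

for fraction in (0.0, 0.4, 1.0):
    print(fraction, residual_error_rate(words, fraction))
```

In this toy example, checking the 40% least confident words catches two of the three errors; the third is a confidently wrong word, which is the kind of error a confidence-guided strategy can miss. Sweeping `fraction` over a labelled validation set is one way to look for the smallest supervision effort that keeps the residual error under a target such as 10%.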