Active Interaction and Learning in Handwritten Text Transcription
Computer-assisted systems are increasingly used in a variety of real-world tasks, though their application to handwritten text transcription in old manuscripts remains largely unexplored. The basic idea explored in this chapter is to follow a sequential, line-by-line transcription of the whole manuscript in which a continuously retrained system interacts with the user to efficiently transcribe each new line. Because user interaction is expensive in terms of time and cost, our top priority is to make the most of each interaction while reducing the number of interactions as much as possible.
To this end, we study three different frameworks: (a) improving a recognition system from newly recognized transcriptions via adaptation, using semi-supervised learning techniques; (b) studying how best to adapt from limited user supervision, which is related to active learning; and (c) developing a simple error estimate that lets the user adjust the error level in a computer-assisted transcription task. In addition, we test these approaches in the sequential transcription of two old text documents.
Keywords: Text Line, Recognition Error, Word Error Rate, Handwritten Text, Word Graph