Modern vs Diplomatic Transcripts for Historical Handwritten Text Recognition

  • Conference paper
New Trends in Image Analysis and Processing – ICIAP 2019 (ICIAP 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11808)


Abstract

The transcription of handwritten documents is useful for making their contents accessible to the general public. However, automatic transcription of historical documents has so far focused mostly on producing diplomatic transcripts, even though such transcripts are often understandable only by experts. The main difficulties stem from the heavy use of extremely abridged and tangled abbreviations and of archaic or outdated word forms. Here we study different approaches to training optical models that can recognize historical document images containing archaic and abbreviated handwritten text and produce modernized transcripts with expanded abbreviations. Experiments comparing the performance of the proposed approaches are carried out on a document collection related to Spanish naval commerce during the 15th–19th centuries, which includes extremely difficult handwritten text images.
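The abstract contrasts two training targets for the same page images: diplomatic transcripts, which reproduce abbreviations and archaic spellings as written, and modernized transcripts, which expand and regularize them. The following minimal sketch illustrates that distinction under a stated assumption: the ground truth is annotated with TEI-style `<choice>` elements pairing `<abbr>` with `<expan>` and `<orig>` with `<reg>`. The paper's actual annotation format and training pipeline are not described here, so the helper names `render` and `transcripts`, the `KEEP` mapping and the sample line are hypothetical, and TEI namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: which <choice> alternative each target transcript keeps.
# Diplomatic transcripts keep what is written on the page (<abbr>, <orig>);
# modernized transcripts keep the expanded or regularized form (<expan>, <reg>).
KEEP = {
    "diplomatic": {"abbr", "orig"},
    "modern": {"expan", "reg"},
}


def render(elem: ET.Element, target: str) -> str:
    """Flatten an annotated element into plain text for one target transcript."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            # Emit only the alternative(s) belonging to the requested target;
            # whitespace between alternatives inside <choice> is dropped.
            for alt in child:
                if alt.tag in KEEP[target]:
                    parts.append(render(alt, target))
        else:
            # Any other markup is transparent: keep its text content.
            parts.append(render(child, target))
        # Text following the child element belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)


def transcripts(line_markup: str) -> dict[str, str]:
    """Return the diplomatic and modernized renderings of one annotated line."""
    root = ET.fromstring(f"<line>{line_markup}</line>")
    # Collapse runs of whitespace left over from the markup.
    return {target: " ".join(render(root, target).split()) for target in KEEP}


if __name__ == "__main__":
    # Hypothetical notarial-style line; not taken from the paper's collection.
    sample = (
        "Sepan <choice><orig>quantos</orig><reg>cuantos</reg></choice> "
        "esta carta vieren como yo el "
        "<choice><abbr>dho</abbr><expan>dicho</expan></choice> maestre"
    )
    for target, text in transcripts(sample).items():
        print(f"{target:>10}: {text}")
```

Run as a script, it prints `Sepan quantos esta carta vieren como yo el dho maestre` for the diplomatic target and `Sepan cuantos esta carta vieren como yo el dicho maestre` for the modern one. Both strings describe the same line image, which is why a model trained on modernized targets has to output characters, such as the `i` and `c` of `dicho`, that are not visibly written on the page.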

Work partially supported by the BBVA Foundation through the 2017–2018 Digital Humanities research grant “Carabela”, by Ministerio de Ciencia/AEI/FEDER/EU through the MIRANDA-DocTIUM project (RTI2018-095645-B-C22), and by the EU JPICH project “HOME – History Of Medieval Europe” (Spanish PEICTI Ref. PCI2018-093122).


Author information

Corresponding author

Correspondence to Verónica Romero.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Romero, V., Toselli, A.H., Vidal, E., Sánchez, J.A., Alonso, C., Marqués, L. (2019). Modern vs Diplomatic Transcripts for Historical Handwritten Text Recognition. In: Cristani, M., Prati, A., Lanz, O., Messelodi, S., Sebe, N. (eds) New Trends in Image Analysis and Processing – ICIAP 2019. ICIAP 2019. Lecture Notes in Computer Science, vol 11808. Springer, Cham. https://doi.org/10.1007/978-3-030-30754-7_11

  • DOI: https://doi.org/10.1007/978-3-030-30754-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30753-0

  • Online ISBN: 978-3-030-30754-7

  • eBook Packages: Computer Science, Computer Science (R0)
