Learning Features for Writer Identification from Handwriting on Papyri

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1322)

Abstract

Computerized analysis of historical documents has been an active research area in the pattern classification community for many decades. Key challenges in analysing historical manuscripts include automatic transcription, dating, retrieval, classification of writing styles, and identification of scribes. This study focuses on identifying writers from digitized manuscripts. We use convolutional neural networks (ConvNets) to extract features and characterize writers. The ConvNets are first trained on contemporary handwriting samples and then fine-tuned on the limited set of historical manuscripts considered in our study. Each manuscript is densely sampled to produce a set of small writing patches, and the patch-level decisions are combined by majority vote to determine the authorship of a query document. Preliminary experiments on a set of challenging, degraded manuscripts report promising performance.
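
The dense sampling and majority-vote steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the patch size, stride, or network, so `patch_size`, `stride`, and the `classify_patch` callable (standing in for a fine-tuned ConvNet) are hypothetical placeholders.

```python
from collections import Counter


def dense_patches(image, patch_size, stride):
    """Yield square patches from a 2D image given as a list of rows.

    Patches are taken on a regular grid; with stride < patch_size they overlap.
    """
    height, width = len(image), len(image[0])
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            yield [row[left:left + patch_size]
                   for row in image[top:top + patch_size]]


def majority_vote(patch_labels):
    """Return the most frequent writer label among patch-level decisions."""
    if not patch_labels:
        raise ValueError("no patch decisions to combine")
    # On a tie, Counter keeps the label that was encountered first.
    return Counter(patch_labels).most_common(1)[0][0]


def identify_writer(image, classify_patch, patch_size=4, stride=2):
    """Classify every patch, then combine the decisions by majority vote.

    classify_patch is a hypothetical stand-in for the trained network:
    any callable mapping a patch to a writer label.
    """
    labels = [classify_patch(p) for p in dense_patches(image, patch_size, stride)]
    return majority_vote(labels)


# Toy 8x8 "document": left half blank (0), right half inked (1).
toy_image = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(8)]


def toy_classifier(patch):
    """Placeholder decision rule based on the amount of ink in the patch."""
    ink = sum(sum(row) for row in patch)
    return "writer_1" if ink > 8 else "writer_0"


predicted = identify_writer(toy_image, toy_classifier)
```

In a real pipeline the toy classifier would be replaced by the network's per-patch prediction. Voting over many patches lets a document be attributed even when individual patches are degraded or ambiguous.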

Keywords

Writer identification, ConvNets, IAM dataset, Papyrus

Notes

Acknowledgement

The authors would like to thank Dr. Isabelle Marthot-Santaniello of the University of Basel, Switzerland, for making the dataset available.

Copyright information

© Springer Nature Switzerland AG 2021

Authors and Affiliations

  1. Vision and Learning Lab, Bahria University, Islamabad, Pakistan
