Advertisement

Attribute CNNs for word spotting in handwritten documents

Special Issue Paper
  • 91 Downloads

Abstract

Word spotting has become a field of strong research interest in document image analysis over the last years. Recently, AttributeSVMs were proposed which predict a binary attribute representation (Almazán et al. in IEEE Trans Pattern Anal Mach Intell 36(12):2552–2566, 2014). At their time, this influential method defined the state of the art in segmentation-based word spotting. In this work, we present an approach for learning attribute representations with convolutional neural networks(CNNs). By taking a probabilistic perspective on training CNNs, we derive two different loss functions for binary and real-valued word string embeddings. In addition, we propose two different CNN architectures, specifically designed for word spotting. These architectures are able to be trained in an end-to-end fashion. In a number of experiments, we investigate the influence of different word string embeddings and optimization strategies. We show our attribute CNNs to achieve state-of-the-art results for segmentation-based word spotting on a large variety of data sets.

Keywords

Attribute CNN PHOCNet TPP layer Word spotting Deep learning Handwritten documents Historical documents 

Notes

Acknowledgements

We would like to thank Irfan Ahmad for supplying the IFN/ENIT character mapping.

References

  1. 1.
    Aggarwal, C.C., Hinneburg, A., Keim, D.A.: On the surprising behavior of distance metrics in high dimensional spaces. In: International Conference on Database Theory, pp. 420–434 (2001)Google Scholar
  2. 2.
    Aldavert, D., Rusinol, M., Toledo, R., Llados, J.: Integrating visual and textual cues for query-by-string word spotting. In: Proceedings of the International Conference on Document Analysis and Recognition, pp. 511–515 (2013)Google Scholar
  3. 3.
    Almazán, J., Gordo, A., Fornés, A., Valveny, E.: Word spotting and recognition with embedded attributes. IEEE Trans. Pattern Anal. Mach. Intell. 36(12), 2552–2566 (2014)CrossRefGoogle Scholar
  4. 4.
    Balntas, V., Johns, E., Tang, L., Mikolajczyk, K.: PN-Net: conjoined triple deep network for learning local image descriptors. arXiv (2016)Google Scholar
  5. 5.
    Chollet, F.: Information-theoretical label embeddings for large-scale image classification. arXiv (2016)Google Scholar
  6. 6.
    Dai, B., Ding, S., Wahba, G.: Multivariate Bernoulli distribution. Bernoulli 19(4), 1465–1483 (2013)MathSciNetCrossRefMATHGoogle Scholar
  7. 7.
    Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78 (2012)CrossRefGoogle Scholar
  8. 8.
    Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)MathSciNetMATHGoogle Scholar
  9. 9.
    Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: Computer Vision and Pattern Recognition, pp. 1778–1785. Miami (2009)Google Scholar
  10. 10.
    Fischer, A., Keller, A., Frinken, V., Bunke, H.: HMM-based word spotting in handwritten documents using subword models. In: Proceedings of the International Conference on Pattern Recognition, pp. 3416–3419 (2010)Google Scholar
  11. 11.
    Frinken, V., Fischer, A., Manmatha, R., Bunke, H.: A novel word spotting method based on recurrent neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 34, 211–224 (2012)CrossRefGoogle Scholar
  12. 12.
    Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: Proceedings of the International Conference on Machine Learning, pp. 1050–1059. New York City (2016)Google Scholar
  13. 13.
    Giotis, A.P., Sfikas, G., Gatos, B., Nikou, C.: A survey of document image word spotting techniques. Pattern Recogn. 68, 310–332 (2017)CrossRefGoogle Scholar
  14. 14.
    Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the International Conference on Artificial Intelligence and Statistics, vol. 15, pp. 315–323 (2011)Google Scholar
  15. 15.
    He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Proceedings of the European Conference on Computer Vision, pp. 346–361 (2014)Google Scholar
  16. 16.
    He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the International Conference on Computer Vision, pp. 1026–1034 (2015)Google Scholar
  17. 17.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 770–778. Las Vegas (2016)Google Scholar
  18. 18.
    Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Synthetic data and artificial neural networks for natural scene text recognition. In: Neural Information Processing Systems. Montreal (2014)Google Scholar
  19. 19.
    Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T., Eecs, U.C.B.: Caffe: convolutional architecture for fast feature embedding. In: ACM Conference on Multimedia, pp. 675–678. Orlando (2014)Google Scholar
  20. 20.
    Johnson, J., Karpathy, A., Fei-Fei, L.: DenseCap: fully convolutional localization networks for dense captioning. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 4565–4574. Las Vegas (2016)Google Scholar
  21. 21.
    Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations. San Diego (2015)Google Scholar
  22. 22.
    Kleber, F., Fiel, S., Diem, M., Sablatnig, R.: CVL-database: an off-line database for writer retrieval, writer identification and word spotting. In: International Conference on Document Analysis and Recognition, pp. 560–564. Washingotn (2013)Google Scholar
  23. 23.
    Kołcz, A., Alspector, J., Augusteijn, M., Carlson, R., Viorel Popescu, G.: A line-oriented approach to word spotting in handwritten documents. Pattern Anal. Appl. 3(2), 154–168 (2000)Google Scholar
  24. 24.
    Krishnan, P., Dutta, K., Jawahar, C.: Deep feature embedding for accurate recognition and retrieval of handwritten Text. In: Proceedings of the International Conference on Frontiers in Handwriting Recognition, pp. 289–294 (2016)Google Scholar
  25. 25.
    Krishnan, P., Jawahar, C.: Matching handwritten document images. In: European Conference on Computer Vision. Amsterdam (2016)Google Scholar
  26. 26.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105. Montreal (2012)Google Scholar
  27. 27.
    Lampert, C.H., Nickisch, H., Harmeling, S.: Learning to detect unseen object classes by between-class attribute transfer. In: Computer Vision and Pattern Recognition, pp. 951–958. Miami (2009)Google Scholar
  28. 28.
    Lampert, C.H., Nickisch, H., Harmeling, S.: Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Pattern Anal. Mach. Intell. 36(3), 453–465 (2014)CrossRefGoogle Scholar
  29. 29.
    Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Computer Vision and Pattern Recognition, vol. 2, pp. 2169–2178. New York City (2006)Google Scholar
  30. 30.
    LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, pp. 396–404. Denver (1990)Google Scholar
  31. 31.
    Manmatha, R., Han, C., Riseman, E.: Word spotting: a new approach to indexing handwriting. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1–29 (1996)Google Scholar
  32. 32.
    Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recognit. 5(1), 39–46 (2002)CrossRefMATHGoogle Scholar
  33. 33.
    Nielsen, M.A.: Neural Networks and Deep Learning. Determination Press (2015)Google Scholar
  34. 34.
    Ojala, M., Garriga, G.C.: Permutation tests for studying classifier performance. J. Mach. Learn. Res. 11, 1833–1863 (2010)MathSciNetMATHGoogle Scholar
  35. 35.
    Pechwitz, M., Maddouri, S., Märgner, V.: IFN/ENIT-database of handwritten Arabic words. Colloque International Francophone sur l’Ecrit et le Document, pp. 1–8 (2002)Google Scholar
  36. 36.
    Poznanski, A., Wolf, L.: CNN-N-Gram for Handwriting Word Recognition. In: Computer Vision and Pattern Recognition, pp. 2305–2314. Las Vegas (NV), USA (2016)Google Scholar
  37. 37.
    Pratikakis, I., Zagoris, K., Gatos, B., Puigcerver, J., Toselli, A.H., Vidal, E.: ICFHR2016 handwritten keyword spotting competition (H-KWS 2016). In: International Conference on Frontiers in Handwriting Recognition, pp. 613–618. Shenzhen (2016)Google Scholar
  38. 38.
    Rath, T.M., Manmatha, R.: Word spotting for historical documents. Int. J. Doc. Anal. Recogn. 9, 139–152 (2007)CrossRefGoogle Scholar
  39. 39.
    Retsinas, G., Sfikas, G., Gatos, B.: Transferable deep features for keyword spotting. In: Proceedings of the European Signal Processing Conference. Kos Island (2017)Google Scholar
  40. 40.
    Rodríguez-Serrano, J.A., Perronnin, F.: A model-based sequence similarity with application to handwritten word spotting. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2108–2120 (2012)CrossRefGoogle Scholar
  41. 41.
    Rodriguez-Serrano, J.A., Perronnin, F.: Label embedding for text recognition. In: British Machine Vision Conference (2013)Google Scholar
  42. 42.
    Romero, V., Fornés, A., Serrano, N., Sánchez, J.A., Toselli, A.H., Frinken, V., Vidal, E., Lladós, J.: The ESPOSALLES database: an ancient marriage license corpus for off-line handwriting recognition. Pattern Recogn. 46(6), 1658–1669 (2013)CrossRefGoogle Scholar
  43. 43.
    Rothacker, L., Fink, G.A.: Segmentation-free query-by-string word spotting with bag-of-features HMMs. In: International Conference on Document Analysis and Recognition, pp. 661–665. Nancy (2015)Google Scholar
  44. 44.
    Rothacker, L., Rusinol, M., Fink, G.A.: Bag-of-features HMMs for segmentation-free word spotting in handwritten documents. In: International Conference on Document Analysis and Recognition, pp. 1305–1309 (2013)Google Scholar
  45. 45.
    Rothacker, L., Sudholt, S., Rusakov, E., Kasperidus, M., Fink, G.A.: Word hypotheses for segmentation-free word spotting in historic document images. In: Proceedings of the International Conference on Document Analysis and Recognition. Kyoto (2017)Google Scholar
  46. 46.
    Rusiñol, M., Aldavert, D., Toledo, R., Lladós, J.: Browsing heterogeneous document collections by a segmentation-free word spotting method. In: International Conference on Document Analysis and Recognition, pp. 63–67. Beijing (2011)Google Scholar
  47. 47.
    Rusiñol, M., Aldavert, D., Toledo, R., Lladós, J.: Efficient segmentation-free keyword spotting in historical document collections. Pattern Recogn. 48(2), 545–555 (2015)CrossRefGoogle Scholar
  48. 48.
    Rusiñol, M., Aldavert, D., Toledo, R., Lladós, J.: Towards query-by-speech handwritten keyword spotting. In: International Conference on Document Image Analysis, pp. 501–505. Nancy (2015)Google Scholar
  49. 49.
    Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)MathSciNetCrossRefGoogle Scholar
  50. 50.
    Shalizi, C.R.: Advanced Data Analysis from an Elementary Point of View. Cambridge University Press, Cambridge (2013)Google Scholar
  51. 51.
    Sharma, A., Pramod, S.K.: Adapting off-the-shelf CNNs for word spotting & recognition. In: Proceedings of the International Conference on Document Analysis and Recognition, pp. 986–990 (2015)Google Scholar
  52. 52.
    Silberpfennig, A., Wolf, L., Dershowitz, N., Bhagesh, S., Chaudhuri, B.B.: Improving OCR for an under-resourced script using unsupervised word-spotting. In: International Conference on Document Analysis and Recognition, pp. 706–710. Nancy (2015)Google Scholar
  53. 53.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations (2015)Google Scholar
  54. 54.
    Smucker, M.D., Allan, J., Carterette, B.: A comparison of statistical significance tests for information retrieval evaluation. In: Conference on Information and Knowledge Management, pp. 623–632. Lisbon (2007)Google Scholar
  55. 55.
    Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. In: Proceedings of the International Conference on Learning Representations (2015)Google Scholar
  56. 56.
    Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)MathSciNetMATHGoogle Scholar
  57. 57.
    Sudholt, S., Fink, G.A.: A modified isomap approach to manifold learning in word spotting. In: Proceedings of the German Conference on Pattern Recognition, pp. 529–539 (2015)Google Scholar
  58. 58.
    Sudholt, S., Fink, G.A.: PHOCNet: a deep convolutional neural network for word spotting in handwritten documents. In: Proceedings of the International Conference on Frontiers in Handwriting Recognition, pp. 277–282 (2016)Google Scholar
  59. 59.
    Sudholt, S., Fink, G.A.: Evaluating word string embeddings and loss functions for CNN-based word spotting. In: Proceedings of the International Conference on Document Analysis and Recognition (2017)Google Scholar
  60. 60.
    Sudholt, S., Gurjar, N., Fink, G.A.: Learning deep representations for word spotting under weak supervision. arXiv (2017)Google Scholar
  61. 61.
    Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., Hill, C., Arbor, A.: Going deeper with convolutions. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2014)Google Scholar
  62. 62.
    Tieleman, T., Hinton, G.: Lecture 6.5–RMSprop: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw. Mach. Learn. 4, 26–31 (2012)Google Scholar
  63. 63.
    Toselli, A.H., Vidal, E., Romero, V., Frinken, V.: HMM word graph based keyword spotting in handwritten document images. Inf. Sci. 370, 497–518 (2016)CrossRefGoogle Scholar
  64. 64.
    Wilkinson, T., Brun, A.: Semantic and verbatim word spotting using deep neural networks. In: Proceedings of the International Conference on Frontiers in Handwriting Recognition, pp. 307–312 (2016)Google Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. 1.TU Dortmund UniversityDortmundGermany

Personalised recommendations