Using BLSTM for Spotting Regular Expressions in Handwritten Documents

  • Gautier Bideault
  • Luc Mioulet
  • Clément Chatelain
  • Thierry Paquet
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9493)

Abstract

This article concerns the spotting of regular expressions (REGEX) in handwritten documents using a hybrid model. Spotting REGEX in a document image enables further extraction tasks such as document categorization or named-entity extraction. Our model combines a state-of-the-art BLSTM recurrent neural network for character recognition and segmentation with an HMM able to spot the desired sequences. The BLSTM has also been evaluated for spotting without the HMM stage, yielding a system with 100% precision. Our experiments on a public handwritten database show interesting results.
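As a rough illustration of what BLSTM-only spotting could look like, the sketch below assumes the network outputs per-frame character posteriors that include a CTC-style blank class (an assumption; the abstract does not specify the output layer). It decodes greedily and then runs an ordinary regular expression over the decoded text. This is a simplified stand-in, not the authors' system, and the names `best_path_decode`, `spot_regex`, and `BLANK` are hypothetical.

```python
import re

BLANK = "_"  # hypothetical symbol standing for the CTC blank class


def best_path_decode(posteriors, alphabet):
    """Greedy decoding: argmax per frame, merge repeats, drop blanks.

    Returns the decoded string and, for each decoded character,
    its (first_frame, last_frame) span.
    """
    chars, spans = [], []
    prev = None
    for t, frame in enumerate(posteriors):
        best = max(range(len(frame)), key=frame.__getitem__)
        sym = alphabet[best]
        if sym == prev:
            # Same symbol as the previous frame: extend the current span.
            if sym != BLANK:
                spans[-1] = (spans[-1][0], t)
        elif sym != BLANK:
            # A new non-blank symbol starts a new character.
            chars.append(sym)
            spans.append((t, t))
        prev = sym
    return "".join(chars), spans


def spot_regex(pattern, posteriors, alphabet):
    """Return (matched_text, first_frame, last_frame) for each regex hit."""
    text, spans = best_path_decode(posteriors, alphabet)
    hits = []
    for m in re.finditer(pattern, text):
        if m.end() > m.start():  # skip empty matches
            hits.append((m.group(), spans[m.start()][0], spans[m.end() - 1][1]))
    return hits


# Toy input: one-hot "posteriors" for the frame sequence a a _ 1 1 _ 1 2 _
alphabet = [BLANK, "a", "1", "2"]
frame_labels = [1, 1, 0, 2, 2, 0, 2, 3, 0]
toy = [[1.0 if i == k else 0.0 for i in range(len(alphabet))] for k in frame_labels]
print(spot_regex(r"\d+", toy, alphabet))  # [('112', 3, 7)]
```

Because a hit is reported only when the single best decoding matches the pattern, a scheme like this tends to favour precision over recall; a decoding stage that explores alternative hypotheses would trade some of that precision for coverage.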

Keywords

Regular Expression · Recurrent Neural Network · Document Image · Text Line · Meta Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Gautier Bideault (1)
  • Luc Mioulet (1)
  • Clément Chatelain (2)
  • Thierry Paquet (1)

  1. Laboratoire LITIS - EA 4108, Universite de Rouen, Saint-Etienne-du-Rouvray Cedex, France
  2. Laboratoire LITIS - EA 4108, INSA Rouen, Saint-Etienne-du-Rouvray Cedex, France
