
Evaluating Performance and Accuracy Improvements for Attention-OCR

  • Conference paper
  • Published in: Computer Information Systems and Industrial Management (CISIM 2019)

Abstract

In this paper we evaluated a set of potential improvements to the successful Attention-OCR architecture, which is designed to predict multiline text from unconstrained scenes in real-world images. We investigated the impact of several optimizations on the model's accuracy, including employing dynamic RNNs (Recurrent Neural Networks), scheduled sampling, BiLSTM (Bidirectional Long Short-Term Memory) and a modified attention model. BiLSTM was found to slightly increase accuracy, while dynamic RNNs and a simpler attention model provided a significant reduction in training time with only a slight decline in accuracy.
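To make the decoder-side changes mentioned above concrete, the sketch below illustrates two of them in plain NumPy: scheduled sampling, where each decoder step receives the ground-truth character with some probability and the model's own previous prediction otherwise, and a Luong-style dot-product attention score, one common simpler alternative to the additive (Bahdanau) attention used in the original Attention-OCR. This is a minimal illustrative sketch, not the authors' implementation; the toy dimensions, the embedding lookup, and the choice of dot-product scoring as the "simpler attention model" are assumptions made only for this example.

import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions -- assumptions for the example only.
vocab_size, emb_dim, hid_dim, enc_len = 32, 16, 16, 20

embedding = rng.normal(size=(vocab_size, emb_dim))      # character embeddings
encoder_states = rng.normal(size=(enc_len, hid_dim))    # image/encoder features

def dot_product_attention(decoder_state, encoder_states):
    # Luong-style (multiplicative) attention: scores are plain dot products,
    # avoiding the extra learned projection of additive (Bahdanau) attention.
    scores = encoder_states @ decoder_state              # shape: (enc_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax over positions
    context = weights @ encoder_states                   # shape: (hid_dim,)
    return context, weights

def choose_decoder_input(ground_truth_token, predicted_token, sampling_prob):
    # Scheduled sampling: feed the ground-truth character with probability
    # sampling_prob, otherwise feed the model's own previous prediction.
    if rng.random() < sampling_prob:
        return embedding[ground_truth_token]
    return embedding[predicted_token]

# One illustrative decoding step.
decoder_state = rng.normal(size=(hid_dim,))
context, attn_weights = dot_product_attention(decoder_state, encoder_states)
next_input = choose_decoder_input(ground_truth_token=3, predicted_token=7,
                                  sampling_prob=0.9)
print(context.shape, round(attn_weights.sum(), 3), next_input.shape)

In practice the sampling probability is annealed from 1.0 toward lower values over training, following Bengio et al.'s scheduled sampling, and the dot-product form above assumes the decoder state and the encoder features share the same dimensionality.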


Notes

  1. https://github.com/tensorflow/models/tree/master/research/attention_ocr.

  2. https://github.com/Avenire/models.


Acknowledgments

This work has been partially supported by the Statutory Funds of the Electronics, Telecommunications and Informatics Faculty, Gdansk University of Technology.

Author information


Corresponding author

Correspondence to Adam Brzeski.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Brzeski, A., Grinholc, K., Nowodworski, K., Przybyłek, A. (2019). Evaluating Performance and Accuracy Improvements for Attention-OCR. In: Saeed, K., Chaki, R., Janev, V. (eds) Computer Information Systems and Industrial Management. CISIM 2019. Lecture Notes in Computer Science, vol. 11703. Springer, Cham. https://doi.org/10.1007/978-3-030-28957-7_1


  • DOI: https://doi.org/10.1007/978-3-030-28957-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28956-0

  • Online ISBN: 978-3-030-28957-7

  • eBook Packages: Computer Science (R0)
