
Evaluating Performance and Accuracy Improvements for Attention-OCR

  • Conference paper
  • Published in: Computer Information Systems and Industrial Management (CISIM 2019)

Abstract

In this paper we evaluated a set of potential improvements to the successful Attention-OCR architecture, which is designed to predict multiline text from unconstrained scenes in real-world images. We investigated the impact of several optimizations on the model's accuracy, including employing dynamic RNNs (Recurrent Neural Networks), scheduled sampling, BiLSTM (Bidirectional Long Short-Term Memory) and a modified attention model. BiLSTM was found to slightly increase accuracy, while dynamic RNNs and a simpler attention model provided a significant reduction in training time with only a slight decline in accuracy.
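To make the decoder-side changes mentioned above concrete, the sketch below illustrates two of them in plain NumPy: scheduled sampling, where each decoder step receives the ground-truth character with some probability and the model's own previous prediction otherwise, and a Luong-style dot-product attention score, one common simpler alternative to the additive (Bahdanau) attention used in the original Attention-OCR. This is a minimal illustrative sketch, not the authors' implementation; the toy dimensions, the embedding lookup, and the choice of dot-product scoring as the "simpler attention model" are assumptions made only for this example.

import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions -- assumptions for the example only.
vocab_size, emb_dim, hid_dim, enc_len = 32, 16, 16, 20

embedding = rng.normal(size=(vocab_size, emb_dim))      # character embeddings
encoder_states = rng.normal(size=(enc_len, hid_dim))    # image/encoder features

def dot_product_attention(decoder_state, encoder_states):
    # Luong-style (multiplicative) attention: scores are plain dot products,
    # avoiding the extra learned projection of additive (Bahdanau) attention.
    scores = encoder_states @ decoder_state              # shape: (enc_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax over positions
    context = weights @ encoder_states                   # shape: (hid_dim,)
    return context, weights

def choose_decoder_input(ground_truth_token, predicted_token, sampling_prob):
    # Scheduled sampling: feed the ground-truth character with probability
    # sampling_prob, otherwise feed the model's own previous prediction.
    if rng.random() < sampling_prob:
        return embedding[ground_truth_token]
    return embedding[predicted_token]

# One illustrative decoding step.
decoder_state = rng.normal(size=(hid_dim,))
context, attn_weights = dot_product_attention(decoder_state, encoder_states)
next_input = choose_decoder_input(ground_truth_token=3, predicted_token=7,
                                  sampling_prob=0.9)
print(context.shape, round(attn_weights.sum(), 3), next_input.shape)

In practice the sampling probability is annealed from 1.0 toward lower values over training, following Bengio et al.'s scheduled sampling, and the dot-product form above assumes the decoder state and the encoder features share the same dimensionality.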


Notes

  1. https://github.com/tensorflow/models/tree/master/research/attention_ocr.

  2. https://github.com/Avenire/models.


Acknowledgments

This work has been partially supported by the Statutory Funds of the Electronics, Telecommunications and Informatics Faculty, Gdansk University of Technology.

Author information


Corresponding author

Correspondence to Adam Brzeski.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Brzeski, A., Grinholc, K., Nowodworski, K., Przybyłek, A. (2019). Evaluating Performance and Accuracy Improvements for Attention-OCR. In: Saeed, K., Chaki, R., Janev, V. (eds) Computer Information Systems and Industrial Management. CISIM 2019. Lecture Notes in Computer Science, vol. 11703. Springer, Cham. https://doi.org/10.1007/978-3-030-28957-7_1


  • DOI: https://doi.org/10.1007/978-3-030-28957-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28956-0

  • Online ISBN: 978-3-030-28957-7

  • eBook Packages: Computer Science (R0)
