Abstract
The goal of this paper is to explore the benefits of using RNNs instead of CNNs for image transformation tasks. We consider two models for image transformation: U-Net (based on CNNs) and U-ReNet (partially based on CNNs and RNNs). In this work, we propose a novel U-ReNet that is almost entirely RNN based. We compare U-Net, the partially RNN-based U-ReNet, and our almost entirely RNN-based U-ReNet on two datasets derived from MNIST, where the task is to transform text lines of overlapping digits into text lines of separated digits. Our model reaches the best performance on one dataset and comparable results on the other. Additionally, the proposed U-ReNet with RNN upsampling has fewer parameters than U-Net and is more robust to translation.
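The building block behind U-ReNet is the ReNet layer (Visin et al., 2015), which replaces convolution with RNN sweeps across the image: one pass along rows, then one along columns, so every output cell has seen an entire row and column of the input. The sketch below is a minimal, simplified illustration of that idea, not the authors' implementation: it uses plain NumPy, a unidirectional Elman RNN in place of the bidirectional GRUs a real ReNet layer would use, and randomly initialized weights; all function names (`rnn_sweep`, `renet_layer`) are hypothetical.

```python
import numpy as np

def rnn_sweep(seq, Wx, Wh, b):
    # Simple Elman RNN over a sequence of feature vectors.
    h = np.zeros(Wh.shape[0])
    out = []
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h + b)
        out.append(h)
    return np.stack(out)  # (len(seq), hidden)

def renet_layer(img, hidden=8, seed=0):
    # img: (H, W) grayscale patch. A ReNet-style layer replaces convolution
    # with RNN sweeps: first along rows, then along columns of the result.
    rng = np.random.default_rng(seed)
    H, W = img.shape
    Wx = rng.normal(scale=0.1, size=(hidden, 1))
    Wh = rng.normal(scale=0.1, size=(hidden, hidden))
    b = np.zeros(hidden)
    # Horizontal sweep: treat each row as a sequence of 1-d "pixels".
    rows = np.stack([rnn_sweep(img[i][:, None], Wx, Wh, b) for i in range(H)])
    # rows: (H, W, hidden). Vertical sweep over the horizontal features.
    Wx2 = rng.normal(scale=0.1, size=(hidden, hidden))
    cols = np.stack([rnn_sweep(rows[:, j], Wx2, Wh, b) for j in range(W)], axis=1)
    return cols  # (H, W, hidden): each cell has context from a full row and column

features = renet_layer(np.random.rand(6, 6))
```

Unlike a convolution, whose receptive field grows only with depth, a single sweep pair already gives every position a cross-shaped global receptive field, which is one motivation for the RNN-based design.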
Acknowledgement
This work was supported by the BMBF project DeFuseNN (Grant 01IW17002) and the NVIDIA AI Lab (NVAIL) program.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Moser, B.B., Raue, F., Hees, J., Dengel, A. (2019). Comparison Between U-Net and U-ReNet Models in OCR Tasks. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Image Processing. LNCS, vol. 11729. Springer, Cham. https://doi.org/10.1007/978-3-030-30508-6_11
Print ISBN: 978-3-030-30507-9. Online ISBN: 978-3-030-30508-6.