Comparison Between U-Net and U-ReNet Models in OCR Tasks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11729)

Abstract

The goal of this paper is to explore the benefits of using RNNs instead of CNNs for image transformation tasks. We consider two models: U-Net (based on CNNs) and U-ReNet (partially based on CNNs and RNNs). In this work, we propose a novel U-ReNet that is almost entirely RNN based. We compare U-Net, the partially RNN-based U-ReNet, and our almost entirely RNN-based U-ReNet on two datasets derived from MNIST, where the task is to transform text lines of overlapping digits into text lines of separated digits. Our model reaches the best performance on one dataset and comparable results on the other. Additionally, the proposed U-ReNet with RNN upsampling has fewer parameters than U-Net and is more robust to translation.
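
To make the architectural contrast concrete, here is a minimal sketch of a ReNet-style layer, the RNN-based substitute for a convolutional layer on which U-ReNet builds (cf. Visin et al., ReNet, arXiv:1505.00393). This PyTorch snippet is illustrative only, not the authors' implementation; the class name, choice of GRU cells, patch size, and hidden size are all assumptions.

```python
import torch
import torch.nn as nn

class ReNetLayer(nn.Module):
    """Illustrative ReNet-style layer (not the paper's code): bidirectional
    GRUs sweep over non-overlapping image patches, first along rows, then
    along columns, replacing a convolution + pooling stage."""
    def __init__(self, in_channels, hidden_size, patch_size=2):
        super().__init__()
        self.patch = patch_size
        in_feats = in_channels * patch_size * patch_size
        self.row_rnn = nn.GRU(in_feats, hidden_size,
                              bidirectional=True, batch_first=True)
        self.col_rnn = nn.GRU(2 * hidden_size, hidden_size,
                              bidirectional=True, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        p = self.patch  # assumes h and w are divisible by p
        # Flatten each non-overlapping p x p patch into one feature vector.
        x = x.unfold(2, p, p).unfold(3, p, p)            # (b, c, h/p, w/p, p, p)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(b, h // p, w // p, -1)
        # Horizontal sweep: each row of patches is one sequence.
        rows = x.reshape(b * (h // p), w // p, -1)
        rows, _ = self.row_rnn(rows)
        x = rows.reshape(b, h // p, w // p, -1)
        # Vertical sweep: each column of the row-swept map is one sequence.
        cols = x.permute(0, 2, 1, 3).reshape(b * (w // p), h // p, -1)
        cols, _ = self.col_rnn(cols)
        x = cols.reshape(b, w // p, h // p, -1).permute(0, 3, 2, 1)
        return x                                          # (b, 2*hidden, h/p, w/p)

# Example: a batch of 32x32 grayscale images -> (4, 64, 16, 16) feature map.
layer = ReNetLayer(in_channels=1, hidden_size=32)
y = layer(torch.randn(4, 1, 32, 32))
```

Stacking such layers gives an RNN-based encoder; mirroring them with RNN-based upsampling in the decoder is what the abstract's almost entirely RNN-based U-ReNet refers to.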

Acknowledgement

This work was supported by the BMBF project DeFuseNN (Grant 01IW17002) and the NVIDIA AI Lab (NVAIL) program.

Author information

Correspondence to Federico Raue.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Moser, B.B., Raue, F., Hees, J., Dengel, A. (2019). Comparison Between U-Net and U-ReNet Models in OCR Tasks. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Image Processing. ICANN 2019. Lecture Notes in Computer Science, vol. 11729. Springer, Cham. https://doi.org/10.1007/978-3-030-30508-6_11

  • DOI: https://doi.org/10.1007/978-3-030-30508-6_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30507-9

  • Online ISBN: 978-3-030-30508-6

  • eBook Packages: Computer Science, Computer Science (R0)
