Training Deep Nets with Progressive Batch Normalization on Multi-GPUs

International Journal of Parallel Programming

Abstract

Batch normalization (BN) enables faster training of a wide range of deep neural networks. However, training accuracy degrades significantly as the input mini-batch size decreases. Computing a global mean and variance over all input batches can improve model accuracy, but it requires communication across all devices in each BN layer, which greatly reduces training speed. To address this problem, we propose progressive batch normalization, which achieves a good balance between model accuracy and efficiency in multi-GPU training. Experimental results show that our algorithm obtains a significant performance improvement over traditional BN without data synchronization across GPUs, achieving up to an 18.4% improvement when training DeepLab for semantic segmentation across 8 GPUs.
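The trade-off described in the abstract can be illustrated with a small, hedged sketch. It is not the progressive batch normalization algorithm, which the abstract does not specify; it only contrasts per-device BN statistics with global statistics combined across devices. The device count, the shard size, and the function names (`local_stats`, `global_stats`, `normalize`) are assumptions made for illustration, and the "devices" are simulated as plain Python lists.

```python
import random
from statistics import fmean, pvariance


def normalize(values, mean, var, eps=1e-5):
    """Standard BN transform, without the learned scale and shift."""
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def local_stats(shards):
    """Per-device statistics: each shard sees only its own samples."""
    return [(fmean(s), pvariance(s)) for s in shards]


def global_stats(shards):
    """Statistics over the whole batch, combined from per-device sums.

    Each device contributes (count, sum, sum of squares). In real
    multi-GPU training, combining these is the cross-device
    communication step that every BN layer would have to pay for.
    """
    n = sum(len(s) for s in shards)
    total = sum(sum(s) for s in shards)
    total_sq = sum(sum(v * v for v in s) for s in shards)
    mean = total / n
    var = total_sq / n - mean * mean
    return mean, var


random.seed(0)
# Hypothetical setup: 8 simulated devices with 2 activations each.
shards = [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(8)]

g_mean, g_var = global_stats(shards)
for i, (m, v) in enumerate(local_stats(shards)):
    print(f"device {i}: local mean={m:+.3f} var={v:.3f}")
print(f"global:   mean={g_mean:+.3f} var={g_var:.3f}")

# Normalizing the first shard with global rather than local statistics.
print(normalize(shards[0], g_mean, g_var))
```

With only two samples per device, the local means and variances scatter widely around the global values, which is one way to see why small per-GPU batches make BN statistics noisy, while the global estimate requires combining results from every device.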

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments. We acknowledge the support from the Tusimple HPC group and Tsinghua University.

Author information

Corresponding author

Correspondence to Lianke Qin.

About this article

Cite this article

Qin, L., Gong, Y., Tang, T. et al. Training Deep Nets with Progressive Batch Normalization on Multi-GPUs. Int J Parallel Prog 47, 373–387 (2019). https://doi.org/10.1007/s10766-018-0615-5

  • DOI: https://doi.org/10.1007/s10766-018-0615-5
