Introducing Noise in Decentralized Training of Neural Networks

  • Linara Adilova
  • Nathalie Paul
  • Peter Schlicht
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 967)


Injecting noise into the neural network weights during training has been shown to improve the generalization of the resulting model. In a distributed setup, noise injection is straightforward to apply and is a promising approach for improving the locally trained models. We investigate the effects of noise injection into neural networks during a decentralized training process. We show both theoretically and empirically that noise injection has no positive effect in expectation on linear models. For non-linear neural networks, however, we empirically show that noise injection substantially improves model quality, bringing the generalization ability of a locally trained model close to the serial baseline.
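The setup described in the abstract can be sketched as follows: each learner runs local gradient steps with Gaussian noise injected into its weights, and the models are periodically synchronized by averaging. This is a minimal illustrative sketch, not the authors' implementation; the linear model, learning rate, noise scale, and synchronization period are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_sgd_step(w, X, y, lr=0.1, noise_std=0.01):
    """One local SGD step on a mean-squared-error objective,
    with Gaussian noise injected into the weights afterwards."""
    grad = 2 * X.T @ (X @ w - y) / len(y)          # MSE gradient for a linear model
    w = w - lr * grad
    return w + rng.normal(0.0, noise_std, size=w.shape)  # weight-noise injection

def average_models(models):
    """Periodic synchronization: replace every local model by the mean."""
    mean = np.mean(models, axis=0)
    return [mean.copy() for _ in models]

# Toy decentralized run: 4 learners, synchronizing every 5 steps.
d, n_learners = 3, 4
true_w = np.array([1.0, -2.0, 0.5])
models = [np.zeros(d) for _ in range(n_learners)]
for step in range(50):
    for i in range(n_learners):
        X = rng.normal(size=(8, d))                # each learner sees its own batch
        y = X @ true_w
        models[i] = local_sgd_step(models[i], X, y)
    if (step + 1) % 5 == 0:
        models = average_models(models)

print(np.linalg.norm(models[0] - true_w))          # distance to the true weights
```

In this linear toy case the injected noise is averaged away by synchronization, consistent with the abstract's observation that noise yields no expected benefit for linear models; the paper's positive results concern non-linear networks.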



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Linara Adilova (1, 2)
  • Nathalie Paul (1)
  • Peter Schlicht (3)

  1. Fraunhofer IAIS, Sankt Augustin, Germany
  2. Fraunhofer Center for Machine Learning, Sankt Augustin, Germany
  3. Volkswagen Group Research, Wolfsburg, Germany
