
Deep Neural Network Structures Solving Variational Inequalities


Abstract

Motivated by structures that appear in deep neural networks, we investigate nonlinear composite models alternating proximity and affine operators defined on different spaces. We first show that a wide range of activation operators used in neural networks are actually proximity operators. We then establish conditions for the averagedness of the proposed composite constructs and investigate their asymptotic properties. It is shown that the limit of the resulting process solves a variational inequality which, in general, does not derive from a minimization problem. The analysis relies on tools from monotone operator theory and sheds some light on a class of neural network structures with so far elusive asymptotic properties.
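The claim that common activation operators are proximity operators can be checked numerically in a simple case. The sketch below (an illustration, not taken from the paper) uses the fact that the ReLU activation coincides with the proximity operator of the indicator function of [0, +inf), i.e., the projection onto the nonnegative half-line; the function names and the grid-based minimization are assumptions introduced here for the demonstration only.

```python
import numpy as np

def relu(x):
    """ReLU activation, applied componentwise."""
    return np.maximum(x, 0.0)

def prox_indicator_nonneg(x):
    """Numerically evaluate prox_f(x) for f = indicator of [0, +inf):
    argmin_y 0.5*(y - x)^2 + f(y), approximated by restricting y
    to a fine grid of feasible (nonnegative) values."""
    grid = np.linspace(-5.0, 5.0, 100001)      # step 1e-4, includes 0 exactly
    feasible = grid[grid >= 0.0]               # domain of the indicator
    x = np.atleast_1d(x)
    return np.array(
        [feasible[np.argmin(0.5 * (feasible - xi) ** 2)] for xi in x]
    )

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))                  # [0.  0.  0.  1.5 3. ]
print(prox_indicator_nonneg(x)) # matches relu(x) on the grid
```

The two outputs agree componentwise, consistent with the paper's observation: for grid points, the minimizer of 0.5*(y - x)^2 over y >= 0 is exactly max(x, 0). Other activations (e.g., soft-thresholding) admit analogous proximity-operator characterizations.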




Author information

Correspondence to Patrick L. Combettes.

Additional information


The work of P. L. Combettes was supported by the National Science Foundation under grant CCF-1715671. The work of J.-C. Pesquet was supported by Institut Universitaire de France.


About this article


Cite this article

Combettes, P.L., Pesquet, J.-C.: Deep Neural Network Structures Solving Variational Inequalities. Set-Valued Var. Anal (2020). https://doi.org/10.1007/s11228-019-00526-z


Keywords

  • Averaged operator
  • Deep neural network
  • Monotone operator
  • Nonexpansive operator
  • Proximity operator
  • Variational inequality