Some Families of FSP Functions and Their Properties

Part of the book series: Communications and Control Engineering (CCE)

Abstract

We report properties of fixed-structure parametrized (FSP) functions that give insights into the effectiveness of the “Extended Ritz Method” (ERIM) as a methodology for the approximate solution of infinite-dimensional optimization problems. First, we present the structure of some widespread FSP functions, including linear combinations of fixed-basis functions, one-hidden-layer (OHL) and multiple-hidden-layer (MHL) networks, and kernel smoothing models. Second, focusing on the case of OHL neural networks based on ridge and radial constructions, we report their density properties under different metrics. Third, we present rates of function approximation via ridge OHL neural networks, by reporting a fundamental theorem by Maurey, Jones, and Barron, together with its extensions, based on a norm tailored to approximation by computational units from a given set of functions. We also discuss approximation properties valid for MHL networks. Fourth, we compare the classical Ritz method and the ERIM from the point of view of the curse of dimensionality, proving advantages of the latter for a specific class of problems, where the functional to be optimized is quadratic. Finally, we provide rates of approximate optimization by the ERIM, based on the concepts of modulus of continuity and modulus of convexity of the functional to be optimized.
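
To make the OHL ridge construction mentioned in the abstract concrete, here is a minimal sketch in Python. It is not taken from the chapter: the names (logistic, ohl_ridge_network), the logistic activation, and the random parameter values are illustrative assumptions. It evaluates a fixed-structure parametrized function of the form \(\gamma_n({\varvec{x}}) = \sum_{i=1}^n c_i\, h({\varvec{x}}^{\top}{\varvec{\alpha}}_i + b_i)\): the structure (n units, activation h) is fixed, while the parameters are the objects to be optimized.

import numpy as np

def logistic(t):
    # A widespread sigmoidal choice: tends to 0 at -infinity and 1 at +infinity.
    return 1.0 / (1.0 + np.exp(-t))

def ohl_ridge_network(x, outer_weights, inner_weights, biases):
    # gamma_n(x) = sum_i c_i * h(x . alpha_i + b_i): the structure (n units,
    # activation h) is fixed, while (c_i, alpha_i, b_i) are the free parameters.
    hidden = logistic(inner_weights @ x + biases)   # n ridge units
    return outer_weights @ hidden

# Tiny usage example with arbitrary parameter values (illustrative only).
rng = np.random.default_rng(0)
d, n = 3, 5
x = rng.normal(size=d)
print(ohl_ridge_network(x, rng.normal(size=n), rng.normal(size=(n, d)), rng.normal(size=n)))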


Notes

  1. Sometimes, in (3.3) and (3.4), the bias is omitted; see Remark 3.2.

  2. The rationale for introducing such functions in [108], a seminal paper in computerized tomography, was the reconstruction of a multivariable function from the values of its integrals along certain planes or lines. If such planes or lines are parallel to each other, then each of the above-mentioned integrals can be considered as a ridge function with an appropriate direction.

  3. Projection pursuit algorithms investigate approximations of a d-variable function by functions of the form \(\sum_{i=1}^n g_i({\varvec{x}}^{\top} {\varvec{\alpha}}_i)\), where \({\varvec{\alpha}}_i \in {\mathbb{R}}^d\) and \(g_i:{\mathbb{R}} \rightarrow {\mathbb{R}}\) have to be suitably chosen (see, e.g., [34, 39]); a small illustrative sketch follows these notes.

  4. We have reported here the most widespread definition of sigmoidal function (see, for instance, [6, 127] and the references therein). However, in the literature there is a certain lack of consistency in the terminology. For example, some authors also require continuity and/or monotonicity (or even strict monotonicity) of h on \({\mathbb {R}}\). Others consider finite values for the limits to \(+\infty \) and \(-\infty \) (e.g., 1 and \(-1\), respectively). As the algorithms that we shall present in the second part of the book to optimize the parameters of OHL networks are gradient-based, whenever the derivatives of the basis functions are involved we shall implicitly suppose that the sigmoidal functions are continuously differentiable.

  5. We resort to the simpler notation introduced at the end of Sect. 2.6.

  6. Note that such a mutual dependence may arise only when the set of vector-valued functions to be approximated is not the Cartesian product of sets of scalar-valued functions.

  7. Recall that completeness means that every Cauchy sequence in \({{\mathscr {H}}}\) converges to an element of \({{\mathscr {H}}}\). A Cauchy sequence in \({{\mathscr {H}}}\) is any sequence \(\{{\varvec{\gamma }}_i \in {{\mathscr {H}}}\}_{i=1}^{\infty }\) characterized by the property that, for every \(\varepsilon > 0\), there exists an index \(\bar{i}\) such that for every \(i, j \ge \bar{i}\) one has \(\Vert {\varvec{\gamma }}_i - {\varvec{\gamma }}_j\Vert _{{\mathscr {H}}} \le \varepsilon \).

  8. For example, when \(c_{d,s} \rightarrow c_{\infty ,s}>0\) for \(d \rightarrow \infty \) and the order s of the Sobolev space does not depend on d, for any positive integer n the lower bound \(c_{d,s} \,n^{-s/(d-1)}\) tends to \(c_{\infty ,s}\) as d tends to \(\infty \). Hence, for any desired approximation error (strictly) smaller than \(c_{\infty ,s}\), no fixed n can guarantee such accuracy when d is sufficiently large (a numeric illustration follows these notes).

  9. Such an extension basically replaces the gradient in the finite-dimensional case with the Fréchet derivative.
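
The sketch referred to in note 3: a minimal illustration (the name projection_pursuit_model, the chosen g_i, and the directions are hypothetical) of evaluating an approximation of the form \(\sum_{i=1}^n g_i({\varvec{x}}^{\top}{\varvec{\alpha}}_i)\). Unlike an OHL network with a fixed activation, each univariate function g_i is itself selected by the fitting algorithm.

import numpy as np

def projection_pursuit_model(x, directions, ridge_functions):
    # Evaluate sum_i g_i(x . alpha_i); both the directions alpha_i and the
    # univariate functions g_i are chosen during fitting (see [34, 39]).
    return sum(g(x @ a) for g, a in zip(ridge_functions, directions))

# Illustrative instance with two hand-picked units in dimension d = 2.
g_list = [np.tanh, lambda t: t ** 2]
a_list = [np.array([1.0, -1.0]), np.array([0.5, 2.0])]
print(projection_pursuit_model(np.array([0.3, 0.7]), a_list, g_list))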
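
The numeric illustration referred to in note 8, under the simplifying assumptions \(c_{d,s}=1\) and \(s=2\) (values chosen only for illustration): for a fixed number n of fixed-basis units, the lower bound \(n^{-s/(d-1)}\) approaches 1 as d grows, so no fixed n keeps the worst-case error below a prescribed level in every dimension.

# Lower bound c_{d,s} * n**(-s/(d-1)) from note 8, with c_{d,s} = 1 and s = 2.
s, n = 2, 1000
for d in (5, 20, 100, 1000):
    print(d, n ** (-s / (d - 1)))
# The printed values increase towards 1 as d grows, even though n = 1000 is fixed.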

References

  1. Adams RA (1975) Sobolev spaces. Academic Press
  2. Adams RA, Fournier JJF (2003) Sobolev spaces. Academic Press
  3. Alt W (1984) On the approximation of infinite optimization problems with an application to optimal control problems. Appl Math Optim 12:15–27
  4. Ba LJ, Caruana R (2014) Do deep networks really need to be deep? In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems, vol 27, pp 1–9
  5. Barron AR (1992) Neural net approximation. In: Narendra KS (ed) Proceedings of the 7th Yale workshop on adaptive and learning systems. Yale University Press, pp 69–72
  6. Barron AR (1993) Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans Inf Theory 39:930–945
  7. Barron AR, Klusowski JM (2018) Approximation and estimation for high-dimensional deep learning networks. Technical report arXiv:1809.03090v2
  8. Beard RW, McLain TW (1998) Successive Galerkin approximation algorithms for nonlinear optimal and robust control. Int J Control 71:717–743
  9. Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2:1–127
  10. Bengio Y, Delalleau O, Le Roux N (2005) The curse of dimensionality for local kernel machines. Technical Report 1258, Département d'Informatique et Recherche Opérationnelle, Université de Montréal
  11. Bengio Y, Delalleau O, Le Roux N (2006) The curse of highly variable functions for local kernel machines. In: Advances in neural information processing systems, vol 18. MIT Press, pp 107–114
  12. Bengio Y, LeCun Y (2007) Scaling learning algorithms towards AI. In: Bottou L, Chapelle O, DeCoste D, Weston J (eds) Large-scale kernel machines. MIT Press
  13. Bianchini M, Scarselli F (2014) On the complexity of neural network classifiers: a comparison between shallow and deep architectures. IEEE Trans Neural Netw Learn Syst 25:1553–1565
  14. Blum EK, Li LK (1991) Approximation theory and feedforward networks. Neural Netw 4:511–515
  15. Bosarge WE Jr, Johnson OG, McKnight RS, Timlake WP (1973) The Ritz-Galerkin procedure for nonlinear control problems. SIAM J Numer Anal 10:94–111
  16. Breiman L (1993) Hinging hyperplanes for regression, classification, and function approximation. IEEE Trans Inf Theory 39:993–1013
  17. Brezis H (2011) Functional analysis. Sobolev spaces and partial differential equations. Springer
  18. Carroll SM, Dickinson BW (1989) Construction of neural nets using the Radon transform. In: Proceedings of the international joint conference on neural networks, pp 607–611
  19. Cervellera C, Macciò D (2013) Learning with kernel smoothing models and low-discrepancy sampling. IEEE Trans Neural Netw Learn Syst 24:504–509
  20. Cervellera C, Macciò D (2014) Local linear regression for function learning: an analysis based on sample discrepancy. IEEE Trans Neural Netw Learn Syst 25:2086–2098
  21. Chen T, Chen H (1995) Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and application to dynamical systems. IEEE Trans Neural Netw 6:911–917
  22. Chen T, Chen H, Liu R (1995) Approximation capability in \(C(\bar{\mathbb{R}}^n)\) by multilayer feedforward networks and related problems. IEEE Trans Neural Netw 6:25–30
  23. Chui CK, Mhaskar HN (2018) Deep nets for local manifold learning. Front Appl Math Stat 4, Article 12
  24. Courant R (1948) Differential and integral calculus, vol II. Interscience Publishers, Inc
  25. Courant R, Hilbert D (1962) Methods of mathematical physics, vol II. Interscience Publishers, Inc
  26. Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2:303–314
  27. Dacorogna B (2008) Direct methods in the calculus of variations, 2nd edn. Springer
  28. Daniel JW (1971) The approximate minimization of functionals. Prentice Hall
  29. Daniel JW (1973) The Ritz-Galerkin method for abstract optimal control problems. SIAM J Control 11:53–63
  30. Darken C, Donahue M, Gurvits L, Sontag E (1993) Rate of approximation results motivated by robust neural network learning. In: Proceedings of the sixth annual ACM conference on computational learning theory. ACM, pp 303–309
  31. de Villiers J, Barnard E (1992) Backpropagation neural nets with one and two hidden layers. IEEE Trans Neural Netw 3:136–141
  32. DeVore RA, Howard R, Micchelli C (1989) Optimal nonlinear approximation. Manuscr Math 63:469–478
  33. Donahue M, Gurvits L, Darken C, Sontag E (1997) Rates of convex approximation in non-Hilbert spaces. Constr Approx 13:187–220
  34. Donoho DL, Johnstone IM (1989) Projection-based approximation and a duality method with kernel methods. Ann Stat 17:58–106
  35. Dontchev AL (1996) An a priori estimate for discrete approximations in nonlinear optimal control. SIAM J Control Optim 34:1315–1328
  36. Dontchev AL, Zolezzi T (1993) Well-posed optimization problems. Lecture notes in mathematics, vol 1543. Springer
  37. Ekeland I, Temam R (1976) Convex analysis and variational problems. North-Holland Publishing Company and American Elsevier
  38. Felgenhauer U (1999) On Ritz type discretizations for optimal control problems. In: Proceedings of the 18th IFIP-ICZ conference. Chapman-Hall, pp 91–99
  39. Friedman JH, Stuetzle W (1981) Projection pursuit regression. J Am Stat Assoc 76:817–823
  40. Funahashi K (1989) On the approximate realization of continuous mappings by neural networks. Neural Netw 2:183–192
  41. Girosi F (1994) Regularization theory, Radial Basis Functions and networks. In: Cherkassky V, Friedman JH, Wechsler H (eds) From statistics to neural networks. Theory and pattern recognition applications, Computer and systems sciences, Subseries F. Springer
  42. Girosi F (1995) Approximating error bounds that use VC bounds. In: Proceedings of the international conference on artificial neural networks, pp 295–302
  43. Girosi F, Anzellotti G (1992) Rates of convergence of approximation by translates. Technical Report 1288, Artificial Intelligence Laboratory, Massachusetts Institute of Technology
  44. Girosi F, Anzellotti G (1993) Rates of convergence for Radial Basis Functions and neural networks. In: Mammone RJ (ed) Artificial neural networks for speech and vision. Chapman & Hall, pp 97–113
  45. Girosi F, Jones M, Poggio T (1995) Regularization theory and neural networks architectures. Neural Comput 7:219–269
  46. Giulini S, Sanguineti M (2009) Approximation schemes for functional optimization problems. J Optim Theory Appl 140:33–54
  47. Gnecco G (2012) A comparison between fixed-basis and variable-basis schemes for function approximation and functional optimization. J Appl Math 2012:1–17
  48. Gnecco G (2016) On the curse of dimensionality in the Ritz method. J Optim Theory Appl 168:488–509
  49. Gnecco G, Gori M, Melacci S, Sanguineti M (2014) A theoretical framework for supervised learning from regions. Neurocomputing 129:25–32
  50. Gnecco G, Gori M, Melacci S, Sanguineti M (2015) Foundations of support constraint machines. Neural Comput 27:388–480
  51. Gnecco G, Gori M, Melacci S, Sanguineti M (2015) Learning with mixed hard/soft pointwise constraints. IEEE Trans Neural Netw Learn Syst 26:2019–2032
  52. Gnecco G, Gori M, Sanguineti M (2012) Learning with boundary conditions. Neural Comput 25:1029–1106
  53. Gnecco G, Kůrková V, Sanguineti M (2011) Can dictionary-based computational models outperform the best linear ones? Neural Netw 24:881–887
  54. Gnecco G, Kůrková V, Sanguineti M (2011) Some comparisons of complexity in dictionary-based and linear computational models. Neural Netw 24:172–182
  55. Gnecco G, Sanguineti M (2008) Estimates of the approximation error using Rademacher complexity: learning vector-valued functions. J Inequalities Appl 2008:1–16
  56. Gnecco G, Sanguineti M (2010) Estimates of variation with respect to a set and applications to optimization problems. J Optim Theory Appl 145:53–75
  57. Gnecco G, Sanguineti M (2010) Suboptimal solutions to dynamic optimization problems via approximations of the policy functions. J Optim Theory Appl 146:764–794
  58. Gnecco G, Sanguineti M (2011) On a variational norm tailored to variable-basis approximation schemes. IEEE Trans Inf Theory 57:549–558
  59. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press
  60. Gurvits L, Koiran P (1997) Approximation and learning of convex superpositions. J Comput Syst Sci 55:161–170
  61. Hager WW (1975) The Ritz-Trefftz method for state and control constrained optimal control problems. SIAM J Numer Anal 12:854–867
  62. Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning. Springer
  63. Haykin S (2008) Neural networks and learning machines. Pearson Prentice-Hall
  64. Hecht-Nielsen R (1989) Theory of the backpropagation neural network. In: Proceedings of the international joint conference on neural networks, pp 593–605
  65. Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18:1527–1554
  66. Hinton GE (2007) Learning multiple layers of representation. Trends Cogn Sci 11:428–434
  67. Hlaváčková-Schindler K, Sanguineti M (2003) Bounds on the complexity of neural-network models and comparison with linear methods. Int J Adapt Control Signal Process 17:179–194
  68. Hornik K (1991) Approximation capabilities of multilayer feedforward networks. Neural Netw 4:251–257
  69. Hornik K (1991) Functional approximation and learning in artificial neural networks. Neural Netw World 5:257–266
  70. Hornik K (1993) Some new results on neural network approximation. Neural Netw 6:1069–1072
  71. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2:359–366
  72. Irie B, Miyake S (1988) Capability of three-layered perceptrons. In: Proceedings of the international joint conference on neural networks, pp 641–648
  73. Ito Y (1991) Approximation of functions on a compact set by finite sums of a sigmoid function without scaling. Neural Netw 4:817–826
  74. Jackson D (2004) Fourier series and orthogonal polynomials. Dover
  75. John F (1955) Plane waves and spherical means applied to partial differential equations. Interscience Publishers, Inc
  76. Jones LK (1990) Constructive approximation for neural networks by sigmoid functions. Proc IEEE 78:1586–1589
  77. Jones LK (1992) A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training. Ann Stat 20:608–613
  78. Kainen P, Kůrková V, Sanguineti M (2003) Minimization of error functionals over variable-basis functions. SIAM J Optim 14:732–742
  79. Kainen PC (1997) Utilizing geometric anomalies of high dimension: when complexity makes computation easier. In: Warwick K, Kárný M (eds) Computer-intensive methods in control and signal processing. The curse of dimensionality. Birkhäuser, pp 283–294
  80. Kainen PC, Kůrková V (2009) An integral upper bound for neural network approximation. Neural Comput 21:2970–2989
  81. Kainen PC, Kůrková V, Sanguineti M (2009) Complexity of Gaussian radial basis networks approximating smooth functions. J Complex 25:63–74
  82. Kainen PC, Kůrková V, Sanguineti M (2012) Dependence of computational models on input dimension: tractability of approximation and optimization tasks. IEEE Trans Inf Theory 58:1203–1214
  83. Kainen PC, Kůrková V, Vogt A (1999) Approximation by neural networks is not continuous. Neurocomputing 29:47–56
  84. Kainen PC, Kůrková V, Vogt A (2000) Geometry and topology of continuous best and near best approximations. J Approx Theory 105:252–262
  85. Kainen PC, Kůrková V, Vogt A (2001) Continuity of approximation by neural networks in \(L_p\)-spaces. Ann Oper Res 101:143–147
  86. Kantorovich LV, Krylov VI (1958) Approximate methods of higher analysis. P. Noordhoff Ltd., Groningen
  87. Kolmogorov AN (1991) On the best approximation of functions of a given class. In: Tikhomirov VM (ed) Selected works of A. N. Kolmogorov. Kluwer, pp 202–205
  88. Kolmogorov AN, Fomin SV (1975) Introductory real analysis. Dover Publications Inc
  89. Kůrková V (1997) Dimension-independent rates of approximation by neural networks. In: Warwick K, Kárný M (eds) Computer-intensive methods in control and signal processing. The curse of dimensionality. Birkhäuser, pp 261–270
  90. Kůrková V (1998) Incremental approximation by neural networks. In: Warwick K, Kárný M, Kůrková V (eds) Complexity: neural network approach. Springer, pp 177–188
  91. Kůrková V (2003) High-dimensional approximation by neural networks. In: Suykens J et al (eds) Advances in learning theory: methods, models, and applications (NATO Science Series III: Computer & Systems Sciences, vol 190), Chap 4. IOS Press, pp 69–88
  92. Kůrková V (2008) Minimization of error functionals over perceptron networks. Neural Comput 20:252–270
  93. Kůrková V (2009) Model complexity of neural networks and integral transforms. In: Polycarpou M, Panayiotou C, Alippi C, Ellinas G (eds) Proceedings of the 2009 international conference on artificial neural networks. Lecture notes in computer science, vol 5768. Springer, pp 708–718
  94. Kůrková V (2012) Complexity estimates based on integral transforms induced by computational units. Neural Netw 33:160–167
  95. Kůrková V, Kainen PC, Kreinovich V (1997) Estimates of the number of hidden units and variation with respect to half-spaces. Neural Netw 10:1061–1068
  96. Kůrková V, Sanguineti M (2001) Bounds on rates of variable-basis and neural-network approximation. IEEE Trans Inf Theory 47:2659–2665
  97. Kůrková V, Sanguineti M (2002) Comparison of worst case errors in linear and neural network approximation. IEEE Trans Inf Theory 48:264–275
  98. Kůrková V, Sanguineti M (2005) Error estimates for approximate optimization by the extended Ritz method. SIAM J Optim 15:461–487
  99. Kůrková V, Sanguineti M (2007) Estimates of covering numbers of convex sets with slowly decaying orthogonal subsets. Discret Appl Math 155:1930–1942
  100. Kůrková V, Sanguineti M (2008) Geometric upper bounds on rates of variable-basis approximation. IEEE Trans Inf Theory 54:5681–5688
  101. Kůrková V, Sanguineti M (2016) Model complexities of shallow networks representing highly-varying functions. Neurocomputing 171:598–604
  102. Kůrková V, Sanguineti M (2017) Probabilistic lower bounds for approximation by shallow perceptron networks. Neural Netw 91:34–41
  103. Kůrková V, Sanguineti M (2019) Classification by sparse neural networks. IEEE Trans Neural Netw Learn Syst 30(9):2746–2754
  104. Kůrková V, Savický P, Hlaváčková K (1998) Representations and rates of approximation of real-valued Boolean functions by neural networks. Neural Netw 11:651–659
  105. Lavretsky E (2002) On the geometric convergence of neural approximations. IEEE Trans Neural Netw 13:274–282
  106. Leshno M, Lin VYa, Pinkus A, Schocken S (1993) Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw 6:861–867
  107. Levitin ES, Polyak BT (1966) Convergence of minimizing sequences in conditional extremum problems. Dokl Akad Nauk SSSR 168:764–767
  108. Logan BF, Shepp LA (1975) Optimal reconstruction of a function from its projections. Duke Math J 42:645–659
  109. Luenberger DG (1969) Optimization by vector space methods. Wiley
  110. Maiorov V (1999) On best approximation by ridge functions. J Approx Theory 99:68–94
  111. Maiorov V, Pinkus A (1999) Lower bounds for approximation by MLP neural networks. Neurocomputing 25:81–91
  112. Maiorov VE, Meir R (2000) On the near optimality of the stochastic approximation of smooth functions by neural networks. Adv Comput Math 13:79–103
  113. Makovoz Y (1998) Uniform approximation by neural networks. J Approx Theory 95:215–228
  114. Malanowski K, Buskens C, Maurer H (1997) Convergence of approximations to nonlinear control problems. In: Fiacco AV (ed) Mathematical programming with data perturbation. Lecture notes in pure and applied mathematics, vol 195. Marcel Dekker, pp 253–284
  115. Mhaskar H, Liao Q, Poggio T (2016) Learning functions: when is deep better than shallow. CBMM Memo No. 045. https://arxiv.org/pdf/1603.00988v4.pdf. Accessed 31 May 2016
  116. Mhaskar H, Liao Q, Poggio T (2016) Learning real and Boolean functions: when is deep better than shallow. CBMM Memo No. 45. https://arxiv.org/pdf/1603.00988v1.pdf. Accessed 4 Mar 2016
  117. Mhaskar HN (1995) Versatile Gaussian networks. In: Proceedings of the IEEE workshop on nonlinear signal and image processing, pp 70–73
  118. Mhaskar HN, Micchelli CA (1992) Approximation by superposition of a sigmoidal function and radial basis functions. Adv Appl Math 13:350–373
  119. Mhaskar HN, Poggio T (2016) Deep vs. shallow networks: an approximation theory perspective. Anal Appl 14:829–848
  120. Mikhlin SG (1980) The approximate solution of one-sided variational problems. Izvestiya Vysshikh Uchebnykh Zavedenii Matematika 213(2):45–48
  121. Minsky M, Papert S (1969) Perceptrons. MIT Press
  122. Mussa-Ivaldi FA (1992) From basis functions to basis fields: vector field approximation from sparse data. Biol Cybern 67:479–489
  123. Mussa-Ivaldi FA, Gandolfo F (1993) Networks that approximate vector-valued mappings. In: Proceedings of the IEEE international conference on neural networks, pp 1973–1978
  124. Park J, Sandberg IW (1991) Universal approximation using radial-basis-function networks. Neural Comput 3:246–257
  125. Pinkus A (1985) \(n\)-widths in approximation theory. Springer
  126. Pinkus A (1997) Approximation by ridge functions. In: Le Méhauté A, Rabut C, Schumaker LL (eds) Surface fitting and multiresolution methods. Vanderbilt University Press, pp 1–14
  127. Pinkus A (1999) Approximation theory of the MLP model in neural networks. Acta Numer 8:143–195
  128. Pisier G (1981) Remarques sur un résultat non publié de B. Maurey. In: Séminaire d'Analyse Fonctionnelle 1980–81, vol I, no 12. École Polytechnique, Centre de Mathématiques, Palaiseau
  129. Polyak BT (1966) Existence theorems and convergence of minimizing sequences in extremum problems with restrictions. Dokl Akad Nauk SSSR 166:72–75
  130. Ritz W (1909) Über eine neue Methode zur Lösung gewisser Variationsprobleme der mathematischen Physik. Journal für die Reine und Angewandte Mathematik 135:1–61
  131. Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65:386–408
  132. Rosenblatt F (1960) On the convergence of reinforcement procedures in simple perceptrons. Technical Report VG-1196-G-4, Cornell Aeronautical Laboratory, Buffalo, NY
  133. Rudin W (1964) Principles of mathematical analysis. McGraw-Hill
  134. Sanguineti M (2008) Universal approximation by ridge computational models and neural networks: a survey. Open Appl Math J 2:31–58
  135. Scarselli F, Tsoi AC (1998) Universal approximation using feedforward neural networks: a survey of some existing methods, and some new results. Neural Netw 11:15–37
  136. Schölkopf B, Smola AJ (2001) Learning with kernels. MIT Press
  137. Singer I (1970) Best approximation in normed linear spaces by elements of linear subspaces. Springer
  138. Sirisena HR, Chou FS (1979) Convergence of the control parametrization Ritz method for nonlinear optimal control problems. J Optim Theory Appl 29:369–382
  139. Sjöberg J, Zhang Q, Ljung L, Benveniste A, Glorennec P-Y, Delyon B, Hjalmarsson H, Juditsky A (1995) Nonlinear black-box modeling in system identification: a unified overview. Automatica 31:1691–1724
  140. Sontag ED (1992) Feedback stabilization using two-hidden-layer nets. IEEE Trans Neural Netw 3:981–990
  141. Stinchcombe M, White H (1989) Universal approximation using feedforward networks with non-sigmoid hidden layer activation functions. In: Proceedings of the international joint conference on neural networks, vol 1. SOS Printing, San Diego, pp 613–617. (Reprinted in: White H (ed) Artificial neural networks: approximation & learning theory. Blackwell, 1992)
  142. Tjuhtin VB (1982) An error estimate for approximate solutions in one-sided variational problems. Vestn Leningr Univ Math 14:247–254
  143. Vapnik VN (1998) Statistical learning theory. Wiley
  144. Wasilkowski GW, Woźniakowski H (2001) Complexity of weighted approximation over \(\mathbb{R}^d\). J Complex 17:722–740
  145. Widrow B, Hoff ME Jr (1960) Adaptive switching circuits. In: 1960 IRE western electric show and convention record, Part 4, pp 96–104
  146. Widrow B, Lehr MA (1990) 30 years of adaptive neural networks: perceptron, madaline, and backpropagation. Proc IEEE 78:1415–1442

Author information

Corresponding author: Riccardo Zoppoli.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Zoppoli, R., Sanguineti, M., Gnecco, G., Parisini, T. (2020). Some Families of FSP Functions and Their Properties. In: Neural Approximations for Optimal Control and Decision. Communications and Control Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-29693-3_3
