
Design of Mathematical Models by Learning From Data and FSP Functions

Chapter in: Neural Approximations for Optimal Control and Decision

Abstract

First, well-known concepts from Statistical Learning Theory are reviewed. With reference to the problem of modelling an unknown input/output (I/O) relationship by fixed-structure parametrized functions, the concepts of expected risk, empirical risk, and generalization error are described. The generalization error is then split into approximation and estimation errors. Four quantities of interest are emphasized: the accuracy, the number of arguments of the I/O relationship, the model complexity, and the number of samples generated for the estimation. The possibility of generating such samples by deterministic algorithms like quasi-Monte Carlo methods, orthogonal arrays, Latin hypercubes, etc., gives rise to the so-called Deterministic Learning Theory. This possibility is an intriguing alternative to the random generation of input data, typically obtained by Monte Carlo techniques, since it makes it possible to reduce the number of samples (for the same accuracy) and to obtain upper bounds on the errors in deterministic rather than probabilistic terms. Deterministic learning relies on some basic quantities, such as variation and discrepancy. Special families of deterministic sequences called “low-discrepancy sequences” are useful in the computation of integrals and in dynamic programming, as they mitigate the danger of incurring the curse of dimensionality that arises from the use of regular grids.
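To make the main quantities concrete, the sketch below (added here purely for illustration; it is not taken from the chapter) fits a fixed-structure parametrized model by empirical risk minimization on input samples generated either randomly (Monte Carlo) or deterministically through a Halton low-discrepancy sequence, and compares the empirical risk with an estimate of the expected risk on a large test set. The target function, the quadratic loss, the polynomial model structure, and the helper names halton, target, features, and risks are all assumptions made for this example.

```python
# Illustrative sketch only (not from the chapter): empirical risk minimization
# with input samples generated randomly (Monte Carlo) versus deterministically
# (Halton low-discrepancy sequence). The target function, the quadratic loss,
# and the polynomial model structure are assumptions made for this example.
import numpy as np


def halton(n, dim):
    """First n points of the Halton low-discrepancy sequence in [0, 1]^dim."""
    primes = [2, 3, 5, 7, 11, 13]
    points = np.empty((n, dim))
    for d in range(dim):
        base = primes[d]
        for i in range(n):
            f, x, k = 1.0, 0.0, i + 1
            while k > 0:          # radical-inverse (van der Corput) in the given base
                f /= base
                x += f * (k % base)
                k //= base
            points[i, d] = x
    return points


def target(x):
    # Plays the role of the unknown I/O relationship to be modelled.
    return np.sin(2 * np.pi * x[:, 0]) * np.exp(-x[:, 1])


def features(x, degree=3):
    # Fixed-structure parametrized model: tensor-product polynomial basis,
    # linear in the parameters.
    return np.column_stack([x[:, 0] ** i * x[:, 1] ** j
                            for i in range(degree + 1)
                            for j in range(degree + 1)])


def risks(x_train, x_test):
    """Minimize the empirical risk (quadratic loss) and return the empirical
    risk together with an estimate of the expected risk on the test set."""
    y_train = target(x_train)
    w, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)
    empirical = np.mean((features(x_train) @ w - y_train) ** 2)
    expected = np.mean((features(x_test) @ w - target(x_test)) ** 2)
    return empirical, expected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d = 64, 2                              # sample size and input dimension
    x_test = rng.uniform(size=(200_000, d))   # dense set estimating the expected risk

    for name, x_train in [("Monte Carlo", rng.uniform(size=(L, d))),
                          ("Halton (low-discrepancy)", halton(L, d))]:
        emp, exp_ = risks(x_train, x_test)
        print(f"{name:26s} empirical risk = {emp:.2e}, "
              f"estimated expected risk = {exp_:.2e}")
```

The gap between the two printed risks is the generalization error of the estimated model; whether the deterministic design actually yields a smaller gap for a given problem depends on the variation and smoothness conditions discussed in the chapter, and the sketch is only meant to make the quantities involved concrete.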


Notes

  1. A more suitable expression is actually “supervised learning from data,” to distinguish it from other forms of learning, such as “unsupervised learning from data.” However, since in the book we shall deal with supervised learning problems, the term “supervised” will typically be omitted.

  2. Recall that the support of a function \(f :X \rightarrow \mathbb R\), where X is a normed linear space, is defined as \(\mathrm{supp}\,f \triangleq \mathrm{cl} (\{x \in X \,|\,f(x) \not = 0\})\), i.e., it is the closure in the norm of X of the set of points where \(f \not = 0\).

  3. We refer to \(O(1/L)\) and \(O(1/L^{1/2})\) as “linear” and “quadratic” rates with respect to L, respectively, for the following reasons, which are closely related to (but simpler than) what is reported in Assumptions 2.10, 2.11, 2.12, and 2.13 (L takes the place of n; the dimension d is absent). Let Q be a quantity depending on L in such a way that \(Q(L) = O(1/L)\) (Q(L) corresponds, e.g., to the left-hand sides of (2.48) and (2.70)). Then, there exists a constant \(c_1\) such that \(Q(L) \le c_1/L\). Let \({\varepsilon } >0\). To evaluate the rate at which L has to grow when \({\varepsilon } \rightarrow 0\), hence when \(1/{\varepsilon } \rightarrow \infty \), in order to guarantee that \(Q(L) \le {\varepsilon }\) holds, we impose \(c_1/L \le {\varepsilon }\). This gives \(L \ge c_1/{\varepsilon }\), which means that, as \(1/{\varepsilon } \rightarrow \infty \), L grows linearly with respect to \(1/{\varepsilon }\). Analogously, \(Q(L) = O(1/L^{1/2})\) implies that there exists \(c_2\) such that \(Q(L) \le c_2/L^{1/2}\); proceeding as above, we get \(L \ge (c_2/{\varepsilon })^2\), which means that, as \(1/{\varepsilon } \rightarrow \infty \), L grows quadratically with respect to \(1/{\varepsilon }\) (a small numerical illustration is given after these notes).
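As a purely numerical illustration of these two rates (the normalization \(c_1 = c_2 = 1\) is an assumption made here, not a value from the text), the following short script tabulates the smallest L guaranteeing \(Q(L) \le {\varepsilon }\) for a few values of \({\varepsilon }\):

```python
# Smallest L guaranteeing Q(L) <= eps under the two rates of the footnote,
# with the assumed normalization c1 = c2 = 1.
import math

for eps in (1e-1, 1e-2, 1e-3):
    L_linear = math.ceil(1.0 / eps)            # from c1 / L <= eps
    L_quadratic = math.ceil((1.0 / eps) ** 2)  # from c2 / L**(1/2) <= eps
    print(f"eps = {eps:g}:  L >= {L_linear}  for O(1/L),  "
          f"L >= {L_quadratic}  for O(1/L^(1/2))")
```

For \({\varepsilon } = 10^{-3}\) this gives \(L \ge 10^3\) for the linear rate and \(L \ge 10^6\) for the quadratic one, which is what “linear” and “quadratic” growth with respect to \(1/{\varepsilon }\) means in practice.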


Author information


Correspondence to Riccardo Zoppoli.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Zoppoli, R., Sanguineti, M., Gnecco, G., Parisini, T. (2020). Design of Mathematical Models by Learning From Data and FSP Functions. In: Neural Approximations for Optimal Control and Decision. Communications and Control Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-29693-3_4
