Abstract
First, well-known concepts from Statistical Learning Theory are reviewed. With reference to the problem of modelling an unknown input/output (I/O) relationship by fixed-structure parametrized functions, the concepts of expected risk, empirical risk, and generalization error are described. The latter error is then split into approximation and estimation errors. Four quantities of interest are emphasized: the accuracy, the number of arguments of the I/O relationship, the model complexity, and the number of samples generated for the estimation. The possibility of generating such samples by deterministic algorithms such as quasi-Monte Carlo methods, orthogonal arrays, and Latin hypercubes gives rise to so-called Deterministic Learning Theory. Deterministic generation is an intriguing alternative to the random generation of input data typically obtained via Monte Carlo techniques, since it makes it possible to reduce the number of samples (for the same accuracy) and to obtain upper bounds on the errors in deterministic rather than probabilistic terms. Deterministic learning relies on basic quantities such as variation and discrepancy. Special families of deterministic sequences called “low-discrepancy sequences” are useful in the computation of integrals and in dynamic programming, as they mitigate the danger of incurring the curse of dimensionality that derives from the use of regular grids.
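To make the Monte Carlo versus quasi-Monte Carlo contrast concrete, the following minimal sketch (not part of the chapter) estimates a d-dimensional integral over the unit cube with both random points and a Sobol’ low-discrepancy sequence. The integrand g, the dimension, the sample size, and the seed are all illustrative assumptions; it requires NumPy and SciPy ≥ 1.7.

```python
# Minimal sketch: Monte Carlo vs. quasi-Monte Carlo (Sobol') estimation of
# an integral over the unit cube [0, 1]^d.
import numpy as np
from scipy.stats import qmc

d = 5     # number of input variables (illustrative)
L = 1024  # number of samples; a power of 2 suits Sobol' sequences
rng = np.random.default_rng(0)

# Illustrative integrand with known integral: prod_i (2 * x_i) integrates
# to exactly 1 over [0, 1]^d.
def g(x):
    return np.prod(2.0 * x, axis=1)

# Random (Monte Carlo) input samples.
x_mc = rng.random((L, d))

# Deterministic low-discrepancy (Sobol') input samples.
x_qmc = qmc.Sobol(d=d, scramble=False).random(L)

print("exact integral       :", 1.0)
print("Monte Carlo estimate :", g(x_mc).mean())
print("Sobol' estimate      :", g(x_qmc).mean())
```

With unscrambled Sobol’ points and L a power of two, the quasi-Monte Carlo estimate is typically markedly closer to the exact value than the Monte Carlo estimate at the same L, which is the sample-saving effect mentioned in the abstract.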
Notes
1. A more suitable expression is actually “supervised learning from data,” to distinguish it from other forms of learning, such as “unsupervised learning from data.” However, since in the book we shall deal with supervised learning problems, the term “supervised” will typically be omitted.
2. Recall that the support of a function \(f :X \rightarrow \mathbb R\), where X is a normed linear space, is defined as \(\mathrm{supp}\,f \triangleq \mathrm{cl} (\{x \in X \,|\,f(x) \not = 0\})\), i.e., it is the closure in the norm of X of the set of points where \(f \not = 0\). For instance, for the hat function \(f(x) = \max \{0, 1-|x|\}\) on \(X = \mathbb R\), the set where \(f \not = 0\) is the open interval \((-1,1)\), so \(\mathrm{supp}\,f = [-1,1]\).
3. We refer to O(1/L) and \(O(1/L^{1/2})\) as “linear” and “quadratic” rates with respect to L, respectively, for reasons that closely parallel, in simpler form, what is reported in Assumptions 2.10, 2.11, 2.12, and 2.13 (L takes the place of n; the dimension d is absent). Let Q be a quantity dependent on L in such a way that \(Q(L) = O(1/L)\) (Q(L) corresponds, e.g., to the left-hand sides of (2.48) and (2.70)). Then, there exists a constant \(c_1\) such that \(Q(L) \le c_1/L\). Let \({\varepsilon } >0\). To evaluate the rate at which L has to grow when \({\varepsilon } \rightarrow 0\), hence when \(1/{\varepsilon } \rightarrow \infty \), in such a way as to guarantee that \(Q(L) \le {\varepsilon }\) holds, we impose \(c_1/L \le {\varepsilon }\). This yields \(L \ge c_1/{\varepsilon }\), which means that when \(1/{\varepsilon } \rightarrow \infty \), \(L \rightarrow \infty \) linearly with respect to \(1/{\varepsilon }\). Analogously, \(Q(L) = O(1/L^{1/2})\) implies that there exists \(c_2\) such that \(Q(L) \le c_2/L^{1/2}\), and so, reasoning as above, we get \(L \ge (c_2/{\varepsilon })^2\), which expresses that, for \(1/{\varepsilon } \rightarrow \infty \), \(L \rightarrow \infty \) quadratically with respect to \(1/{\varepsilon }\) (a small numeric illustration of this growth follows these notes).
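As a numeric companion to Note 3, the following sketch (not part of the chapter) computes, for a few target accuracies \({\varepsilon }\), the smallest sample sizes L satisfying \(c_1/L \le {\varepsilon }\) and \(c_2/L^{1/2} \le {\varepsilon }\); the constants \(c_1\) and \(c_2\) are illustrative assumptions.

```python
# Numeric illustration of Note 3: the smallest L with c1/L <= eps grows
# linearly in 1/eps, while the smallest L with c2/sqrt(L) <= eps grows
# quadratically in 1/eps. The constants c1, c2 are illustrative.
import math

c1, c2 = 1.0, 1.0
for eps in (0.1, 0.01, 0.001):
    L_linear = math.ceil(c1 / eps)            # from Q(L) <= c1 / L
    L_quadratic = math.ceil((c2 / eps) ** 2)  # from Q(L) <= c2 / L**(1/2)
    print(f"eps = {eps:5}: L >= {L_linear:>7} (linear), "
          f"L >= {L_quadratic:>9} (quadratic)")
```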
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Zoppoli, R., Sanguineti, M., Gnecco, G., Parisini, T. (2020). Design of Mathematical Models by Learning From Data and FSP Functions. In: Neural Approximations for Optimal Control and Decision. Communications and Control Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-29693-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29691-9
Online ISBN: 978-3-030-29693-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)