Abstract
Within machine learning, the subfield of Neural Architecture Search (NAS) has recently garnered research attention due to its ability to improve upon human-designed models. However, finding an exact solution to this problem is generally computationally intractable, and the design of the search space still requires manual intervention. In this paper we establish a formal framework for understanding the computational bounds of NAS in relation to its search space. To do so, we first reformulate the function approximation problem in terms of sequences of functions, calling it the Function Approximation (FA) problem; we then show that it is computationally infeasible to devise a procedure that solves FA to zero error for all functions, regardless of the search space. We also show that this error is minimized when a specific class of functions is present in the search space. Subsequently, we show that machine learning, viewed as a mathematical problem, is a solution strategy for FA, albeit not an effective one, and we describe a stronger version of this approach: the Approximate Architectural Search Problem (a-ASP), the mathematical equivalent of NAS. Leveraging this framework and results from the literature, we describe the conditions under which a-ASP can potentially solve FA as well as an exhaustive search, but in polynomial time.
Notes
- 1.
Throughout this paper, the problem of data selection is not considered, and is simply assumed to be an input to our solution strategies.
- 2.
With the possible exception of the results from [54].
References
Angeline, P.J., Saunders, G.M., Pollack, J.B.: An evolutionary algorithm that constructs recurrent neural networks. Trans. Neur. Netw. 5(1), 54–65 (1994). https://doi.org/10.1109/72.265960
Bartlett, P., Ben-David, S.: Hardness results for neural network approximation problems. In: Fischer, P., Simon, H.U. (eds.) EuroCOLT 1999. LNCS (LNAI), vol. 1572, pp. 50–62. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49097-3_5
Baxter, J.: A model of inductive bias learning. J. Artif. Intell. Res. 12, 149–198 (2000). https://doi.org/10.1613/jair.731
Ben-David, S., Hrubes, P., Moran, S., Shpilka, A., Yehudayoff, A.: A learning problem that is independent of the set theory ZFC axioms. CoRR abs/1711.05195 (2017). http://arxiv.org/abs/1711.05195
Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009). https://doi.org/10.1561/2200000006
Blum, M.: A machine-independent theory of the complexity of recursive functions. J. ACM 14(2), 322–336 (1967). https://doi.org/10.1145/321386.321395
Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Learnability and the Vapnik-Chervonenkis dimension. J. Assoc. Comput. Mach. 36, 929–965 (1989). https://doi.org/10.1145/76359.76371
Bshouty, N.H.: A new composition theorem for learning algorithms. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC 1998, pp. 583–589. ACM, New York (1998). https://doi.org/10.1145/258533.258614
Carpenter, G.A., Grossberg, S.: A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput. Vis. Graph. Image Process. 37, 54–115 (1987). https://doi.org/10.1016/S0734-189X(87)80014-2
Carvalho, A.R., Ramos, F.M., Chaves, A.A.: Metaheuristics for the feedforward artificial neural network (ANN) architecture optimization problem. Neural Comput. Appl. (2010). https://doi.org/10.1007/s00521-010-0504-3
Church, A.: An unsolvable problem of elementary number theory. Am. J. Math. 58, 345–363 (1936)
Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Systems 2, 303–314 (1989). https://doi.org/10.1007/BF02551274
Cybenko, G.: Complexity theory of neural networks and classification problems. In: Almeida, L.B., Wellekens, C.J. (eds.) EURASIP 1990. LNCS, vol. 412, pp. 26–44. Springer, Heidelberg (1990). https://doi.org/10.1007/3-540-52255-7_25
Elsken, T., Metzen, J.H., Hutter, F.: Neural architecture search: a survey (2019). https://doi.org/10.1007/978-3-030-05318-5_3
Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., Hutter, F.: Efficient and robust automated machine learning. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 2962–2970. Curran Associates, Inc. (2015)
Funahashi, K.: On the approximate realization of continuous mappings by neural networks. Neural Netw. 2, 183–192 (1989). https://doi.org/10.1016/0893-6080(89)90003-8
Girosi, F., Jones, M., Poggio, T.: Regularization theory and neural networks architectures. Neural Comput. 7, 219–269 (1995). https://doi.org/10.1162/neco.1995.7.2.219
Golovin, D., Solnik, B., Moitra, S., Kochanski, G., Karro, J., Sculley, D.: Google vizier: a service for black-box optimization (2017). https://doi.org/10.1145/3097983.3098043
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016). http://www.deeplearningbook.org
He, Y., Lin, J., Liu, Z., Wang, H., Li, L.J., Han, S.: AMC: autoML for model compression and acceleration on mobile devices. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 784–800 (2018)
Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4, 251–257 (1991). https://doi.org/10.1016/0893-6080(91)90009-T
Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366 (1989). https://doi.org/10.1016/0893-6080(89)90020-8
Jin, H., Song, Q., Hu, X.: Auto-Keras: efficient neural architecture search with network morphism (2018)
Kolmogorov, A.N.: On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk SSSR 114, 953–956 (1957)
Leshno, M., Lin, V.Y., Pinkus, A., Shocken, S.: Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw. 6, 861–867 (1993). https://doi.org/10.1016/S0893-6080(05)80131-5
Liu, H., Simonyan, K., Yang, Y.: Hierarchical representations for efficient architecture search. In: International Conference on Learning Representations (2018)
Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: International Conference on Learning Representations (2019)
Long, P.M., Sedghi, H.: Size-free generalization bounds for convolutional neural networks. CoRR abs/1905.12600 (2019). https://arxiv.org/pdf/1905.12600v1.pdf
Luo, R., Tian, F., Qin, T., Liu, T.Y.: Neural architecture optimization. In: NeurIPS (2018)
Miller, G.F., Todd, P.M., Hegde, S.U.: Designing neural networks using genetic algorithms. In: Proceedings 3rd International Conference Genetic Algorithms and Their Applications, pp. 379–384 (1989)
Neto, J.P., Siegelmann, H.T., Costa, J.F., Araujo, C.P.S.: Turing universality of neural nets (revisited). In: Pichler, F., Moreno-Díaz, R. (eds.) EUROCAST 1997. LNCS, vol. 1333, pp. 361–366. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0025058
Ojha, V.K., Abraham, A., Snášel, V.: Metaheuristic design of feedforward neural networks: a review of two decades of research. Eng. Appl. Artif. Intell. 60(C), 97–116 (2017). https://doi.org/10.1016/j.engappai.2017.01.013
Orponen, P.: Computational complexity of neural networks: a survey. Nordic J. Comput. 1(1), 94–110 (1994)
Ostrand, P.A.: Dimension of metric spaces and Hilbert's problem 13. Bull. Am. Math. Soc. 71, 619–622 (1965). https://doi.org/10.1090/S0002-9904-1965-11363-5
Park, J., Sandberg, I.W.: Universal approximation using radial-basis-function networks. Neural Comput. 3, 246–257 (1991). https://doi.org/10.1162/neco.1991.3.2.246
Pham, H., Guan, M., Zoph, B., Le, Q., Dean, J.: Efficient neural architecture search via parameters sharing. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, 10–15 July 2018, vol. 80, pp. 4095–4104. PMLR (2018)
Poggio, T., Girosi, F.: A theory of networks for approximation and learning. A.I. Memo No. 1140 (1989)
Poggio, T., Girosi, F.: Networks for approximation and learning. Proc. IEEE 78(9), 1481–1497 (1990). https://doi.org/10.1109/5.58326
Rabin, M.O.: Computable algebra, general theory and theory of computable fields. Trans. Amer. Math. Soc. 95, 341–360 (1960). https://doi.org/10.1090/S0002-9947-1960-0113807-4
Real, E., et al.: Large-scale evolution of image classifiers. In: Proceedings of the 34\(^{th}\) International Conference on Machine Learning (2017)
Rogers Jr., H.: The Theory of Recursive Functions and Effective Computability. MIT Press, Cambridge (1987)
Schäfer, A.M., Zimmermann, H.G.: Recurrent neural networks are universal approximators. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds.) ICANN 2006. LNCS, vol. 4131, pp. 632–640. Springer, Heidelberg (2006). https://doi.org/10.1007/11840817_66
Schaffer, J.D., Caruana, R.A., Eshelman, L.J.: Using genetic search to exploit the emergent behavior of neural networks. Physica D 42, 244–248 (1990). https://doi.org/10.1016/0167-2789(90)90078-4
Siegel, J.W., Xu, J.: On the approximation properties of neural networks. arXiv e-prints arXiv:1904.02311 (2019)
Siegelmann, H.T., Sontag, E.D.: Turing computability with neural nets. Appl. Math. Lett. 4, 77–80 (1991). https://doi.org/10.1016/0893-9659(91)90080-F
Siegelmann, H.T., Sontag, E.D.: On the computational power of neural nets. J. Comput. Syst. Sci. 50, 132–150 (1995). https://doi.org/10.1006/jcss.1995.1013
Sontag, E.D.: VC dimension of neural networks. Neural Netw. Mach. Learn. 168, 69–95 (1998)
Stanley, K.O., Clune, J., Lehman, J., Miikkulainen, R.: Designing neural networks through evolutionary algorithms. Nat. Mach. Intell. 1, 24–35 (2019)
Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002). https://doi.org/10.1162/106365602320169811
Sun, Y., Yen, G.G., Yi, Z.: Evolving unsupervised deep neural networks for learning meaningful representations. IEEE Trans. Evol. Comput. 23, 89–103 (2019). https://doi.org/10.1109/TEVC.2018.2808689
Tenorio, M.F., Lee, W.T.: Self organizing neural networks for the identification problem. In: Touretzky, D.S. (ed.) Advances in Neural Information Processing Systems 1, pp. 57–64. Morgan-Kaufmann, San Mateo (1989)
Valiant, L.G.: A theory of the learnable. Commun. ACM 27, 1134–1142 (1984). https://doi.org/10.1145/1968.1972
Vapnik, V.N., Chervonenkis, A.Y.: On the uniform convergence of relative frequencies of events to their probabilities. In: Vovk, V., Papadopoulos, H., Gammerman, A. (eds.) Measures of Complexity, pp. 11–30. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21852-6_3
Vitushkin, A.: Some properties of linear superpositions of smooth functions. Dokl. Akad. Nauk SSSR 156, 1258–1261 (1964)
Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1(1), 67–87 (1997). https://doi.org/10.1109/4235.585893
Wolpert, D.H., Macready, W.G.: Coevolutionary free lunches. IEEE Trans. Evol. Comput. 9, 721–735 (2005). https://doi.org/10.1109/TEVC.2005.856205
Wong, C., Houlsby, N., Lu, Y., Gesmundo, A.: Transfer learning with neural autoML. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS 2018, pp. 8366–8375 (2018)
Yang, X.S.: Metaheuristic optimization: algorithm analysis and open problems. In: Proceedings of the \(10^{th}\) International Symposium on Experimental Algorithms, vol. 6630, pp. 21–32 (2011). https://doi.org/10.1007/978-3-642-20662-7_2
Yao, X.: Evolving artificial neural networks. Proc. IEEE 87(9), 1423–1447 (1999). https://doi.org/10.1109/5.784219
Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. CoRR abs/1611.01578 (2016)
Acknowledgments
The author is grateful to the anonymous reviewers for their helpful feedback on this paper, and also thanks Y. Goren, Q. Wang, N. Strom, C. Bejjani, Y. Xu, and B. d’Iverno for their comments and suggestions on the early stages of this project.
Appendices
A PAC Is a Solver for FA
PAC learning, as defined by Valiant [52], is a slightly different problem from FA, as it concerns itself with whether a concept class C can be described, with high probability, by a member of a hypothesis class H. It also establishes bounds in terms of the number of samples from members \(c \in C\) that are needed to learn C. On the other hand, FA and its solution strategies concern themselves with finding a solution that minimizes the error, by searching through sequences of explicitly defined members drawn from a search space.
Regardless of these differences, PAC learning as a procedure can still be formulated as a solution strategy for FA. To do this, let H be our search space. Then note that the PAC error function \(e_{pac}(h, c) = Pr_{x\sim \mathcal {P}}[h(x) \ne c(x)],\; c \in C,\;h \in H\), is equivalent to computing \(\varepsilon _\sigma (h, c)\) for some subset \(\sigma \subset dom(c)\), and choosing the frequentist difference between the images of the functions as the metric d. Our objective would be to return the \(h \in H\) that minimizes the approximation error for a given subset \(\sigma \subset C\). Note that we do not search through the expanded search space \(H^{\star , n}\).
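The correspondence above can be illustrated concretely. The following is a minimal sketch, not code from the paper: the function names (`e_pac`, `pac_select`), the toy threshold hypotheses, and the sample are all hypothetical, chosen only to show the PAC error \(e_{pac}(h, c)\) computed as an empirical disagreement frequency over a subset \(\sigma \subset dom(c)\), with the minimizing \(h \in H\) returned as the FA-style approximation.

```python
def e_pac(h, c, sigma):
    """Empirical frequentist error: the fraction of sample points where h and c disagree."""
    return sum(1 for x in sigma if h(x) != c(x)) / len(sigma)

def pac_select(H, c, sigma):
    """Return the hypothesis in H minimizing the empirical error on sigma (no expanded search space)."""
    return min(H, key=lambda h: e_pac(h, c, sigma))

# Hypothetical toy instance: the target concept is a threshold at 0.3,
# and H contains thresholds that only approximately match it.
c = lambda x: 1 if x >= 0.3 else 0
H = [lambda x, t=t: 1 if x >= t else 0 for t in (0.0, 0.25, 0.5, 0.75)]
sigma = [i / 20 for i in range(21)]  # a finite subset of dom(c)

best = pac_select(H, c, sigma)
print(e_pac(best, c, sigma))  # the minimal achievable error over H is nonzero here
```

Note that, as in the text, the search runs over H itself rather than over sequences drawn from an expanded search space.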
Finding the right distribution for a specific class may be NP-hard [7], and so \(e_{pac}\) requires us to make certain assumptions about the distribution of the input values. Additionally, any optimizer for PAC is required to run in polynomial time. For these reasons, PAC is a weaker approach to solving FA when compared to ASP, but stronger than ML, since this solution strategy is tied to the design of the search space, and not to the choice of function. Nonetheless, it must be stressed that the bounds and paradigms provided by PAC and FA are not mutually exclusive, either: the most prominent example being that PAC learning provides conditions under which the chosen subset \(\sigma \) is optimal.
With the polynomial constraint for PAC learning lifted, and letting the sample and search space sizes grow infinitely, PAC is effectively equivalent to ASP. However, that defeats the purpose of the PAC framework, as its success relies on being a tractable learning theory.
B The VC Dimension and the Information Potential
There is a natural correspondence between the VC dimension [7, 53] of a hypothesis space, and the information capacity of a sequence.
To see this, note that the VC dimension is usually defined in terms of the set of concepts (i.e., the input function \(\mathcal {F}\)) that can be shattered by a predetermined function f with \(img(f) = \{0,1\}\). It is frequently used to quantify the ability of a procedure to learn the input function \(\mathcal {F}\).
In the FA framework we are more interested in whether the search space–also a set–of a given solution strategy is able to generalize well to multiple, unseen input functions. Therefore, for fixed \(\mathcal {F}\) and f, the VC dimension and its variants provide a powerful insight into the ability of an algorithm to learn. When f is not fixed, it is still possible to utilize this quantity to measure the capacity of a search space \(\mathcal {S}\), by simply taking the union of all possible \(f \in \mathcal {S}^{\star , n}\) for a given n. However, when the input functions are not fixed either, we are unable to use the definition of VC dimension in this context, as the set of input concepts is unknown to us. We thus need a more flexible way to model generalizability, and that is where we leverage the information potential \(U(\mathcal {S}, n)\) of a search space.
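The notion of shattering underlying the VC dimension can be checked by brute force on small instances. The following is a minimal sketch, not from the paper: the helper names (`shatters`, `vc_at_least`) and the one-sided threshold hypothesis class (which has VC dimension 1) are hypothetical, used only to make the definition concrete.

```python
from itertools import combinations, product

def shatters(H, points):
    """H shatters `points` iff every binary labeling of them is realized by some h in H."""
    realized = {tuple(h(x) for x in points) for h in H}
    return all(lab in realized for lab in product((0, 1), repeat=len(points)))

def vc_at_least(H, pool, d):
    """True if some size-d subset of `pool` is shattered by H, i.e., the VC dimension is >= d."""
    return any(shatters(H, s) for s in combinations(pool, d))

# Hypothetical hypothesis class: one-sided thresholds on the line.
H = [lambda x, t=t: 1 if x >= t else 0 for t in [i / 10 for i in range(11)]]
pool = [0.1, 0.4, 0.8]

print(vc_at_least(H, pool, 1))  # True: any single point can be labeled both ways
print(vc_at_least(H, pool, 2))  # False: with x1 < x2, the labeling (1, 0) is unrealizable
```

The same brute-force check extends, in principle, to the union over \(f \in \mathcal {S}^{\star , n}\) described above, though it becomes intractable quickly; this is one motivation for a quantity like \(U(\mathcal {S}, n)\) instead.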
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
de Wynter, A. (2019). On the Bounds of Function Approximations. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Theoretical Neural Computation. ICANN 2019. Lecture Notes in Computer Science(), vol 11727. Springer, Cham. https://doi.org/10.1007/978-3-030-30487-4_32