Abstract
Within machine learning, the subfield of Neural Architecture Search (NAS) has recently garnered research attention due to its ability to improve upon human-designed models. However, finding an exact solution to this problem is generally computationally intractable, and the design of the search space still requires manual intervention. In this paper we establish a formal framework for understanding the computational bounds of NAS in relation to its search space. To do so, we first reformulate the function approximation problem in terms of sequences of functions, calling it the Function Approximation (FA) problem; we then show that it is computationally infeasible to devise a procedure that solves FA to zero error for all functions, regardless of the search space. We also show that this error is minimized when a specific class of functions is present in the search space. Subsequently, we show that machine learning, viewed as a mathematical problem, is a solution strategy for FA, albeit not an effective one, and we describe a stronger version of this approach: the Approximate Architectural Search Problem (a-ASP), the mathematical equivalent of NAS. Leveraging this framework and results from the literature, we describe the conditions under which a-ASP can potentially solve FA as well as an exhaustive search, but in polynomial time.
Notes
- 1.
Throughout this paper, the problem of data selection is not considered, and is simply assumed to be an input to our solution strategies.
- 2.
With the possible exception of the results from [54].
References
Angeline, P.J., Saunders, G.M., Pollack, J.B.: An evolutionary algorithm that constructs recurrent neural networks. Trans. Neur. Netw. 5(1), 54–65 (1994). https://doi.org/10.1109/72.265960
Bartlett, P., Ben-David, S.: Hardness results for neural network approximation problems. In: Fischer, P., Simon, H.U. (eds.) EuroCOLT 1999. LNCS (LNAI), vol. 1572, pp. 50–62. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49097-3_5
Baxter, J.: A model of inductive bias learning. J. Artif. Intell. Res. 12, 149–198 (2000). https://doi.org/10.1613/jair.731
Ben-David, S., Hrubes, P., Moran, S., Shpilka, A., Yehudayoff, A.: A learning problem that is independent of the set theory ZFC axioms. CoRR abs/1711.05195 (2017). http://arxiv.org/abs/1711.05195
Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009). https://doi.org/10.1561/2200000006
Blum, M.: A machine-independent theory of the complexity of recursive functions. J. ACM 14(2), 322–336 (1967). https://doi.org/10.1145/321386.321395
Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Learnability and the Vapnik-Chervonenkis dimension. J. Assoc. Comput. Mach. 36, 929–965 (1989). https://doi.org/10.1145/76359.76371
Bshouty, N.H.: A new composition theorem for learning algorithms. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC 1998, pp. 583–589. ACM, New York (1998). https://doi.org/10.1145/258533.258614
Carpenter, G.A., Grossberg, S.: A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput. Vis. Graph. Image Process. 37, 54–115 (1987). https://doi.org/10.1016/S0734-189X(87)80014-2
Carvalho, A.R., Ramos, F.M., Chaves, A.A.: Metaheuristics for the feedforward artificial neural network (ANN) architecture optimization problem. Neural Comput. Appl. (2010). https://doi.org/10.1007/s00521-010-0504-3
Church, A.: An unsolvable problem of elementary number theory. Am. J. Math. 58, 345–363 (1936)
Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Systems 2, 303–314 (1989). https://doi.org/10.1007/BF02551274
Cybenko, G.: Complexity theory of neural networks and classification problems. In: Almeida, L.B., Wellekens, C.J. (eds.) EURASIP 1990. LNCS, vol. 412, pp. 26–44. Springer, Heidelberg (1990). https://doi.org/10.1007/3-540-52255-7_25
Elsken, T., Metzen, J.H., Hutter, F.: Neural architecture search: a survey (2019). https://doi.org/10.1007/978-3-030-05318-5_3
Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., Hutter, F.: Efficient and robust automated machine learning. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 2962–2970. Curran Associates, Inc. (2015)
Funahashi, K.: On the approximate realization of continuous mappings by neural networks. Neural Netw. 2, 183–192 (1989). https://doi.org/10.1016/0893-6080(89)90003-8
Girosi, F., Jones, M., Poggio, T.: Regularization theory and neural networks architectures. Neural Comput. 7, 219–269 (1995). https://doi.org/10.1162/neco.1995.7.2.219
Golovin, D., Solnik, B., Moitra, S., Kochanski, G., Karro, J., Sculley, D.: Google vizier: a service for black-box optimization (2017). https://doi.org/10.1145/3097983.3098043
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016). http://www.deeplearningbook.org
He, Y., Lin, J., Liu, Z., Wang, H., Li, L.J., Han, S.: AMC: autoML for model compression and acceleration on mobile devices. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 784–800 (2018)
Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4, 251–257 (1991). https://doi.org/10.1016/0893-6080(91)90009-T
Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366 (1989). https://doi.org/10.1016/0893-6080(89)90020-8
Jin, H., Song, Q., Hu, X.: Auto-Keras: efficient neural architecture search with network morphism (2018)
Kolmogorov, A.N.: On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk SSSR 114, 953–956 (1957)
Leshno, M., Lin, V.Y., Pinkus, A., Shocken, S.: Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw. 6, 861–867 (1993). https://doi.org/10.1016/S0893-6080(05)80131-5
Liu, H., Simonyan, K., Yang, Y.: Hierarchical representations for efficient architecture search. In: International Conference on Learning Representations (2018)
Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: International Conference on Learning Representations (2019)
Long, P.M., Sedghi, H.: Size-free generalization bounds for convolutional neural networks. CoRR abs/1905.12600 (2019). https://arxiv.org/pdf/1905.12600v1.pdf
Luo, R., Tian, F., Qin, T., Liu, T.Y.: Neural architecture optimization. In: NeurIPS (2018)
Miller, G.F., Todd, P.M., Hegde, S.U.: Designing neural networks using genetic algorithms. In: Proceedings 3rd International Conference Genetic Algorithms and Their Applications, pp. 379–384 (1989)
Neto, J.P., Siegelmann, H.T., Costa, J.F., Araujo, C.P.S.: Turing universality of neural nets (revisited). In: Pichler, F., Moreno-Díaz, R. (eds.) EUROCAST 1997. LNCS, vol. 1333, pp. 361–366. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0025058
Ojha, V.K., Abraham, A., Snášel, V.: Metaheuristic design of feedforward neural networks: a review of two decades of research. Eng. Appl. Artif. Intell. 60(C), 97–116 (2017). https://doi.org/10.1016/j.engappai.2017.01.013
Orponen, P.: Computational complexity of neural networks: a survey. Nordic J. Comput. 1(1), 94–110 (1994)
Ostrand, P.A.: Dimension of metric spaces and Hilbert's problem 13. Bull. Am. Math. Soc. 71, 619–622 (1965). https://doi.org/10.1090/S0002-9904-1965-11363-5
Park, J., Sandberg, I.W.: Universal approximation using radial-basis-function networks. Neural Comput. 3, 246–257 (1991). https://doi.org/10.1162/neco.1991.3.2.246
Pham, H., Guan, M., Zoph, B., Le, Q., Dean, J.: Efficient neural architecture search via parameters sharing. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, 10–15 July 2018, vol. 80, pp. 4095–4104. PMLR (2018)
Poggio, T., Girosi, F.: A theory of networks for approximation and learning. A.I. Memo No. 1140 (1989)
Poggio, T., Girosi, F.: Networks for approximation and learning. Proc. IEEE 78(9), 1481–1497 (1990). https://doi.org/10.1109/5.58326
Rabin, M.O.: Computable algebra, general theory and theory of computable fields. Trans. Amer. Math. Soc. 95, 341–360 (1960). https://doi.org/10.1090/S0002-9947-1960-0113807-4
Real, E., et al.: Large-scale evolution of image classifiers. In: Proceedings of the 34\(^{th}\) International Conference on Machine Learning (2017)
Rogers Jr., H.: The Theory of Recursive Functions and Effective Computability. MIT Press, Cambridge (1987)
Schäfer, A.M., Zimmermann, H.G.: Recurrent neural networks are universal approximators. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds.) ICANN 2006. LNCS, vol. 4131, pp. 632–640. Springer, Heidelberg (2006). https://doi.org/10.1007/11840817_66
Schaffer, J.D., Caruana, R.A., Eshelman, L.J.: Using genetic search to exploit the emergent behavior of neural networks. Physica D 42, 244–248 (1990). https://doi.org/10.1016/0167-2789(90)90078-4
Siegel, J.W., Xu, J.: On the approximation properties of neural networks. arXiv e-prints arXiv:1904.02311 (2019)
Siegelmann, H.T., Sontag, E.D.: Turing computability with neural nets. Appl. Math. Lett. 4, 77–80 (1991). https://doi.org/10.1016/0893-9659(91)90080-F
Siegelmann, H.T., Sontag, E.D.: On the computational power of neural nets. J. Comput. Syst. Sci. 50, 132–150 (1995). https://doi.org/10.1006/jcss.1995.1013
Sontag, E.D.: VC dimension of neural networks. Neural Netw. Mach. Learn. 168, 69–95 (1998)
Stanley, K.O., Clune, J., Lehman, J., Miikkulainen, R.: Designing neural networks through evolutionary algorithms. Nat. Mach. Intell. 1, 24–35 (2019)
Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002). https://doi.org/10.1162/106365602320169811
Sun, Y., Yen, G.G., Yi, Z.: Evolving unsupervised deep neural networks for learning meaningful representations. IEEE Trans. Evol. Comput. 23, 89–103 (2019). https://doi.org/10.1109/TEVC.2018.2808689
Tenorio, M.F., Lee, W.T.: Self organizing neural networks for the identification problem. In: Touretzky, D.S. (ed.) Advances in Neural Information Processing Systems 1, pp. 57–64. Morgan-Kaufmann, San Mateo (1989)
Valiant, L.G.: A theory of the learnable. Commun. ACM 27, 1134–1142 (1984). https://doi.org/10.1145/1968.1972
Vapnik, V.N., Chervonenkis, A.Y.: On the uniform convergence of relative frequencies of events to their probabilities. In: Vovk, V., Papadopoulos, H., Gammerman, A. (eds.) Measures of Complexity, pp. 11–30. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21852-6_3
Vitushkin, A.: Some properties of linear superpositions of smooth functions. Dokl. Akad. Nauk SSSR 156, 1258–1261 (1964)
Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1(1), 67–87 (1997). https://doi.org/10.1109/4235.585893
Wolpert, D.H., Macready, W.G.: Coevolutionary free lunches. IEEE Trans. Evol. Comput. 9, 721–735 (2005). https://doi.org/10.1109/TEVC.2005.856205
Wong, C., Houlsby, N., Lu, Y., Gesmundo, A.: Transfer learning with neural autoML. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS 2018, pp. 8366–8375 (2018)
Yang, X.S.: Metaheuristic optimization: algorithm analysis and open problems. In: Proceedings of the \(10^{th}\) International Symposium on Experimental Algorithms, vol. 6630, pp. 21–32 (2011). https://doi.org/10.1007/978-3-642-20662-7_2
Yao, X.: Evolving artificial neural networks. Proc. IEEE 87(9), 1423–1447 (1999). https://doi.org/10.1109/5.784219
Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. CoRR abs/1611.01578 (2016)
Acknowledgments
The author is grateful to the anonymous reviewers for their helpful feedback on this paper, and also thanks Y. Goren, Q. Wang, N. Strom, C. Bejjani, Y. Xu, and B. d’Iverno for their comments and suggestions on the early stages of this project.
Appendices
A PAC Is a Solver for FA
PAC learning, as defined by Valiant [52], is a slightly different problem from FA, as it concerns itself with whether a concept class C can be described, with high probability, by a member of a hypothesis class H. It also establishes bounds in terms of the number of samples from members \(c \in C\) that are needed to learn C. On the other hand, FA and its solution strategies concern themselves with finding a solution that minimizes the error, by searching through sequences of explicitly defined members drawn from a search space.
Regardless of these differences, PAC learning as a procedure can still be formulated as a solution strategy for FA. To do this, let H be our search space. Then note that the PAC error function \(e_{pac}(h, c) = Pr_{x\sim \mathcal {P}}[h(x) \ne c(x)],\; c \in C,\;h \in H\), is equivalent to computing \(\varepsilon _\sigma (h, c)\) for some subset \(\sigma \subset dom(c)\), and choosing the frequentist difference between the images of the functions as the metric d. Our objective would be to return the \(h \in H\) that minimizes the approximation error for a given subset \(\sigma \subset C\). Note that we do not search through the expanded search space \(H^{\star , n}\).
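The correspondence above can be illustrated concretely. The following is a minimal sketch, not code from the paper: the function names (`e_pac`, `pac_select`), the toy threshold hypotheses, and the sample are all hypothetical, chosen only to show the PAC error \(e_{pac}(h, c)\) computed as an empirical disagreement frequency over a subset \(\sigma \subset dom(c)\), with the minimizing \(h \in H\) returned as the FA-style approximation.

```python
def e_pac(h, c, sigma):
    """Empirical frequentist error: the fraction of sample points where h and c disagree."""
    return sum(1 for x in sigma if h(x) != c(x)) / len(sigma)

def pac_select(H, c, sigma):
    """Return the hypothesis in H minimizing the empirical error on sigma (no expanded search space)."""
    return min(H, key=lambda h: e_pac(h, c, sigma))

# Hypothetical toy instance: the target concept is a threshold at 0.3,
# and H contains thresholds that only approximately match it.
c = lambda x: 1 if x >= 0.3 else 0
H = [lambda x, t=t: 1 if x >= t else 0 for t in (0.0, 0.25, 0.5, 0.75)]
sigma = [i / 20 for i in range(21)]  # a finite subset of dom(c)

best = pac_select(H, c, sigma)
print(e_pac(best, c, sigma))  # the minimal achievable error over H is nonzero here
```

Note that, as in the text, the search runs over H itself rather than over sequences drawn from an expanded search space.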
Finding the right distribution for a specific class may be NP-hard [7], and so \(e_{pac}\) requires us to make certain assumptions about the distribution of the input values. Additionally, any optimizer for PAC is required to run in polynomial time. For these reasons, PAC is a weaker approach to solving FA when compared to ASP, but stronger than ML, since this solution strategy is tied to the design of the search space, and not to the choice of function. Nonetheless, it must be stressed that the bounds and paradigms provided by PAC and FA are not mutually exclusive, either: the most prominent example being that PAC learning provides conditions under which the chosen subset \(\sigma \) is optimal.
With the polynomial constraint for PAC learning lifted, and letting the sample and search space sizes grow infinitely, PAC is effectively equivalent to ASP. However, that defeats the purpose of the PAC framework, as its success relies on being a tractable learning theory.
B The VC Dimension and the Information Potential
There is a natural correspondence between the VC dimension [7, 53] of a hypothesis space, and the information capacity of a sequence.
To see this, note that the VC dimension is usually defined in terms of the set of concepts (i.e., the input function \(\mathcal {F}\)) that can be shattered by a predetermined function f with \(img(f) = \{0,1\}\). It is frequently used to quantify the ability of a procedure to learn the input function \(\mathcal {F}\).
In the FA framework we are more interested in whether the search space–also a set–of a given solution strategy is able to generalize well to multiple, unseen input functions. Therefore, for fixed \(\mathcal {F}\) and f, the VC dimension and its variants provide a powerful insight into the ability of an algorithm to learn. When f is not fixed, it is still possible to utilize this quantity to measure the capacity of a search space \(\mathcal {S}\), by simply taking the union of all possible \(f \in \mathcal {S}^{\star , n}\) for a given n. However, when the input functions are not fixed either, we are unable to use the definition of VC dimension in this context, as the set of input concepts is unknown to us. We thus need a more flexible way to model generalizability, and that is where we leverage the information potential \(U(\mathcal {S}, n)\) of a search space.
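The notion of shattering underlying the VC dimension can be checked by brute force on small instances. The following is a minimal sketch, not from the paper: the helper names (`shatters`, `vc_at_least`) and the one-sided threshold hypothesis class (which has VC dimension 1) are hypothetical, used only to make the definition concrete.

```python
from itertools import combinations, product

def shatters(H, points):
    """H shatters `points` iff every binary labeling of them is realized by some h in H."""
    realized = {tuple(h(x) for x in points) for h in H}
    return all(lab in realized for lab in product((0, 1), repeat=len(points)))

def vc_at_least(H, pool, d):
    """True if some size-d subset of `pool` is shattered by H, i.e., the VC dimension is >= d."""
    return any(shatters(H, s) for s in combinations(pool, d))

# Hypothetical hypothesis class: one-sided thresholds on the line.
H = [lambda x, t=t: 1 if x >= t else 0 for t in [i / 10 for i in range(11)]]
pool = [0.1, 0.4, 0.8]

print(vc_at_least(H, pool, 1))  # True: any single point can be labeled both ways
print(vc_at_least(H, pool, 2))  # False: with x1 < x2, the labeling (1, 0) is unrealizable
```

The same brute-force check extends, in principle, to the union over \(f \in \mathcal {S}^{\star , n}\) described above, though it becomes intractable quickly; this is one motivation for a quantity like \(U(\mathcal {S}, n)\) instead.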
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
de Wynter, A. (2019). On the Bounds of Function Approximations. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Theoretical Neural Computation. ICANN 2019. Lecture Notes in Computer Science(), vol 11727. Springer, Cham. https://doi.org/10.1007/978-3-030-30487-4_32