Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11727)

Included in the following conference series: International Conference on Artificial Neural Networks (ICANN)

Abstract

Within machine learning, the subfield of Neural Architecture Search (NAS) has recently garnered research attention due to its ability to improve upon human-designed models. However, the computational requirements for finding an exact solution to this problem are often intractable, and the design of the search space still requires manual intervention. In this paper we attempt to establish a formalized framework from which we can better understand the computational bounds of NAS in relation to its search space. For this, we first reformulate the function approximation problem in terms of sequences of functions, calling it the Function Approximation (FA) problem; we then show that it is computationally infeasible to devise a procedure that solves FA for all functions to zero error, regardless of the search space. We also show that this error will be minimal if a specific class of functions is present in the search space. Subsequently, we show that machine learning as a mathematical problem is a solution strategy for FA, albeit not an effective one, and further describe a stronger version of this approach: the Approximate Architectural Search Problem (a-ASP), which is the mathematical equivalent of NAS. We leverage the framework from this paper and results from the literature to describe the conditions under which a-ASP can potentially solve FA as well as an exhaustive search, but in polynomial time.

Notes

  1. Throughout this paper, the problem of data selection is not considered, and is simply assumed to be an input to our solution strategies.

  2. With the possible exception of the results from [54].

References

  1. Angeline, P.J., Saunders, G.M., Pollack, J.B.: An evolutionary algorithm that constructs recurrent neural networks. IEEE Trans. Neural Netw. 5(1), 54–65 (1994). https://doi.org/10.1109/72.265960

  2. Bartlett, P., Ben-David, S.: Hardness results for neural network approximation problems. In: Fischer, P., Simon, H.U. (eds.) EuroCOLT 1999. LNCS (LNAI), vol. 1572, pp. 50–62. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49097-3_5

  3. Baxter, J.: A model of inductive bias learning. J. Artif. Intell. Res. 12, 149–198 (2000). https://doi.org/10.1613/jair.731

  4. Ben-David, S., Hrubes, P., Moran, S., Shpilka, A., Yehudayoff, A.: A learning problem that is independent of the set theory ZFC axioms. CoRR abs/1711.05195 (2017). http://arxiv.org/abs/1711.05195

  5. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009). https://doi.org/10.1561/2200000006

  6. Blum, M.: A machine-independent theory of the complexity of recursive functions. J. ACM 14(2), 322–336 (1967). https://doi.org/10.1145/321386.321395

  7. Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Learnability and the Vapnik-Chervonenkis dimension. J. ACM 36, 929–965 (1989). https://doi.org/10.1145/76359.76371

  8. Bshouty, N.H.: A new composition theorem for learning algorithms. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC 1998, pp. 583–589. ACM, New York (1998). https://doi.org/10.1145/258533.258614

  9. Carpenter, G.A., Grossberg, S.: A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput. Vis. Graph. Image Process. 37, 54–115 (1987). https://doi.org/10.1016/S0734-189X(87)80014-2

  10. Carvalho, A.R., Ramos, F.M., Chaves, A.A.: Metaheuristics for the feedforward artificial neural network (ANN) architecture optimization problem. Neural Comput. Appl. (2010). https://doi.org/10.1007/s00521-010-0504-3

  11. Church, A.: An unsolvable problem of elementary number theory. Am. J. Math. 58, 345–363 (1936)

  12. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Systems 2, 303–314 (1989). https://doi.org/10.1007/BF02551274

  13. Cybenko, G.: Complexity theory of neural networks and classification problems. In: Almeida, L.B., Wellekens, C.J. (eds.) EURASIP 1990. LNCS, vol. 412, pp. 26–44. Springer, Heidelberg (1990). https://doi.org/10.1007/3-540-52255-7_25

  14. Elsken, T., Metzen, J.H., Hutter, F.: Neural architecture search: a survey (2019). https://doi.org/10.1007/978-3-030-05318-5_3

  15. Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., Hutter, F.: Efficient and robust automated machine learning. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 2962–2970. Curran Associates, Inc. (2015)

  16. Funahashi, K.: On the approximate realization of continuous mappings by neural networks. Neural Netw. 2, 183–192 (1989). https://doi.org/10.1016/0893-6080(89)90003-8

  17. Girosi, F., Jones, M., Poggio, T.: Regularization theory and neural networks architectures. Neural Comput. 7, 219–269 (1995). https://doi.org/10.1162/neco.1995.7.2.219

  18. Golovin, D., Solnik, B., Moitra, S., Kochanski, G., Karro, J., Sculley, D.: Google Vizier: a service for black-box optimization (2017). https://doi.org/10.1145/3097983.3098043

  19. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016). http://www.deeplearningbook.org

  20. He, Y., Lin, J., Liu, Z., Wang, H., Li, L.J., Han, S.: AMC: AutoML for model compression and acceleration on mobile devices. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 784–800 (2018)

  21. Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. 4, 251–257 (1991). https://doi.org/10.1016/0893-6080(91)90009-T

  22. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366 (1989). https://doi.org/10.1016/0893-6080(89)90020-8

  23. Jin, H., Song, Q., Hu, X.: Auto-Keras: efficient neural architecture search with network morphism (2018)

  24. Kolmogorov, A.N.: On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk SSSR 114, 953–956 (1957)

  25. Leshno, M., Lin, V.Y., Pinkus, A., Schocken, S.: Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw. 6, 861–867 (1993). https://doi.org/10.1016/S0893-6080(05)80131-5

  26. Liu, H., Simonyan, K., Yang, Y.: Hierarchical representations for efficient architecture search. In: International Conference on Learning Representations (2018)

  27. Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: International Conference on Learning Representations (2019)

  28. Long, P.M., Sedghi, H.: Size-free generalization bounds for convolutional neural networks. CoRR abs/1905.12600 (2019). https://arxiv.org/pdf/1905.12600v1.pdf

  29. Luo, R., Tian, F., Qin, T., Liu, T.Y.: Neural architecture optimization. In: NeurIPS (2018)

  30. Miller, G.F., Todd, P.M., Hegde, S.U.: Designing neural networks using genetic algorithms. In: Proceedings of the 3rd International Conference on Genetic Algorithms and Their Applications, pp. 379–384 (1989)

  31. Neto, J.P., Siegelmann, H.T., Costa, J.F., Araujo, C.P.S.: Turing universality of neural nets (revisited). In: Pichler, F., Moreno-Díaz, R. (eds.) EUROCAST 1997. LNCS, vol. 1333, pp. 361–366. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0025058

  32. Ojha, V.K., Abraham, A., Snášel, V.: Metaheuristic design of feedforward neural networks: a review of two decades of research. Eng. Appl. Artif. Intell. 60(C), 97–116 (2017). https://doi.org/10.1016/j.engappai.2017.01.013

  33. Orponen, P.: Computational complexity of neural networks: a survey. Nordic J. Comput. 1(1), 94–110 (1994)

  34. Ostrand, P.A.: Dimension of metric spaces and Hilbert's problem 13. Bull. Am. Math. Soc. 71, 619–622 (1965). https://doi.org/10.1090/S0002-9904-1965-11363-5

  35. Park, J., Sandberg, I.W.: Universal approximation using radial-basis-function networks. Neural Comput. 3, 246–257 (1991). https://doi.org/10.1162/neco.1991.3.2.246

  36. Pham, H., Guan, M., Zoph, B., Le, Q., Dean, J.: Efficient neural architecture search via parameters sharing. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 4095–4104. PMLR (2018)

  37. Poggio, T., Girosi, F.: A theory of networks for approximation and learning. A.I. Memo No. 1140 (1989)

  38. Poggio, T., Girosi, F.: Networks for approximation and learning. Proc. IEEE 78(9), 1481–1497 (1990). https://doi.org/10.1109/5.58326

  39. Rabin, M.O.: Computable algebra, general theory and theory of computable fields. Trans. Am. Math. Soc. 95, 341–360 (1960). https://doi.org/10.1090/S0002-9947-1960-0113807-4

  40. Real, E., et al.: Large-scale evolution of image classifiers. In: Proceedings of the 34th International Conference on Machine Learning (2017)

  41. Rogers Jr., H.: The Theory of Recursive Functions and Effective Computability. MIT Press, Cambridge (1987)

  42. Schäfer, A.M., Zimmermann, H.G.: Recurrent neural networks are universal approximators. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds.) ICANN 2006. LNCS, vol. 4131, pp. 632–640. Springer, Heidelberg (2006). https://doi.org/10.1007/11840817_66

  43. Schaffer, J.D., Caruana, R.A., Eshelman, L.J.: Using genetic search to exploit the emergent behavior of neural networks. Physica D 42, 244–248 (1990). https://doi.org/10.1016/0167-2789(90)90078-4

  44. Siegel, J.W., Xu, J.: On the approximation properties of neural networks. arXiv e-prints arXiv:1904.02311 (2019)

  45. Siegelmann, H.T., Sontag, E.D.: Turing computability with neural nets. Appl. Math. Lett. 4, 77–80 (1991). https://doi.org/10.1016/0893-9659(91)90080-F

  46. Siegelmann, H.T., Sontag, E.D.: On the computational power of neural nets. J. Comput. Syst. Sci. 50, 132–150 (1995). https://doi.org/10.1006/jcss.1995.1013

  47. Sontag, E.D.: VC dimension of neural networks. Neural Netw. Mach. Learn. 168, 69–95 (1998)

  48. Stanley, K.O., Clune, J., Lehman, J., Miikkulainen, R.: Designing neural networks through evolutionary algorithms. Nat. Mach. Intell. 1, 24–35 (2019)

  49. Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002). https://doi.org/10.1162/106365602320169811

  50. Sun, Y., Yen, G.G., Yi, Z.: Evolving unsupervised deep neural networks for learning meaningful representations. IEEE Trans. Evol. Comput. 23, 89–103 (2019). https://doi.org/10.1109/TEVC.2018.2808689

  51. Tenorio, M.F., Lee, W.T.: Self organizing neural networks for the identification problem. In: Touretzky, D.S. (ed.) Advances in Neural Information Processing Systems 1, pp. 57–64. Morgan-Kaufmann, San Mateo (1989)

  52. Valiant, L.G.: A theory of the learnable. Commun. ACM 27, 1134–1142 (1984). https://doi.org/10.1145/1968.1972

  53. Vapnik, V.N., Chervonenkis, A.Y.: On the uniform convergence of relative frequencies of events to their probabilities. In: Vovk, V., Papadopoulos, H., Gammerman, A. (eds.) Measures of Complexity, pp. 11–30. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21852-6_3

  54. Vitushkin, A.: Some properties of linear superpositions of smooth functions. Dokl. Akad. Nauk SSSR 156, 1258–1261 (1964)

  55. Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1(1), 67–87 (1997). https://doi.org/10.1109/4235.585893

  56. Wolpert, D.H., Macready, W.G.: Coevolutionary free lunches. IEEE Trans. Evol. Comput. 9, 721–735 (2005). https://doi.org/10.1109/TEVC.2005.856205

  57. Wong, C., Houlsby, N., Lu, Y., Gesmundo, A.: Transfer learning with neural AutoML. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS 2018, pp. 8366–8375 (2018)

  58. Yang, X.S.: Metaheuristic optimization: algorithm analysis and open problems. In: Proceedings of the 10th International Symposium on Experimental Algorithms, vol. 6630, pp. 21–32 (2011). https://doi.org/10.1007/978-3-642-20662-7_2

  59. Yao, X.: Evolving artificial neural networks. Proc. IEEE 87(9), 1423–1447 (1999). https://doi.org/10.1109/5.784219

  60. Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. CoRR abs/1611.01578 (2016)

Acknowledgments

The author is grateful to the anonymous reviewers for their helpful feedback on this paper, and also thanks Y. Goren, Q. Wang, N. Strom, C. Bejjani, Y. Xu, and B. d’Iverno for their comments and suggestions on the early stages of this project.

Author information

Corresponding author

Correspondence to Adrian de Wynter.

Appendices

A PAC Is a Solver for FA

PAC learning, as defined by Valiant [52], is a slightly different problem from FA, as it concerns itself with whether a concept class C can be described, with high probability, by a member of a hypothesis class H. It also establishes bounds on the number of samples from members \(c \in C\) that are needed to learn C. FA and its solution strategies, on the other hand, concern themselves with finding a solution that minimizes the error by searching through sequences of explicitly defined members drawn from a search space.

Regardless of these differences, PAC learning as a procedure can still be formulated as a solution strategy for FA. To do this, let H be our search space. Then note that the PAC error function \(e_{pac}(h, c) = \Pr_{x \sim \mathcal{P}}[h(x) \ne c(x)],\; c \in C,\; h \in H\), is equivalent to computing \(\varepsilon_\sigma(h, c)\) for some subset \(\sigma \subset dom(c)\), and choosing the frequentist difference between the images of the functions as the metric d. Our objective would be to return the \(h \in H\) that minimizes the approximation error for a given subset \(\sigma \subset C\). Note that we do not search through the expanded search space \(H^{\star, n}\).
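
To make this correspondence concrete, the following minimal Python sketch (not part of the paper; identifiers such as `empirical_pac_error` and `best_hypothesis` are hypothetical) estimates \(e_{pac}\) as a frequentist error over a finite subset \(\sigma\) and returns the error-minimizing \(h \in H\), searching H directly rather than the expanded space \(H^{\star, n}\):

```python
# Illustrative sketch: the empirical PAC error read as the FA error over a
# finite sample sigma, followed by a direct search over a finite H.
from typing import Callable, Iterable, Sequence


def empirical_pac_error(h: Callable, c: Callable, sample: Iterable) -> float:
    """Frequentist estimate of Pr_{x~P}[h(x) != c(x)] over the sample sigma."""
    xs = list(sample)
    return sum(1 for x in xs if h(x) != c(x)) / len(xs)


def best_hypothesis(hypotheses: Sequence[Callable], target: Callable, sample: Iterable) -> Callable:
    """Return the h in H with the smallest empirical error on the sample.

    This mirrors the solution strategy above: H is searched directly,
    without expanding it into the sequence space H^{*, n}.
    """
    xs = list(sample)
    return min(hypotheses, key=lambda h: empirical_pac_error(h, target, xs))


if __name__ == "__main__":
    # Toy usage: approximate the parity of an integer with threshold rules.
    target = lambda x: x % 2                                      # the concept c
    hypotheses = [lambda x, t=t: int(x > t) for t in range(10)]   # a toy H
    sample = range(100)                                           # the subset sigma of dom(c)
    h_star = best_hypothesis(hypotheses, target, sample)
    print(empirical_pac_error(h_star, target, sample))
```

In this reading, the sample plays the role of \(\sigma\), the target function plays the role of \(c \in C\), and the metric d is the 0-1 disagreement frequency.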

Finding the right distribution for a specific class may be NP-hard [7], and so \(e_{pac}\) requires us to make certain assumptions about the distribution of the input values. Additionally, any optimizer for PAC is required to run in polynomial time. Because of this, PAC is a weaker approach to solving FA than ASP, but a stronger one than ML, since this solution strategy is tied to the design of the search space and not to the choice of function. Nonetheless, it must be stressed that the bounds and paradigms provided by PAC and FA are not mutually exclusive: the most prominent example is that PAC learning provides conditions under which the choice of the subset \(\sigma\) is optimal.

With the polynomial-time constraint for PAC learning lifted, and letting the sample and search space sizes grow without bound, PAC is effectively equivalent to ASP. However, this defeats the purpose of the PAC framework, as its success relies on it being a tractable learning theory.

B The VC Dimension and the Information Potential

There is a natural correspondence between the VC dimension [7, 53] of a hypothesis space, and the information capacity of a sequence.

To see this, note that the VC dimension is usually defined in terms of the set of concepts (i.e., the input function \(\mathcal{F}\)) that can be shattered by a predetermined function f with \(img(f) = \{0,1\}\). It is frequently used to quantify the ability of a procedure to learn the input function \(\mathcal{F}\).
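
As an illustration of this definition (a brute-force sketch, not from the paper, and practical only for small finite hypothesis classes over a finite domain), shattering can be checked, and the VC dimension estimated, by enumerating the \(\{0,1\}\)-labellings that a class realizes on a candidate set of points:

```python
# Illustrative sketch: brute-force shattering check and VC dimension estimate
# for a finite hypothesis class over a finite domain. All names are hypothetical.
from itertools import combinations
from typing import Callable, Sequence


def shatters(hypotheses: Sequence[Callable], points: Sequence) -> bool:
    """True iff every {0,1}-labelling of `points` is realized by some hypothesis."""
    realized = {tuple(int(h(x)) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)


def vc_dimension(hypotheses: Sequence[Callable], domain: Sequence) -> int:
    """Largest size of a subset of `domain` shattered by `hypotheses`."""
    dim = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(hypotheses, subset) for subset in combinations(domain, k)):
            dim = k
        else:
            break
    return dim


if __name__ == "__main__":
    # Usage: one-sided threshold rules on the integers have VC dimension 1.
    thresholds = [lambda x, t=t: x > t for t in range(10)]
    print(vc_dimension(thresholds, list(range(10))))  # -> 1
```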

In the FA framework we are more interested in whether the search space (also a set) of a given solution strategy is able to generalize well to multiple, unseen input functions. Therefore, for fixed \(\mathcal{F}\) and f, the VC dimension and its variants provide powerful insight into the ability of an algorithm to learn. When f is not fixed, it is still possible to use this quantity to measure the capacity of a search space \(\mathcal{S}\), by simply taking the union of all possible \(f \in \mathcal{S}^{\star, n}\) for a given n. However, when the input functions are not fixed either, we are unable to use the definition of the VC dimension in this context, as the set of input concepts is unknown to us. We thus need a more flexible way to model generalizability, and that is where we leverage the information potential \(U(\mathcal{S}, n)\) of a search space.
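
Continuing the brute-force sketch above (again hypothetical, and reusing its `vc_dimension` helper), measuring the capacity of a search space in this way amounts to pooling the functions it can express and computing the VC dimension of their union:

```python
# Continuation of the previous sketch: estimate the capacity of a search space
# by the VC dimension of the union of the hypothesis classes it induces.
# `thresholds + anti_thresholds` is a hypothetical stand-in for the union of
# all f in S^{*, n} for a given n.
domain = list(range(8))
thresholds = [lambda x, t=t: x > t for t in domain]         # "x > t" rules
anti_thresholds = [lambda x, t=t: x <= t for t in domain]   # "x <= t" rules
pooled = thresholds + anti_thresholds                       # union of the two classes

# The union shatters pairs of points that neither class shatters on its own,
# so its estimated capacity is strictly larger.
print(vc_dimension(thresholds, domain), vc_dimension(pooled, domain))  # -> 1 2
```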

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

de Wynter, A. (2019). On the Bounds of Function Approximations. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Theoretical Neural Computation. ICANN 2019. Lecture Notes in Computer Science, vol. 11727. Springer, Cham. https://doi.org/10.1007/978-3-030-30487-4_32

  • DOI: https://doi.org/10.1007/978-3-030-30487-4_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30486-7

  • Online ISBN: 978-3-030-30487-4

  • eBook Packages: Computer Science (R0)
