Direct Zero-Norm Minimization for Neural Network Pruning and Training

  • S. P. Adam
  • George D. Magoulas
  • M. N. Vrahatis
Part of the Communications in Computer and Information Science book series (CCIS, volume 311)


Abstract

Designing a feed-forward neural network with a topology that is optimal in terms of complexity (hidden-layer nodes and connections between nodes) and training performance has been a matter of considerable concern since the very beginning of neural network research. Typically, this issue is addressed by pruning a fully interconnected network with "many" nodes in the hidden layers, eliminating "superfluous" connections and nodes. However, the problem remains open, and it is arguably even more relevant today in the context of deep learning networks. In this paper we present a method of direct zero-norm minimization for pruning a Multi-Layer Perceptron (MLP) while training it. The method employs a cooperative scheme using two swarms of particles, whose purpose is to minimize an aggregate function corresponding to the total risk functional. Our discussion highlights computational and methodological issues of the approach that are neither apparent nor well defined in the literature.
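The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: two cooperating particle swarms, one carrying real-valued weight vectors and one carrying pruning masks, jointly minimizing an aggregate objective presumably of the form R(w) = E_emp(w) + λ·||w||_0. Because the zero-norm is discontinuous (and its direct minimization is computationally hard), a derivative-free swarm search is a natural fit. Every concrete choice below is an assumption for illustration, not the authors' algorithm: the toy sin(x) regression task, the 1-8-1 tanh network, the constriction-type PSO constants, the penalty weight LAM, and a thresholded continuous mask swarm standing in for a genuinely binary PSO.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: approximate y = sin(x) with an oversized 1-H-1 MLP.
X = np.linspace(-np.pi, np.pi, 64).reshape(-1, 1)
y = np.sin(X)

H = 8                 # deliberately oversized hidden layer
D = H + H             # input->hidden plus hidden->output weights (biases omitted)
LAM = 0.01            # assumed trade-off weight on the zero-norm penalty

def forward(w, m, X):
    """1-H-1 tanh network; the binary mask m zeroes out pruned connections."""
    w = w * m
    return np.tanh(X @ w[:H].reshape(1, H)) @ w[H:].reshape(H, 1)

def risk(w, m):
    """Aggregate objective: empirical MSE plus LAM times the zero-norm (mask count)."""
    return np.mean((forward(w, m, X) - y) ** 2) + LAM * m.sum()

def binarize(p):
    """Threshold a vector of survival probabilities into a 0/1 pruning mask."""
    return (p > 0.5).astype(float)

POP, ITERS = 20, 400
CHI, C1, C2 = 0.729, 1.494, 1.494     # common constriction-type PSO constants

# Swarm A: real-valued weight vectors.  Swarm B: per-connection survival probabilities.
Wp = rng.normal(0.0, 1.0, (POP, D)); Vw = np.zeros((POP, D))
Mp = rng.random((POP, D));           Vm = np.zeros((POP, D))

pbW, pbM = Wp.copy(), Mp.copy()
gW, gM = Wp[0].copy(), Mp[0].copy()   # provisional context vectors
pbW_f = np.array([risk(w, binarize(gM)) for w in pbW])
pbM_f = np.array([risk(gW, binarize(p)) for p in pbM])
gW, gW_f = pbW[pbW_f.argmin()].copy(), pbW_f.min()
gM, gM_f = pbM[pbM_f.argmin()].copy(), pbM_f.min()

for _ in range(ITERS):
    # Weight swarm moves, each particle scored against the best mask found so far.
    r1, r2 = rng.random((POP, D)), rng.random((POP, D))
    Vw = CHI * (Vw + C1 * r1 * (pbW - Wp) + C2 * r2 * (gW - Wp))
    Wp = Wp + Vw
    fw = np.array([risk(w, binarize(gM)) for w in Wp])
    imp = fw < pbW_f                  # stale bests are not rescored: a simplification
    pbW[imp], pbW_f[imp] = Wp[imp], fw[imp]
    if pbW_f.min() < gW_f:
        gW, gW_f = pbW[pbW_f.argmin()].copy(), pbW_f.min()

    # Mask swarm moves, each particle scored against the best weights found so far.
    r1, r2 = rng.random((POP, D)), rng.random((POP, D))
    Vm = CHI * (Vm + C1 * r1 * (pbM - Mp) + C2 * r2 * (gM - Mp))
    Mp = np.clip(Mp + Vm, 0.0, 1.0)
    fm = np.array([risk(gW, binarize(p)) for p in Mp])
    imp = fm < pbM_f
    pbM[imp], pbM_f[imp] = Mp[imp], fm[imp]
    if pbM_f.min() < gM_f:
        gM, gM_f = pbM[pbM_f.argmin()].copy(), pbM_f.min()

m = binarize(gM)
print(f"kept {int(m.sum())}/{D} connections, "
      f"MSE = {np.mean((forward(gW, m, X) - y) ** 2):.4f}")
```

Evaluating each swarm's particles against the partner swarm's current global best mirrors the "context vector" idea used in cooperative PSO; a faithful reproduction of the paper's method would substitute its actual total risk functional and a properly binary mask swarm.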


Keywords: Neural networks · Pruning · Training · Zero-norm minimization · Particle Swarm Optimization





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • S. P. Adam (1, 2)
  • George D. Magoulas (3)
  • M. N. Vrahatis (1)
  1. Computational Intelligence Laboratory, Dept. of Mathematics, University of Patras, Rion, Greece
  2. Dept. of Informatics and Telecommunications, Technological Education Institute of Epirus, Arta, Greece
  3. Dept. of Computer Science and Information Systems, Birkbeck College, University of London, United Kingdom
