Neural Processing Letters, Volume 42, Issue 2, pp 437–458

A New Multilayer Perceptron Pruning Algorithm for Classification and Regression Applications

  • Philippe Thomas
  • Marie-Christine Suhner


Optimizing the structure of a neural network remains a hard task. If the structure is too small, the network cannot learn properly from the data; if it is too large, learning leads to the well-known overfitting problem. This paper addresses this issue and proposes a new pruning approach to determine the optimal structure. Our algorithm is based on variance sensitivity analysis and prunes the different element types (hidden neurons, inputs, and weights) sequentially. The stopping criterion is based on evaluating the network's performance on both the learning and validation datasets. Four variants of this algorithm, using two different estimators of the variance, are proposed; they are tested and compared with four classical algorithms on three classification and three regression problems. The results show that the proposed algorithms outperform the classical approaches in terms of both computational time and accuracy.


Keywords: Neural network · Multilayer perceptron · Pruning · Classification · Regression · Data mining
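
The abstract describes the algorithm only at a high level, so the following is a minimal sketch of the general idea rather than the authors' method: rank hidden units by a variance-based sensitivity statistic, remove the least relevant one, refit, and stop once validation performance degrades. The network form (one tanh hidden layer with a linear output), the squared-mean-plus-variance relevance score, and the least-squares refit standing in for full retraining are all illustrative assumptions; the paper additionally prunes inputs and individual weights and monitors the learning set as well as the validation set.

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_activations(X, W1, b1):
    """tanh hidden layer of a one-hidden-layer MLP."""
    return np.tanh(X @ W1 + b1)

def fit_output_layer(A, y):
    """Least-squares refit of the linear output weights (a cheap stand-in
    for the full retraining step a real pruning run would perform)."""
    return np.linalg.lstsq(np.c_[A, np.ones(len(A))], y, rcond=None)[0]

def mse(A, w, y):
    return float(np.mean((np.c_[A, np.ones(len(A))] @ w - y) ** 2))

def relevance(A, w_out):
    """Variance-based sensitivity score per hidden unit (an assumed
    statistic, not the paper's exact one). With a linear output layer,
    d(output)/d(activation_j) is the outgoing weight, so the per-sample
    sensitivity of unit j is its activation times that weight. Units whose
    sensitivity has small mean AND small variance contribute little on any
    sample and are candidates for removal."""
    S = A * w_out                       # per-sample sensitivities, (n, h)
    return S.mean(axis=0) ** 2 + S.var(axis=0)

# Toy regression data and a deliberately oversized hidden layer.
X = rng.normal(size=(200, 3)); y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
Xv = rng.normal(size=(100, 3)); yv = np.sin(Xv[:, 0])
W1 = rng.normal(scale=0.5, size=(3, 20)); b1 = rng.normal(size=20)

A, Av = hidden_activations(X, W1, b1), hidden_activations(Xv, W1, b1)
w = fit_output_layer(A, y)
best_val = mse(Av, w, yv)

while W1.shape[1] > 1:
    k = int(np.argmin(relevance(A, w[:-1])))        # least relevant unit
    W1t, b1t = np.delete(W1, k, axis=1), np.delete(b1, k)
    At, Avt = hidden_activations(X, W1t, b1t), hidden_activations(Xv, W1t, b1t)
    wt = fit_output_layer(At, y)
    val = mse(Avt, wt, yv)
    if val > 1.01 * best_val:                       # stop: validation degrades
        break
    W1, b1, A, Av, w, best_val = W1t, b1t, At, Avt, wt, val

print(f"kept {W1.shape[1]} hidden units, validation MSE {best_val:.4f}")
```

The stopping rule here checks only the validation error against its best value so far; the paper's criterion combines learning- and validation-set performance, which this sketch simplifies away.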



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Université de Lorraine, CRAN, Vandœuvre-lès-Nancy Cedex, France
  2. CNRS, CRAN (UMR 7039), France
