Minimum Risk Neural Networks and Weight Decay Technique

  • I-Cheng Yeh
  • Pei-Yen Tseng
  • Kuan-Chieh Huang
  • Yau-Hwang Kuo
Part of the Communications in Computer and Information Science book series (CCIS, volume 304)


To enhance the generalization of neural network models, we propose a novel neural network, the Minimum Risk Neural Network (MRNN), which combines minimizing the sum of squared errors with maximizing the classification margin, following the principle of structural risk minimization. Accordingly, the objective function of MRNN combines the sum of squared errors with the sum of squares of the slopes of the classification function. In addition, we derive from MRNN a more sophisticated formula similar to the traditional weight decay technique, establishing a more rigorous theoretical basis for that technique. We tested MRNN on several real application examples. The results lead to the following conclusions: (1) As long as the penalty coefficient is in an appropriate range, MRNN performs better than a pure MLP. (2) MRNN may perform better than an MLP with weight decay on difficult classification problems.
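The abstract does not give the exact formula, so the following is only a minimal sketch of what such an objective could look like, under stated assumptions: a one-hidden-layer MLP with tanh hidden units and a linear output, a slope penalty taken as the squared derivatives of the output with respect to each input summed over the training samples, and a penalty coefficient named gamma. All function and parameter names are hypothetical. A plain weight decay objective is included for comparison.

```python
import math

def forward(x, W, b, v, c):
    """Assumed network: tanh hidden layer (weights W, biases b), linear output (v, c)."""
    h = [math.tanh(sum(W[j][i] * x[i] for i in range(len(x))) + b[j])
         for j in range(len(W))]
    y = sum(v[j] * h[j] for j in range(len(h))) + c
    return y, h

def input_slopes(x, W, b, v, c):
    """Analytic slopes dy/dx_i of the assumed network at input x."""
    _, h = forward(x, W, b, v, c)
    return [sum(v[j] * (1.0 - h[j] ** 2) * W[j][i] for j in range(len(W)))
            for i in range(len(x))]

def mrnn_style_objective(X, T, W, b, v, c, gamma):
    """Sum of squared errors plus gamma times the sum of squared slopes (illustrative)."""
    sse = 0.0
    slope_sq = 0.0
    for x, t in zip(X, T):
        y, _ = forward(x, W, b, v, c)
        sse += (t - y) ** 2
        slope_sq += sum(s ** 2 for s in input_slopes(x, W, b, v, c))
    return sse + gamma * slope_sq

def weight_decay_objective(X, T, W, b, v, c, lam):
    """Traditional weight decay: sum of squared errors plus lam times the sum of squared weights."""
    sse = sum((t - forward(x, W, b, v, c)[0]) ** 2 for x, t in zip(X, T))
    w_sq = sum(w ** 2 for row in W for w in row) + sum(w ** 2 for w in v)
    return sse + lam * w_sq
```

In this sketch the slope penalty depends on the weights and on where the training samples fall in input space, whereas weight decay penalizes the weights alone. This is one way to read the abstract's statement that the MRNN-derived formula is similar to, but more sophisticated than, traditional weight decay.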


multi-layer perceptrons, weight decay, support vector machine, structural risk minimization





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • I-Cheng Yeh (1)
  • Pei-Yen Tseng (2)
  • Kuan-Chieh Huang (3)
  • Yau-Hwang Kuo (3)
  1. Department of Civil Engineering, Tamkang University, Taiwan
  2. Department of Information Management, Chung Hua University, Taiwan
  3. Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan
