Multilayer Perceptrons: Architecture and Error Backpropagation

Abstract

Multilayer perceptrons (MLPs) are feedforward networks with one or more layers of units between the input and output layers. Each output unit represents a hyperplane in the space of the input patterns. The architecture of the MLP is illustrated in Fig. 4.1.
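The chapter's subject, training this architecture by error backpropagation, is easy to fix in mind with a small worked sketch. The following Python/NumPy fragment is an illustrative assumption rather than the chapter's own code: the single hidden layer, logistic (sigmoid) activations, squared-error cost, layer sizes, learning rate, and XOR training patterns are all choices made only for this example.

    # Minimal sketch: one-hidden-layer MLP trained by error backpropagation.
    # All sizes and constants are illustrative assumptions, not from the chapter.
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Toy data: the XOR patterns, which a single-layer perceptron cannot separate.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    n_in, n_hidden, n_out = 2, 4, 1                     # layer sizes (illustrative)
    W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))   # input-to-hidden weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, n_out))  # hidden-to-output weights
    b2 = np.zeros(n_out)
    eta = 0.5                                           # learning rate (illustrative)

    for epoch in range(5000):
        # Forward pass through the two weight layers.
        h = sigmoid(X @ W1 + b1)                        # hidden activations
        y = sigmoid(h @ W2 + b2)                        # network outputs

        # Backward pass: propagate the output error toward the input layer.
        delta_out = (y - T) * y * (1.0 - y)             # error term at the output units
        delta_hid = (delta_out @ W2.T) * h * (1.0 - h)  # error term at the hidden units

        # Gradient-descent updates for the squared-error cost.
        W2 -= eta * h.T @ delta_out
        b2 -= eta * delta_out.sum(axis=0)
        W1 -= eta * X.T @ delta_hid
        b1 -= eta * delta_hid.sum(axis=0)

    print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))

With a favorable random initialization the printed outputs approach the targets 0, 1, 1, 0; like any gradient-descent procedure, plain batch backpropagation can also converge slowly or stall in a poor local minimum.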

Author information

Correspondence to Ke-Lin Du.

Copyright information

© 2014 Springer-Verlag London

About this chapter

Cite this chapter

Du, K.-L., & Swamy, M. N. S. (2014). Multilayer Perceptrons: Architecture and Error Backpropagation. In Neural Networks and Statistical Learning. Springer, London. https://doi.org/10.1007/978-1-4471-5571-3_4

  • DOI: https://doi.org/10.1007/978-1-4471-5571-3_4

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5570-6

  • Online ISBN: 978-1-4471-5571-3

  • eBook Packages: Engineering (R0)
