Multilayer Perceptrons: Architecture and Error Backpropagation

Abstract

Multilayer perceptrons (MLPs) are feedforward networks with one or more layers of units between the input and output layers. Each output unit represents a hyperplane in the space of the input patterns. The architecture of the MLP is illustrated in Fig. 4.1.
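The chapter's subject, training this architecture by error backpropagation, is easy to fix in mind with a small worked sketch. The following Python/NumPy fragment is an illustrative assumption rather than the chapter's own code: the single hidden layer, logistic (sigmoid) activations, squared-error cost, layer sizes, learning rate, and XOR training patterns are all choices made only for this example.

    # Minimal sketch: one-hidden-layer MLP trained by error backpropagation.
    # All sizes and constants are illustrative assumptions, not from the chapter.
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Toy data: the XOR patterns, which a single-layer perceptron cannot separate.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    n_in, n_hidden, n_out = 2, 4, 1                     # layer sizes (illustrative)
    W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))   # input-to-hidden weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, n_out))  # hidden-to-output weights
    b2 = np.zeros(n_out)
    eta = 0.5                                           # learning rate (illustrative)

    for epoch in range(5000):
        # Forward pass through the two weight layers.
        h = sigmoid(X @ W1 + b1)                        # hidden activations
        y = sigmoid(h @ W2 + b2)                        # network outputs

        # Backward pass: propagate the output error toward the input layer.
        delta_out = (y - T) * y * (1.0 - y)             # error term at the output units
        delta_hid = (delta_out @ W2.T) * h * (1.0 - h)  # error term at the hidden units

        # Gradient-descent updates for the squared-error cost.
        W2 -= eta * h.T @ delta_out
        b2 -= eta * delta_out.sum(axis=0)
        W1 -= eta * X.T @ delta_hid
        b1 -= eta * delta_hid.sum(axis=0)

    print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))

With a favorable random initialization the printed outputs approach the targets 0, 1, 1, 0; like any gradient-descent procedure, plain batch backpropagation can also converge slowly or stall in a poor local minimum.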

Author information

Correspondence to Ke-Lin Du.

Copyright information

© 2014 Springer-Verlag London

About this chapter

Cite this chapter

Du, K.-L., & Swamy, M. N. S. (2014). Multilayer Perceptrons: Architecture and Error Backpropagation. In Neural Networks and Statistical Learning. Springer, London. https://doi.org/10.1007/978-1-4471-5571-3_4

  • DOI: https://doi.org/10.1007/978-1-4471-5571-3_4

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5570-6

  • Online ISBN: 978-1-4471-5571-3

  • eBook Packages: Engineering (R0)
