Multilayer Perceptrons: Architecture and Error Backpropagation

  • Ke-Lin Du
  • M. N. S. Swamy
Chapter

Abstract

The multilayer perceptron (MLP) is one of the most important neural network models; it is a universal approximator for any continuous multivariate function. This chapter centers on the MLP model and the backpropagation (BP) learning algorithm. Related topics, such as network architecture optimization, learning speedup strategies, and first-order gradient-based learning algorithms, are also introduced.
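
To make the backpropagation procedure concrete, the sketch below (Python/NumPy; the toy dataset, network sizes, and learning rate are illustrative assumptions, not taken from the chapter) trains a one-hidden-layer MLP with sigmoid hidden units and a linear output by plain gradient descent on a squared-error loss.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Tiny synthetic regression problem (illustrative only).
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(200, 2))         # 200 samples, 2 input features
    y = np.sin(X[:, :1]) + 0.5 * X[:, 1:]             # a continuous multivariate target

    n_in, n_hidden, n_out = 2, 8, 1
    W1 = rng.normal(0.0, 0.5, size=(n_in, n_hidden))  # input-to-hidden weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, size=(n_hidden, n_out)) # hidden-to-output weights
    b2 = np.zeros(n_out)

    eta = 0.1                                         # learning rate (assumed constant)
    for epoch in range(2000):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)                      # hidden activations
        y_hat = h @ W2 + b2                           # linear output layer

        # Backward pass: gradients of the mean squared error.
        err = y_hat - y
        grad_W2 = h.T @ err / len(X)
        grad_b2 = err.mean(axis=0)
        delta_h = (err @ W2.T) * h * (1.0 - h)        # error backpropagated to the hidden layer
        grad_W1 = X.T @ delta_h / len(X)
        grad_b1 = delta_h.mean(axis=0)

        # Gradient-descent update.
        W2 -= eta * grad_W2; b2 -= eta * grad_b2
        W1 -= eta * grad_W1; b1 -= eta * grad_b1

    print("final MSE:", float(np.mean((sigmoid(X @ W1 + b1) @ W2 + b2 - y) ** 2)))

The hidden-layer term delta_h is the output error propagated back through W2 and scaled by the sigmoid derivative h * (1 - h); this layer-by-layer recursion is the core of the BP algorithm discussed in the chapter.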

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada
  2. Xonlink Inc., Hangzhou, China
