
Abstract

The application of artificial neural networks to complex real-world problems usually requires a modularization of the network architecture. The individual modules deal with subtasks defined by a decomposition of the problem. Up to now, this modularization has usually been done heuristically, and little is known about principled methods for adapting the network structure to the problem at hand. Incrementally constructed cascade architectures are a promising approach to growing networks according to the needs of the problem. This paper discusses the properties of the recently proposed direct cascade architecture DCA (Littmann & Ritter 1992). One important virtue of DCA is that it allows the cascading of entire subnetworks, even if these do not admit error backpropagation. Exploiting this flexibility and using local linear map (LLM) networks as cascaded elements, we show that the performance of the resulting network cascades can be greatly enhanced compared to the performance of a single network. Our results for the Mackey-Glass time series prediction task indicate that such deeply cascaded network architectures achieve good generalization even on small data sets, where shallow, broad architectures of comparable size suffer from overfitting. We conclude that the DCA approach offers a powerful and flexible alternative to existing schemes, such as the mixtures-of-experts approach, for the construction of modular systems from a wide range of subnetwork types.
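
For reference, the Mackey-Glass benchmark series (Mackey & Glass 1977) is generated by the delay-differential equation

    dx/dt = a · x(t − τ) / (1 + x(t − τ)^10) − b · x(t),

commonly integrated with a = 0.2, b = 0.1, and delay τ = 17 for this prediction task; the abstract does not state the exact settings used here.

The sketch below illustrates the cascading idea as we read it from the abstract, not the authors' DCA/LLM implementation: each stage receives the original input augmented by the previous stage's output and is trained directly against the target, so no error signal has to be propagated back through earlier stages. The RidgeStage placeholder, the function names, and the choice of five stages are illustrative assumptions; the paper uses LLM networks as the cascaded modules.

    # Minimal sketch of a direct cascade (illustrative, not the authors' code).
    # Assumed data shapes: X is (N, d) delayed samples of the series,
    # y is (N,) the value to be predicted some steps ahead.
    import numpy as np

    class RidgeStage:
        """Placeholder module: ridge regression with a bias term.
        The paper cascades local linear map (LLM) networks instead."""
        def __init__(self, reg=1e-3):
            self.reg = reg

        def fit(self, X, y):
            Xb = np.hstack([X, np.ones((len(X), 1))])          # add bias column
            A = Xb.T @ Xb + self.reg * np.eye(Xb.shape[1])
            self.w = np.linalg.solve(A, Xb.T @ y)
            return self

        def predict(self, X):
            Xb = np.hstack([X, np.ones((len(X), 1))])
            return Xb @ self.w

    def fit_direct_cascade(X, y, n_stages=5, make_stage=RidgeStage):
        """Train stages sequentially; stage k sees [X, prediction of stage k-1]
        and is fitted directly against the target y (no backprop across stages)."""
        stages, y_hat = [], None
        for _ in range(n_stages):
            X_aug = X if y_hat is None else np.hstack([X, y_hat[:, None]])
            stage = make_stage().fit(X_aug, y)
            y_hat = stage.predict(X_aug)
            stages.append(stage)
        return stages

    def predict_direct_cascade(stages, X):
        """Run the cascade forward, feeding each stage's output to the next."""
        y_hat = None
        for stage in stages:
            X_aug = X if y_hat is None else np.hstack([X, y_hat[:, None]])
            y_hat = stage.predict(X_aug)
        return y_hat

Because each stage is fitted on its own, any trainable regressor can serve as a module, which is what makes it possible to cascade subnetworks that admit no error backpropagation. Note that with purely linear placeholder stages the cascade adds no modelling power; a nonlinear module such as an LLM network is what makes the stacking useful.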

Keywords

Neural Information Processing System · Neural Computation · Normalized Root Mean Square Error · Training Epoch · Neural Module

References

  1. Baffes, P., & J. Zelle (1992). Growing layers of perceptrons: Introducing the extentron algorithm. Proceedings of the International Joint Conference on Neural Networks, volume II (pp. 392–397). Baltimore, MD.
  2. Baum, E., & D. Haussler (1989). What size net gives valid generalization? Neural Computation 1:151–160.
  3. Crowder, R. S. (1990). Predicting the Mackey-Glass time series with cascade-correlation learning. In D. S. Touretzky, J. L. Elman, T. J. Sejnowski, & G. E. Hinton (eds.), Connectionist Models: Proceedings of the 1990 Summer School (pp. 524–532). San Mateo, CA: Morgan Kaufmann.
  4. Fahlman, S. E. (1991). The recurrent cascade-correlation architecture. In R. P. Lippmann, J. E. Moody, & D. S. Touretzky (eds.), Advances in neural information processing systems 3 (pp. 190–196). San Mateo, CA: Morgan Kaufmann.
  5. Frean, M. (1990). The upstart algorithm: A method for constructing and training feedforward neural networks. Neural Computation 2:198–209.
  6. Hartman, E., & J. D. Keeler (1991). Predicting the future: Advantages of semilocal units. Neural Computation 3:566–578.
  7. Jacobs, R., & M. Jordan (1991). A competitive modular connectionist architecture. In R. P. Lippmann, J. E. Moody, & D. S. Touretzky (eds.), Advances in neural information processing systems 3 (pp. 767–773). San Mateo, CA: Morgan Kaufmann.
  8. Jacobs, R., M. Jordan, S. Nowlan, & G. Hinton (1991). Adaptive mixtures of local experts. Neural Computation 3:79–87.
  9. Lapedes, A., & R. Farber (1987). Nonlinear signal processing using neural networks: Prediction and system modeling. Technical Report LA-UR-87-2662, Los Alamos National Laboratory, Los Alamos, NM.
  10. LeCun, Y., J. S. Denker, & S. A. Solla (1990). Optimal brain damage. In D. S. Touretzky (ed.), Advances in neural information processing systems 2 (pp. 598–605). San Mateo, CA: Morgan Kaufmann.
  11. Littmann, E., & H. Ritter (1992). Cascade network architectures. Proceedings of the International Joint Conference on Neural Networks, volume II (pp. 398–404). Baltimore, MD.
  12. Littmann, E., & H. Ritter (1993). Generalization abilities of cascade network architectures. In C. L. Giles, S. J. Hanson, & J. D. Cowan (eds.), Advances in neural information processing systems 5 (pp. 188–195). San Mateo, CA: Morgan Kaufmann.
  13. Littmann, E., & H. Ritter (1994a). Analysis and applications of the direct cascade architecture. Technical Report TR 94-2, Department of Computer Science, Bielefeld University, Bielefeld, Germany.
  14. Littmann, E., & H. Ritter (1996). Learning and generalization in cascade network architectures. Neural Computation 8(7):1521–1540.
  15. Mackey, M., & L. Glass (1977). Oscillations and chaos in physiological control systems. Science 197:287–289.
  16. Meyering, A., & H. Ritter (1992). Learning 3D shape perception with local linear maps. Proceedings of the International Joint Conference on Neural Networks, volume IV (pp. 432–436). Baltimore, MD.
  17. Mézard, M., & J. P. Nadal (1989). Learning in feedforward layered networks: The tiling algorithm. Journal of Physics A 22:2191–2204.
  18. Minsky, M. L., & S. A. Papert (1969). Perceptrons. Cambridge, MA: MIT Press.
  19. Moody, J., & C. Darken (1988). Learning with localized receptive fields. Connectionist Models: Proceedings of the 1988 Summer School (pp. 133–143). San Mateo, CA: Morgan Kaufmann.
  20. Mozer, M. (1989). A focused back-propagation algorithm for temporal pattern recognition. Complex Systems 3:349–381.
  21. Nabhan, T., & A. Zomaya (1994). Toward generating neural network structures for function approximation. Neural Networks 7:89–99.
  22. Nowlan, S. J., & G. E. Hinton (1991). Evaluation of adaptive mixtures of competing experts. In R. P. Lippmann, J. E. Moody, & D. S. Touretzky (eds.), Advances in neural information processing systems 3 (pp. 774–780). San Mateo, CA: Morgan Kaufmann.
  23. Ritter, H. (1991). Learning with the self-organizing map. In T. Kohonen, K. Mäkisara, O. Simula, & J. Kangas (eds.), Artificial neural networks 1 (pp. 357–364). Amsterdam: Elsevier.
  24. Ritter, H., T. Martinetz, & K. Schulten (1992). Neural computation and self-organizing maps: An introduction (English and German editions). New York: Addison-Wesley.
  25. Rumelhart, D. E., G. E. Hinton, & R. J. Williams (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (eds.), Parallel distributed processing 1. Cambridge, MA: MIT Press.
  26. Stokbro, K., D. Umberger, & J. Hertz (1990). Exploiting neurons with localized receptive fields to learn chaos. Complex Systems 4:603–622.

Copyright information

© Springer Science+Business Media Dordrecht 2000

Authors and Affiliations

  • Enno Littmann (1)
  • Helge Ritter (2)
  1. Dornier GmbH, VAFA 1, Friedrichshafen, Germany
  2. Universität Bielefeld, Germany
