Global least-squares vs. EM training for the Gaussian mixture of experts
Since the introduction of mixture-of-experts models and the EM algorithm for training them, maximum likelihood training of such networks has proved a powerful tool for function estimation and prediction. A similar architecture has been derived by other researchers from the application of fuzzy rules; such systems are often trained by a straightforward global error minimisation procedure. This paper argues that in certain situations global optimisation is the most appropriate approach, despite its apparent lack of statistical justification compared with maximum likelihood. Moreover, a combination of the two approaches often yields the minimal error on both the training and validation sets.
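As a rough illustration of the global approach (a sketch, not the paper's implementation; all function names and parameter shapes here are hypothetical), global least-squares minimises the squared error of the gated mixture prediction directly, rather than the likelihood:

```python
import numpy as np

def moe_predict(x, gate_w, expert_w):
    """Mixture-of-experts prediction: softmax gate over linear experts.

    x: (N, d) inputs; gate_w, expert_w: (d, K) parameters (illustrative shapes).
    """
    logits = x @ gate_w
    g = np.exp(logits - logits.max(axis=1, keepdims=True))
    g /= g.sum(axis=1, keepdims=True)          # softmax gating weights, rows sum to 1
    return (g * (x @ expert_w)).sum(axis=1)    # gate-weighted combination of expert outputs

def global_sse(x, y, gate_w, expert_w):
    """Global least-squares objective: squared error of the overall mixture output."""
    r = moe_predict(x, gate_w, expert_w) - y
    return 0.5 * float(r @ r)
```

An objective of this form can be handed to any generic numerical optimiser, whereas EM instead alternates between computing posterior expert responsibilities and performing weighted per-expert fits.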
Keywords: Fuzzy Model · Expert Model · Similar Architecture · Adaptive Expert · Incremental Learning Algorithm