The α-divergence is utilized to derive a generalized expectation-maximization (EM) algorithm, called the α-EM algorithm, which has a wide range of applications. This paper focuses on neural network learning for mixture probabilities. The α-EM algorithm includes the existing EM algorithm as a special case, which corresponds to α = −1. The parameter α specifies a probability weight for the learning, and this choice affects both learning speed and local optimality. In discussing the update equations of the neural networks, extensions of basic statistics such as Fisher's efficient score, the Fisher information, and the Cramér–Rao inequality are also given. This paper also presents another new idea: the cyclic EM structure can be used as a building block to generate a learning systolic array, and attaching monitors to this systolic array makes it possible to create a functionally distributed learning system.
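For reference, a commonly used form of the α-divergence is Amari's parameterization (the paper's exact convention may differ):

\[
D_\alpha(p \,\|\, q) \;=\; \frac{4}{1-\alpha^{2}}\left(1 - \int p(x)^{\frac{1-\alpha}{2}}\, q(x)^{\frac{1+\alpha}{2}}\, dx\right), \qquad \alpha \neq \pm 1 .
\]

In the limit α → −1 this divergence tends to the Kullback–Leibler divergence \(\int p(x)\log\bigl(p(x)/q(x)\bigr)\,dx\), which is consistent with the statement that the standard EM algorithm is recovered as the α = −1 special case.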
Keywords
- Efficient Score
- Probability Weight
- Systolic Array
- Expert Network
- Differentiable Convex Function