A Comparative Investigation on Model Selection in Binary Factor Analysis
Binary factor analysis has been widely used in data analysis, with various applications. Most studies assume a known number k of hidden factors or determine it by one of the existing model selection criteria in the statistical learning literature. These criteria must be implemented in two phases: first a set of candidate models is obtained, and then the "optimal" model is selected from the candidate family according to a model selection criterion, which incurs a large computational cost. Under the framework of Bayesian Ying-Yang (BYY) harmony learning, not only has a criterion been obtained, but model selection can also be made automatically during parameter learning without requiring a two-stage implementation, yielding significant savings in computational cost. This paper further investigates the BYY criterion and BYY harmony learning with automatic model selection (BYY-AUTO) in comparison with existing typical criteria, including Akaike's information criterion (AIC), the consistent Akaike's information criterion (CAIC), the Bayesian inference criterion (BIC), and the cross-validation (CV) criterion. The study is made via experiments on data sets with different sample sizes, data space dimensions, noise variances, and numbers of hidden factors. Experiments show that in most cases BIC outperforms AIC, CAIC, and CV, while the BYY criterion and BYY-AUTO are either comparable with or better than BIC. Furthermore, BYY-AUTO takes much less time than the conventional two-stage learning methods, with an appropriate number k determined automatically during parameter learning. Therefore, BYY harmony learning is a preferable tool for determining the number of hidden factors.
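To make the two-phase procedure concrete, the following is a minimal sketch of the selection step only: given hypothetical (log-likelihood, free-parameter-count) pairs for already-fitted candidate models at each k, it scores them with the standard AIC, BIC, and CAIC penalty formulas and picks the k with the lowest score. The fitting phase, the numeric values in `fits`, and the function names are illustrative assumptions, not the paper's implementation.

```python
import math

# Each criterion takes the same signature; AIC ignores the sample size.
def aic(loglik, n_params, n_samples):
    # Akaike's information criterion: -2 log L + 2 d
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_samples):
    # Bayesian inference criterion: -2 log L + d ln n
    return -2.0 * loglik + n_params * math.log(n_samples)

def caic(loglik, n_params, n_samples):
    # Consistent AIC: -2 log L + d (ln n + 1)
    return -2.0 * loglik + n_params * (math.log(n_samples) + 1.0)

def select_k(candidates, n_samples, criterion):
    # candidates maps k -> (log-likelihood, free parameter count) of a fitted model;
    # the "optimal" k minimizes the chosen criterion (phase two of the procedure).
    return min(candidates,
               key=lambda k: criterion(*candidates[k], n_samples))

# Hypothetical phase-one results for candidate models with k = 1..4 hidden factors.
fits = {1: (-5200.0, 21), 2: (-4950.0, 32), 3: (-4900.0, 43), 4: (-4895.0, 54)}
best = select_k(fits, n_samples=500, criterion=bic)
```

Note the cost the paper highlights: phase one requires fitting one model per candidate k, which is what BYY-AUTO avoids by shrinking k during a single run of parameter learning.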
Keywords: Noise Variance · Minimum Description Length · Model Selection Criterion · Parameter Learning · Hidden Factor