On Model Selection, Bayesian Networks, and the Fisher Information Integral

Zou, Yuan; Roos, Teemu

doi:10.1007/978-3-319-28379-1_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9505)

Included in the following conference series:

Workshop on Advanced Methodologies for Bayesian Networks

Abstract

We study BIC-like model selection criteria and, in particular, their refinements that include a constant term involving the Fisher information matrix. We observe that for complex Bayesian network models, the constant term is a negative number with a very large absolute value that dominates the other terms for small and moderate sample sizes. We show that including the constant term degrades model selection accuracy dramatically compared to the standard BIC criterion, where the term is omitted. On the other hand, we demonstrate that exact formulas such as Bayes factors or the normalized maximum likelihood (NML), or their approximations that are not based on Taylor expansions, perform well. We conclude that, in the absence of an exact formula, one should use either BIC, which is a very rough approximation, or a very close approximation, but not an approximation that is truncated after the constant term.
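
The effect described in the abstract can be illustrated on a much simpler model than a Bayesian network. The sketch below is not taken from the chapter: it assumes a single multinomial variable with K categories, for which the exact NML penalty log C(K, n) can be computed with the recurrence of Kontkanen and Myllymäki (2007), and for which the Fisher information integral has the closed form π^(K/2) / Γ(K/2). It compares three penalties in nats: the exact one, the BIC penalty (k/2) log n, and the refined penalty (k/2) log(n / 2π) + log FII, where k = K − 1. All function names are hypothetical.

```python
import math


def multinomial_nml_normalizer(num_categories: int, n: int) -> float:
    """Exact NML normalizer C(K, n) for a K-category multinomial, n samples.

    Uses the recurrence C(K+2, n) = C(K+1, n) + (n / K) * C(K, n)
    from Kontkanen and Myllymaki (2007). Intended for moderate n only
    (roughly n <= 1000), since the direct sum for C(2, n) uses floats.
    """
    if num_categories < 1 or n < 1:
        raise ValueError("num_categories and n must be positive integers")
    c_prev = 1.0  # C(1, n): one category fits every sequence exactly
    if num_categories == 1:
        return c_prev
    # C(2, n) by direct summation over the count h of the first category.
    # Python evaluates 0.0 ** 0 as 1.0, which is the convention needed here.
    c_curr = sum(
        math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )
    for k in range(1, num_categories - 1):
        c_prev, c_curr = c_curr, c_curr + (n / k) * c_prev
    return c_curr


def log_fisher_information_integral(num_categories: int) -> float:
    """log of the integral of sqrt(det I(theta)) over the simplex.

    For the multinomial this equals log(pi^(K/2) / Gamma(K/2)).
    """
    half_k = num_categories / 2.0
    return half_k * math.log(math.pi) - math.lgamma(half_k)


def penalties(num_categories: int, n: int) -> dict:
    """Complexity penalties in nats for a K-category multinomial."""
    k = num_categories - 1  # number of free parameters
    return {
        "exact": math.log(multinomial_nml_normalizer(num_categories, n)),
        "bic": 0.5 * k * math.log(n),
        "refined": 0.5 * k * math.log(n / (2.0 * math.pi))
        + log_fisher_information_integral(num_categories),
    }


if __name__ == "__main__":
    for n in (10, 100, 1000):
        p = penalties(50, n)
        print(
            f"K=50 n={n:5d}  exact={p['exact']:8.2f}  "
            f"bic={p['bic']:8.2f}  refined={p['refined']:8.2f}"
        )
```

In this toy setting the constant term is strongly negative for K = 50, so at n = 10 the refined penalty falls below zero, whereas the exact penalty log C(K, n) is never negative. The gap between the refined and exact penalties shrinks as n grows, which is consistent with the asymptotic nature of the expansion.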

Notes

1. Our earlier paper on this topic appeared as an invited paper at the ITA-2013 workshop. Hence, no prior peer-reviewed publication of this material exists beyond the basic Monte Carlo approximation proposed in [13]. In particular, this is the first study in which the lower-order terms of information criteria are discussed in conjunction with a model class for which model selection criteria are being intensively developed.
2. An interesting line of future research would be to use the techniques applied here to zoom in on the differences in model complexity within classes of networks with a fixed number of parameters.
3. Unlike in the numerical studies in the previous section, here we want to take into account the fine-grained differences in Fisher information integral (FII) values among Bayesian network models with a fixed number of parameters.

References

1. Clarke, B.S., Barron, A.R.: Jeffreys prior is asymptotically least favorable under entropy risk. J. Stat. Plan. Infer. 41(1), 37–61 (1994)
2. Grünwald, P.D.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)
3. Han, C., Carlin, B.P.: Markov chain Monte Carlo methods for computing Bayes factors. J. Am. Stat. Assoc. 96(455), 1122–1132 (2001)
4. Jeffreys, H.: An invariant form for the prior probability in estimation problems. Proc. Roy. Soc. Lond. A 186(1007), 453–461 (1946)
5. Kass, R.E., Raftery, A.E.: Bayes factors. J. Am. Stat. Assoc. 90(430), 773–795 (1995)
6. Kontkanen, P., Myllymäki, P.: A linear-time algorithm for computing the multinomial stochastic complexity. Inf. Process. Lett. 103(6), 227–233 (2007)
7. Kontkanen, P., Myllymäki, P., Silander, T., Tirri, H., Grünwald, P.: On predictive distributions and Bayesian networks. Stat. Comput. 10, 39–54 (2000)
8. Krichevsky, R., Trofimov, V.: The performance of universal coding. IEEE Trans. Inf. Theor. 27(2), 199–207 (1981)
9. Navarro, D.: A note on the applied use of MDL approximations. Neural Comput. 16(9), 1763–1768 (2004)
10. Rasmussen, C.E., Ghahramani, Z.: Occam’s razor. In: Leen, T., Dietterich, T., Tresp, V. (eds.) Advances in Neural Information Processing Systems, pp. 294–300 (2001)
11. Rissanen, J.: Fisher information and stochastic complexity. IEEE Trans. Inf. Theor. 42(1), 40–47 (1996)
12. Rissanen, J.: Information and Complexity in Statistical Modeling. Springer, New York (2007)
13. Roos, T.: Monte Carlo estimation of minimax regret with an application to MDL model selection. In: Proceedings of the IEEE Information Theory Workshop, pp. 284–288. IEEE Press (2008)
14. Roos, T., Rissanen, J.: On sequentially normalized maximum likelihood models. In: Rissanen, J., Liski, E., Tabus, I., Myllymäki, P., Kontoyiannis, I., Heikkonen, J. (eds.) Proceedings of the Workshop on Information Theoretic Methods in Science and Engineering (WITMSE 2008), Tampere, Finland (2008)
15. Roos, T., Zou, Y.: Keep it simple stupid – on the effect of lower-order terms in BIC-like criteria. In: Information Theory and Applications Workshop (ITA), pp. 1–7. IEEE Press (2013)
16. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
17. Shtarkov, Y.M.: Universal sequential coding of single messages. Probl. Inf. Transm. 23(3), 3–17 (1987)
18. Silander, T., Roos, T., Kontkanen, P., Myllymäki, P.: Factorized normalized maximum likelihood criterion for learning Bayesian network structures. In: Jaeger, M., Nielsen, T.D. (eds.) Proceedings of the 4th European Workshop on Probabilistic Graphical Models (PGM 2008), pp. 257–272 (2008)
19. Silander, T., Roos, T., Myllymäki, P.: Learning locally minimax optimal Bayesian networks. Int. J. Approx. Reason. 51(5), 544–557 (2010)
20. Xie, Q., Barron, A.R.: Asymptotic minimax regret for data compression, gambling, and prediction. IEEE Trans. Inf. Theor. 46(2), 431–445 (2000)

Author information

Authors and Affiliations

Helsinki Institute for Information Technology HIIT, Gustaf Hällströmin katu 2b, 00014, Helsinki, Finland
Yuan Zou & Teemu Roos

Corresponding author

Correspondence to Teemu Roos.

Editor information

Editors and Affiliations

Osaka University, Osaka, Japan
Joe Suzuki
The University of Electro-Communications, Tokyo, Japan
Maomi Ueno

About this paper

Cite this paper

Zou, Y., Roos, T. (2015). On Model Selection, Bayesian Networks, and the Fisher Information Integral. In: Suzuki, J., Ueno, M. (eds) Advanced Methodologies for Bayesian Networks. AMBN 2015. Lecture Notes in Computer Science, vol 9505. Springer, Cham. https://doi.org/10.1007/978-3-319-28379-1_9

DOI: https://doi.org/10.1007/978-3-319-28379-1_9
Published: 08 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28378-4
Online ISBN: 978-3-319-28379-1
eBook Packages: Computer Science, Computer Science (R0)
