Scoring Bayesian networks of mixed variables

  • Bryan Andrews
  • Joseph Ramsey
  • Gregory F. CooperEmail author
Regular Paper


In this paper we outline two novel scoring methods for learning Bayesian networks in the presence of both continuous and discrete variables, that is, mixed variables. While much work has been done in the domain of automated Bayesian network learning, few studies have investigated this task in the presence of both continuous and discrete variables while focusing on scalability. Our goal is to provide two novel and scalable scoring functions capable of handling mixed variables. The first method, the Conditional Gaussian (CG) score, provides a highly efficient option. The second method, the Mixed Variable Polynomial (MVP) score, allows for a wider range of modeled relationships, including nonlinearity, but it is slower than CG. Both methods calculate log likelihood and degrees of freedom terms, which are incorporated into a Bayesian Information Criterion (BIC) score. Additionally, we introduce a structure prior for efficient learning of large networks and a simplification in scoring the discrete case which performs well empirically. While the core of this work focuses on applications in the search and score paradigm, we also show how the introduced scoring functions may be readily adapted as conditional independence tests for constraint-based Bayesian network learning algorithms. Lastly, we describe ways to simulate networks of mixed variable types and evaluate our proposed methods on such simulations.


Bayesian network structure learning Mixed variables Continuous and discrete variables 



We thank Clark Glymour, Peter Spirtes, Takis Benos, Dimitrios Manatakis, and Vineet Raghu for helpful discussions about the topics in this paper. We also thank the reviewers for their helpful comments.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.


  1. 1.
    Anderson, T., Taylor, J.B.: Strong consistency of least squares estimates in normal linear regression. The Ann. Stat., pp. 788–790 (1976)Google Scholar
  2. 2.
    Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Berlin (2006)zbMATHGoogle Scholar
  3. 3.
    Bøttcher, S.G.: Learning bayesian networks with mixed variables. Ph.D. thesis, Aalborg University (2004)Google Scholar
  4. 4.
    Chen, J., Chen, Z.: Extended bic for small-n-large-p sparse glm. Statistica Sinica pp. 555–574 (2012)Google Scholar
  5. 5.
    Chickering, D.M.: Optimal structure identification with greedy search. J. Mach. Learn. Res. 3, 507–554 (2002)MathSciNetzbMATHGoogle Scholar
  6. 6.
    Daly, R., Shen, Q., Aitken, S.: Review: learning bayesian networks: approaches and issues. The Knowl. Eng. Rev. 26(2), 99–157 (2011)CrossRefGoogle Scholar
  7. 7.
    Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)zbMATHGoogle Scholar
  8. 8.
    Heckerman, D., Geiger, D.: Learning bayesian networks: a unification for discrete and gaussian domains. In: Proceedings of Conference on Uncertainty in Artificial Intelligence, pp. 274–284. Morgan Kaufmann Publishers Inc. (1995)Google Scholar
  9. 9.
    Hsia, C.Y., Zhu, Y., Lin, C.J.: A study on trust region update rules in newton methods for large-scale linear classification. In: Asian Conference on Machine Learning, pp. 33–48 (2017)Google Scholar
  10. 10.
    Huang, T., Peng, H., Zhang, K.: Model selection for gaussian mixture models. Statistica Sinica 27(1), 147–169 (2017)MathSciNetzbMATHGoogle Scholar
  11. 11.
    Jeffreys, H., Jeffreys, B.: Weierstrasss theorem on approximation by polynomials. Methods of Mathematical Physics pp. 446–448 (1988)Google Scholar
  12. 12.
    Peters, J., Janzing, D., Schölkopf, B.: Elements of Causal Inference: Foundations and Learning Algorithms. The MIT Press, Cambridge (2017)Google Scholar
  13. 13.
    McGeachie, M.J., Chang, H.H., Weiss, S.T.: Cgbayesnets: conditional gaussian bayesian network learning and inference with mixed discrete and continuous data. PLoS Comput. Biol. 10(6), e1003676 (2014)CrossRefGoogle Scholar
  14. 14.
    Meek, C.: Complete orientation rules for patterns (1995)Google Scholar
  15. 15.
    Monti, S., Cooper, G.F.: A multivariate discretization method for learning bayesian networks from mixed data. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence, pp. 404–413. Morgan Kaufmann Publishers Inc. (1998)Google Scholar
  16. 16.
    Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, Burlington (1988)zbMATHGoogle Scholar
  17. 17.
    Raftery, A.E.: Bayesian model selection in social research. Sociol. Methodol., pp. 111–163 (1995)Google Scholar
  18. 18.
    Ramsey, J., Glymour, M., Sanchez-Romero, R., Glymour, C.: A million variables and more: the fast greedy equivalence search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images. Int. J. Data Sci. Anal., pp. 1–9 (2016)Google Scholar
  19. 19.
    Ramsey, J., Zhang, J., Spirtes, P.: Adjacency-faithfulness and conservative causal inference. In: Proceedings of Conference on Uncertainty in Artificial Intelligence, pp. 401–408. AUAI Press, Arlington, Virginia (2006)Google Scholar
  20. 20.
    Ramsey, J.D., Malinsky, D.: Comparing the performance of graphical structure learning algorithms with tetrad. arXiv preprint arXiv:1607.08110 (2016)
  21. 21.
    Romero, V., Rumí, R., Salmerón, A.: Learning hybrid bayesian networks using mixtures of truncated exponentials. Int. J. Approx. Reason. 42(1–2), 54–68 (2006)MathSciNetCrossRefzbMATHGoogle Scholar
  22. 22.
    Scheines, R., Spirtes, P., Glymour, C., Meek, C., Richardson, T.: The tetrad project: constraint based aids to causal model specification. Multivar. Behav. Res. 33(1), 65–117 (1998)CrossRefGoogle Scholar
  23. 23.
    Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)MathSciNetCrossRefzbMATHGoogle Scholar
  24. 24.
    Sedgewick, A.J., Shi, I., Donovan, R.M., Benos, P.V.: Learning mixed graphical models with separate sparsity parameters and stability-based model selection. BMC Bioinform. 17(Suppl 5), 175 (2016)CrossRefGoogle Scholar
  25. 25.
    Sokolova, E., Groot, P., Claassen, T., Heskes, T.: Causal discovery from databases with discrete and continuous variables. In: European Workshop on Probabilistic Graphical Models, pp. 442–457. Springer (2014)Google Scholar
  26. 26.
    Spirtes, P., Glymour, C.N., Scheines, R.: Causation, Prediction, and Search. MIT Press, Cambridge (2000)zbMATHGoogle Scholar
  27. 27.
    Zaidi, N.A., Webb, G.I.: A fast trust-region newton method for softmax logistic regression. In: Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 705–713. SIAM (2017)Google Scholar

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Bryan Andrews
    • 1
  • Joseph Ramsey
    • 2
  • Gregory F. Cooper
    • 1
    Email author
  1. 1.University of PittsburghPittsburghUSA
  2. 2.Carnegie Mellon UniversityPittsburghUSA

Personalised recommendations