Abstract
We present a novel hybrid algorithm for Bayesian network structure learning, called Hybrid HPC (H2PC). It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. It is based on a subroutine called HPC, which combines ideas from incremental and divide-and-conquer constraint-based methods to learn the parents and children of a target variable. We conduct an experimental comparison of H2PC against Max-Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning, on several benchmarks with various data sizes. Our extensive experiments show that H2PC outperforms MMHC both in goodness of fit to new data and in the quality of the learned network structure, which is closer to the true dependence structure of the data. The source code (in R) of H2PC and all data sets used for the empirical tests are publicly available.
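The two-phase idea described above (a skeleton first, then a score-based search confined to it) can be illustrated with a minimal Python sketch. This is not the authors' R implementation: the skeleton is simply given as input rather than learned by HPC, BIC stands in for the Bayesian score, only arc additions and deletions are tried (no reversals), and the function names `local_bic`, `creates_cycle` and `restricted_hill_climb` as well as the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def local_bic(data, child, parents):
    """BIC contribution of `child` given a list of parent names (discrete data)."""
    n = len(data)
    pa_counts = Counter(tuple(row[p] for p in parents) for row in data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    loglik = sum(c * math.log(c / pa_counts[pa]) for (pa, _), c in joint.items())
    r = len({row[child] for row in data})  # number of observed child states
    q = math.prod(len({row[p] for row in data}) for p in parents)  # parent configurations
    return loglik - 0.5 * math.log(n) * (r - 1) * q


def creates_cycle(parents, frm, to):
    """True if adding the arc frm -> to would close a directed cycle."""
    stack, seen = [frm], set()
    while stack:  # walk upwards from `frm` through its ancestors
        node = stack.pop()
        if node == to:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def restricted_hill_climb(data, variables, skeleton):
    """Greedy add/delete search over arcs whose endpoints are adjacent in `skeleton`."""
    parents = {v: set() for v in variables}
    score = {v: local_bic(data, v, []) for v in variables}
    # Each undirected skeleton edge yields two candidate arcs.
    candidates = [(a, b) for a, b in skeleton] + [(b, a) for a, b in skeleton]
    while True:
        best_gain, best = 1e-9, None
        for frm, to in candidates:
            if frm in parents[to]:
                new_pa = parents[to] - {frm}  # try deleting the arc
            elif creates_cycle(parents, frm, to):
                continue
            else:
                new_pa = parents[to] | {frm}  # try adding the arc
            new_score = local_bic(data, to, sorted(new_pa))
            if new_score - score[to] > best_gain:
                best_gain, best = new_score - score[to], (to, new_pa, new_score)
        if best is None:  # no move improves the score: local optimum
            return parents
        to, new_pa, new_score = best
        parents[to], score[to] = new_pa, new_score


# Toy data (an assumption for illustration): B copies A 90% of the time, C is independent.
data = []
for a, c in product((0, 1), repeat=2):
    data += [{"A": a, "B": a, "C": c}] * 9 + [{"A": a, "B": 1 - a, "C": c}]

dag = restricted_hill_climb(data, ["A", "B", "C"], skeleton=[("A", "B"), ("A", "C")])
arcs = sorted((p, child) for child, ps in dag.items() for p in ps)
print(arcs)
```

The skeleton only limits which arcs the search may consider; the score still decides whether a candidate arc is kept. In the toy run the A-C edge is allowed by the skeleton but should be rejected by the score, since C is independent of A. Because BIC is score-equivalent, the orientation chosen for the A-B edge is arbitrary in this two-variable case.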
Keywords
- Bayesian Network
- Directed Acyclic Graph
- Hybrid Algorithm
- Machine Learning Research
- Bayesian Network Structure
These keywords were added by machine and not by the authors.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gasse, M., Aussem, A., Elghazel, H. (2012). An Experimental Comparison of Hybrid Algorithms for Bayesian Network Structure Learning. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol 7523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33460-3_9
DOI: https://doi.org/10.1007/978-3-642-33460-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33459-7
Online ISBN: 978-3-642-33460-3
eBook Packages: Computer Science (R0)