Abstract
Bayesian networks are a widely used formalism for belief update under uncertainty. Structure-restricted Bayesian network models such as the Naive Bayes model and the Tree-Augmented Naive Bayes (TAN) model have shown impressive performance on classification tasks. However, if the number of variables or the amount of data is large, learning a TAN model from data can be time-consuming. In this paper, we introduce a new method for parallel learning of a TAN model from large data sets. The method computes, in parallel, the mutual information scores between pairs of variables given the class variable. The parallel computations are organised using balanced incomplete block designs. A preliminary empirical evaluation of the proposed method on large data sets shows that parallelisation with this method can yield a significant performance improvement.
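The pairwise scoring scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the design is the small (7, 3, 1) design (the Fano plane) developed from the difference set {0, 1, 3} modulo 7, the score is the empirical conditional mutual information I(X; Y | C) estimated from counts, and a thread pool stands in for whatever parallel runtime is actually used.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import log


def blocks_from_difference_set(diff_set, v):
    """Develop a difference set modulo v into the v blocks of a cyclic design."""
    return [tuple(sorted((d + shift) % v for d in diff_set)) for shift in range(v)]


def conditional_mutual_information(xs, ys, cs):
    """Empirical I(X; Y | C) in nats from three equally long discrete columns."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    # p(x,y,c) * log( p(x,y,c) p(c) / (p(x,c) p(y,c)) ), written with raw counts
    return sum(
        (k / n) * log(k * n_c[c] / (n_xc[x, c] * n_yc[y, c]))
        for (x, y, c), k in n_xyc.items()
    )


def score_block(block, columns, cs):
    """Score every pair of feature indices that falls inside one block."""
    return {
        (i, j): conditional_mutual_information(columns[i], columns[j], cs)
        for i, j in combinations(block, 2)
    }


def score_all_pairs(columns, cs, blocks, max_workers=4):
    """Score all blocks concurrently and merge the per-block results."""
    scores = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(lambda b: score_block(b, columns, cs), blocks):
            scores.update(partial)
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: 7 binary features, 8 cases, binary class.
    blocks = blocks_from_difference_set((0, 1, 3), 7)
    columns = [[(row >> (i % 3)) & 1 for row in range(8)] for i in range(7)]
    classes = [row & 1 for row in range(8)]
    for pair, score in sorted(score_all_pairs(columns, classes, blocks).items()):
        print(pair, round(score, 4))
```

Because every pair of the seven variables lies in exactly one block, no two workers score the same pair, and each worker only needs the three data columns of its block plus the class column. A TAN learner would then build a maximum weight spanning tree over the merged scores.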
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Madsen, A.L., Jensen, F., Salmerón, A., Karlsen, M., Langseth, H., Nielsen, T.D. (2014). A New Method for Vertical Parallelisation of TAN Learning Based on Balanced Incomplete Block Designs. In: van der Gaag, L.C., Feelders, A.J. (eds) Probabilistic Graphical Models. PGM 2014. Lecture Notes in Computer Science, vol 8754. Springer, Cham. https://doi.org/10.1007/978-3-319-11433-0_20
DOI: https://doi.org/10.1007/978-3-319-11433-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11432-3
Online ISBN: 978-3-319-11433-0
eBook Packages: Computer Science, Computer Science (R0)