A New Method for Vertical Parallelisation of TAN Learning Based on Balanced Incomplete Block Designs

Madsen, Anders L.; Jensen, Frank; Salmerón, Antonio; Karlsen, Martin; Langseth, Helge; Nielsen, Thomas D.

doi:10.1007/978-3-319-11433-0_20


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8754)

Abstract

The framework of Bayesian networks is a widely popular formalism for performing belief update under uncertainty. Structure-restricted Bayesian network models such as the Naive Bayes model and the Tree-Augmented Naive Bayes (TAN) model have shown impressive performance on classification tasks. However, if the number of variables or the amount of data is large, learning a TAN model from data can be time-consuming. In this paper, we introduce a new method for parallel learning of a TAN model from large data sets. The method computes the mutual information scores between pairs of variables given the class variable in parallel, with the computations organised using balanced incomplete block designs. The results of a preliminary empirical evaluation on large data sets show that a significant performance improvement is possible through parallelisation using the proposed method.
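The idea of organising pairwise scores by blocks can be illustrated with a small sketch. It is not the authors' implementation, which is not shown on this page. It assumes a toy (7, 3, 1) design developed from the difference set {0, 1, 3} modulo 7, a plain count-based estimate of the conditional mutual information, and a thread pool as a stand-in for whatever parallel runtime is actually used. All function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import log


def cyclic_design(base_block, v):
    """Develop a difference set modulo v into v blocks (one per shift)."""
    return [tuple(sorted((x + shift) % v for x in base_block)) for shift in range(v)]


def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X;Y|C) in nats from equal-length sequences of discrete values."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    # p(x,y,c) * log( p(x,y,c) p(c) / (p(x,c) p(y,c)) ), written with raw counts.
    return sum(
        (count / n) * log(count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
        for (x, y, c), count in n_xyc.items()
    )


def score_block(block, columns, labels):
    """Score every pair of variables inside one block (one unit of parallel work)."""
    return {
        (i, j): conditional_mutual_information(columns[i], columns[j], labels)
        for i, j in combinations(block, 2)
    }


def score_all_pairs(blocks, columns, labels, workers=4):
    """Hand each block to a worker and merge the partial score tables."""
    scores = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda b: score_block(b, columns, labels), blocks):
            scores.update(partial)
    return scores


# Toy usage: 7 variables, 7 blocks of 3 variables, each pair in exactly one block.
blocks = cyclic_design((0, 1, 3), 7)
```

Because every pair of variables occurs in exactly one block of this design, each worker only needs the columns of its own block, and no pair is scored twice. In a TAN learner the resulting scores would then serve as edge weights for a maximum-weight spanning tree over the feature variables.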



Author information

Authors and Affiliations

HUGIN EXPERT A/S, Aalborg, Denmark
Anders L. Madsen, Frank Jensen & Martin Karlsen
Department of Computer Science, Aalborg University, Denmark
Anders L. Madsen & Thomas D. Nielsen
Department of Mathematics, University of Almería, Spain
Antonio Salmerón
Department of Computer and Information Science, Norwegian University of Science and Technology, Norway
Helge Langseth


Editor information

Editors and Affiliations

Utrecht University, Faculty of Science, Department of Information and Computing Sciences, Princetonplein 5, 3584 CC Utrecht, The Netherlands
Linda C. van der Gaag
Utrecht University, Faculty of Science, Department of Information and Computing Sciences, Princetonplein 5, 3584 CC Utrecht, The Netherlands
Ad J. Feelders


Cite this paper

Madsen, A.L., Jensen, F., Salmerón, A., Karlsen, M., Langseth, H., Nielsen, T.D. (2014). A New Method for Vertical Parallelisation of TAN Learning Based on Balanced Incomplete Block Designs. In: van der Gaag, L.C., Feelders, A.J. (eds) Probabilistic Graphical Models. PGM 2014. Lecture Notes in Computer Science, vol 8754. Springer, Cham. https://doi.org/10.1007/978-3-319-11433-0_20


DOI: https://doi.org/10.1007/978-3-319-11433-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11432-3
Online ISBN: 978-3-319-11433-0
eBook Packages: Computer Science, Computer Science (R0)
