A New Method for Vertical Parallelisation of TAN Learning Based on Balanced Incomplete Block Designs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8754)

Abstract

Bayesian networks are a popular formalism for belief updating under uncertainty. Structure-restricted Bayesian network models such as the Naive Bayes model and the Tree-Augmented Naive Bayes (TAN) model have shown impressive performance on classification tasks. However, if the number of variables or the amount of data is large, learning a TAN model from data can be time-consuming. In this paper, we introduce a new method for parallel learning of a TAN model from large data sets. The method computes, in parallel, the mutual information scores between pairs of variables given the class variable. The parallel computations are organised using balanced incomplete block designs. A preliminary empirical evaluation on large data sets shows that parallelisation with the proposed method can yield a significant performance improvement.
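
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cyclic_blocks`, `cond_mutual_info`, `block_scores`), the toy data, and the choice of the (7, 3, 1) design developed from the difference set {0, 1, 3} modulo 7 are all assumptions made for illustration. In a design with λ = 1, every pair of variables lies in exactly one block, so each block can be scored by a separate worker without any pair being computed twice.

```python
from collections import Counter
from itertools import combinations
from math import log
import random


def cyclic_blocks(base_block, v):
    """Develop a difference set modulo v into v blocks by cyclic shifts."""
    return [sorted((x + shift) % v for x in base_block) for shift in range(v)]


def cond_mutual_info(xs, ys, cs):
    """Empirical I(X; Y | C) in nats from equally long discrete sequences."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), count in n_xyc.items():
        # p(x,y,c) * log( p(x,y,c) p(c) / (p(x,c) p(y,c)) ), written with counts
        total += (count / n) * log(count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return total


def block_scores(block, columns, cls):
    """Work unit for one worker: score every pair of variables in its block."""
    return {
        (i, j): cond_mutual_info(columns[i], columns[j], cls)
        for i, j in combinations(block, 2)
    }


# Hypothetical toy data: 7 discrete feature columns and a binary class column.
rng = random.Random(0)
cls = [rng.randint(0, 1) for _ in range(200)]
columns = [[rng.randint(0, 2) for _ in range(200)] for _ in range(7)]

# Assumed design: the 7 blocks of size 3 obtained from {0, 1, 3} modulo 7.
blocks = cyclic_blocks((0, 1, 3), 7)

# Each iteration is independent of the others, so in a parallel setting each
# block could be handed to its own process; here they run sequentially.
scores = {}
for block in blocks:
    scores.update(block_scores(block, columns, cls))

print(len(scores))  # 21 pairs, i.e. all pairs of 7 variables
```

In standard TAN learning, scores of this kind are the edge weights for a maximum-weight spanning tree over the feature variables; that step is omitted here.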

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Madsen, A.L., Jensen, F., Salmerón, A., Karlsen, M., Langseth, H., Nielsen, T.D. (2014). A New Method for Vertical Parallelisation of TAN Learning Based on Balanced Incomplete Block Designs. In: van der Gaag, L.C., Feelders, A.J. (eds) Probabilistic Graphical Models. PGM 2014. Lecture Notes in Computer Science, vol 8754. Springer, Cham. https://doi.org/10.1007/978-3-319-11433-0_20

  • DOI: https://doi.org/10.1007/978-3-319-11433-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11432-3

  • Online ISBN: 978-3-319-11433-0

  • eBook Packages: Computer Science, Computer Science (R0)
