
Uni- and Multi-Dimensional Clustering Via Bayesian Networks

Unsupervised Learning Algorithms

Abstract

This chapter discusses model-based clustering via Bayesian networks, covering both uni-dimensional and multi-dimensional methods. For uni-dimensional clustering, the main idea is to use the Bayesian structural clustering algorithm, a greedy algorithm that makes use of the EM algorithm. For multi-dimensional clustering, we investigate latent tree models, which, to our knowledge, are the only model-based approach to multi-dimensional clustering. There are generally two approaches to learning latent tree models: greedy search and feature selection. The former covers a wider range of models, while the latter is more time-efficient. However, latent tree models are unable to capture dependencies between partitions through attributes, so we propose two approaches to overcome this shortcoming. The first extends the idea of Bayesian structural clustering for uni-dimensional clustering, while the second combines feature selection methods with the main idea of multi-dimensional classification with Bayesian networks. We test the second approach on both real and synthetic data. The results indicate that it finds meaningful and novel partitions.
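As a rough illustration of the EM step that such clustering algorithms rely on, the sketch below fits a naive Bayes model with a single hidden cluster variable to categorical data. It is not the chapter's Bayesian structural clustering algorithm: the network structure is fixed rather than searched greedily, and the function name, parameters, and the small smoothing constant are assumptions made only for this example.

```python
import numpy as np


def em_naive_bayes_clustering(X, n_clusters, n_values, n_iter=50, seed=0):
    """EM for a naive Bayes model with one hidden cluster variable C.

    Hypothetical helper for illustration only.
    X: (n_samples, n_features) integer array with entries in 0..n_values-1.
    Returns (prior, cond, resp, log_liks).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # One-hot encode the data: onehot[i, j, v] = 1 if X[i, j] == v.
    onehot = np.eye(n_values)[X]
    # prior[k] = P(C = k); cond[k, j, v] = P(X_j = v | C = k).
    prior = np.full(n_clusters, 1.0 / n_clusters)
    cond = rng.dirichlet(np.ones(n_values), size=(n_clusters, d))
    eps = 1e-6  # assumed pseudo-count that keeps every probability positive
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibilities resp[i, k] = P(C = k | x_i).
        log_joint = np.log(prior) + np.einsum("ijv,kjv->ik", onehot, np.log(cond))
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)
        # Log-likelihood of the data under the parameters used in this E-step.
        log_liks.append(float(log_norm.sum()))
        # M-step: re-estimate the parameters from expected counts.
        counts = np.einsum("ik,ijv->kjv", resp, onehot) + eps
        cond = counts / counts.sum(axis=2, keepdims=True)
        prior = (resp.sum(axis=0) + eps) / (n + n_clusters * eps)
    return prior, cond, resp, log_liks
```

Each row of `resp` gives the posterior cluster membership of one sample, so a hard partition can be read off with `resp.argmax(axis=1)`. A structural variant would wrap this parameter-fitting loop inside a search over candidate network structures, which is omitted here.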


Notes

  1. The data is available at: http://www.stats.nba.com/leaders/alltime/.


Acknowledgements

This work is funded by a so-called career contract at Linköping University, and by the Swedish Research Council (ref. 2010–4808).

Author information


Corresponding author

Correspondence to Omid Keivani.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Keivani, O., Peña, J.M. (2016). Uni- and Multi-Dimensional Clustering Via Bayesian Networks. In: Celebi, M., Aydin, K. (eds) Unsupervised Learning Algorithms. Springer, Cham. https://doi.org/10.1007/978-3-319-24211-8_7


  • DOI: https://doi.org/10.1007/978-3-319-24211-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24209-5

  • Online ISBN: 978-3-319-24211-8

  • eBook Packages: Engineering, Engineering (R0)
