Discretisation Does Affect the Performance of Bayesian Networks

  • Conference paper
Research and Development in Intelligent Systems XXVII (SGAI 2010)

Abstract

In this paper, we study the use of Bayesian networks to interpret breast X-ray images in the context of breast-cancer screening. In particular, we investigate the performance of a manually developed Bayesian network under various discretisation schemes to check whether the probabilistic parameters in the initial manual network with continuous features are optimal and correctly reflect reality. The classification performance was determined using ROC analysis. A few algorithms performed better than the continuous baseline: the best was the entropy-based method of Fayyad and Irani, but simpler algorithms also outperformed the continuous baseline. Two simpler methods with only 3 bins per variable gave results similar to the continuous baseline. These results indicate that it is worthwhile to consider discretising continuous data when developing Bayesian networks, and they support the practical importance of probabilistic parameters in determining the network’s performance.
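As a rough illustration of the kind of comparison the abstract describes, the sketch below bins one continuous feature into 3 bins and scores each version with the area under the ROC curve. The abstract does not say which simple 3-bin methods were used, so equal-width and equal-frequency binning are assumptions chosen as common examples. The data and the function names are invented for this example, and the bin index is used directly as the classification score instead of a Bayesian network posterior.

```python
from bisect import bisect_right


def equal_width_edges(values, bins=3):
    """Return the bins-1 interior cut points that split [min, max] evenly."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * i for i in range(1, bins)]


def equal_frequency_edges(values, bins=3):
    """Return interior cut points so each bin holds roughly the same count."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // bins] for i in range(1, bins)]


def discretise(values, edges):
    """Map each value to the index of its bin (0 .. len(edges))."""
    return [bisect_right(edges, v) for v in values]


def auc(scores, labels):
    """Area under the ROC curve: the probability that a random positive
    outscores a random negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic feature values and labels, invented for this example only.
feature = [0.1, 0.4, 0.35, 0.8, 0.9, 0.55, 0.2, 0.7, 0.65]
label = [0, 0, 0, 1, 1, 0, 0, 1, 1]

ew_bins = discretise(feature, equal_width_edges(feature))
ef_bins = discretise(feature, equal_frequency_edges(feature))

print("continuous      AUC:", auc(feature, label))
print("equal width     AUC:", auc(ew_bins, label))
print("equal frequency AUC:", auc(ef_bins, label))
```

On this toy data the choice of cut points alone changes the AUC, which is the effect the paper measures on real screening data with a full network. Nothing here reproduces the paper's results.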

References

  1. Abraham, R., Simha, J.B., Iyengar, S.S.: A comparative analysis of discretization methods for medical datamining with naïve Bayesian classifier. In: Proc. of the Ninth International Conference on Information Technology, pp. 235–236 (2006)

  2. Acid, S., de Campos, L.M., Fernandez-Luna, J.M., Rodriguez, S., Rodriguez, J.M., Salcedo, J.L.: A comparison of learning algorithms for Bayesian networks: a case study based on data from an emergency medical service. Artif. Intel. in Medicine 30(3), 215–232 (2004)

  3. Bradley, A.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30(7), 1145–1159 (1997)

  4. Burnside, E., Davis, J., Chhatwal, J., Alagoz, O., Lindstrom, M., Geller, B., Littenberg, B., Shaffer, K., Kahn Jr, C., Page, C.: Probabilistic computer model developed from clinical data in national mammography database format to classify mammographic findings. Radiology 251(3), 663–672 (2009)

  5. Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39(1), 1–38 (1977)

  6. D’Orsi, C., Bassett, L., Berg, W., et al.: Breast Imaging Reporting and Data System: ACR BI-RADS - Mammography (ed. 4) (2003)

  7. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Proc. of the 12th ICML, pp. 194–202 (1995)

  8. Druzdzel, M.J., Onisko, A.: Are Bayesian networks sensitive to precision of their parameters? In: Proc. of the International IIS08 Conference, Intelligent Information Systems XVI, pp. 35–44 (2008)

  9. Fayyad, U., Irani, K.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proc. of the 13th IJCAI, pp. 1022–1027 (1993)

  10. Ferreira, N., Velikova, M., Lucas, P.: Bayesian modelling of multi-view mammography. In: Proc. of the ICML Workshop on Machine Learning for Health-Care Applications (2008)

  11. Flores, J.L., Inza, I., Larrañaga, P.: Wrapper discretization by means of estimation of distribution algorithms. Intelligent Data Analysis 11(5), 525–545 (2007)

  12. Geurts, P., Wehenkel, L.: Investigation and reduction of discretization variance in decision tree induction. Lecture Notes In Computer Science 1810, 162–170 (2000)

  13. Ismail, M.K., Ciesielski, V.: An empirical investigation of the impact of discretization on common data distributions. In: Proc. of the Third Int. Conf. on Hybrid Intelligent Systems: Design and Application of Hybrid Intelligent Systems, pp. 692–701 (2003)

  14. Jensen, F., Nielsen, T.: Bayesian networks and decision graphs. Springer Verlag (2007)

  15. Kahn, C., Roberts, L., Shaffer, K., Haddawy, P.: Construction of a Bayesian network for mammographic diagnosis of breast cancer. Comp. in Biol. and Medic. 27(1), 19–29 (1997)

  16. Mizianty, M., Kurgan, L., Ogiela, M.: Comparative analysis of the impact of discretization on the classification with naïve Bayes and semi-naïve Bayes classifiers. In: Proc. of the Seventh International Conference on Machine Learning and Applications, pp. 823–828 (2008)

  17. Murphy, K.: Bayesian network toolbox (BNT) (2007). http://people.cs.ubc.ca/~murphyk/Software/BNT/bnt.html

  18. Pradhan, A., Henrion, M., Provan, G., del Favero, B., Huang, K.: The sensitivity of belief networks to imprecise probabilities: an experimental investigation. Artificial Intelligence 84(1-2), 357–357 (1996)

  19. Radstake, N., Lucas, P.J.F., Velikova, M., Samulski, M.: Critiquing knowledge representation in medical image interpretation using structure learning. In: Proc. of the Second Workshop “Knowledge Representation for Health Care”, Lisbon, Portugal (2010)

  20. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. Morgan Kaufmann, San Francisco, CA, USA (2005)

  21. Yang, Y., Webb, G.: Proportional k-interval discretization for naïve-Bayes classifiers. In: Machine Learning: ECML 2001, pp. 564–575. Springer (2001)

Author information

Correspondence to Saskia Robben, Marina Velikova, Peter J.F. Lucas or Maurice Samulski.

Copyright information

© 2011 Springer-Verlag London Limited

About this paper

Cite this paper

Robben, S., Velikova, M., Lucas, P.J., Samulski, M. (2011). Discretisation Does Affect the Performance of Bayesian Networks. In: Bramer, M., Petridis, M., Hopgood, A. (eds) Research and Development in Intelligent Systems XXVII. SGAI 2010. Springer, London. https://doi.org/10.1007/978-0-85729-130-1_17

  • DOI: https://doi.org/10.1007/978-0-85729-130-1_17

  • Publisher Name: Springer, London

  • Print ISBN: 978-0-85729-129-5

  • Online ISBN: 978-0-85729-130-1

  • eBook Packages: Computer Science (R0)
