Discretisation Does Affect the Performance of Bayesian Networks

Robben, Saskia; Velikova, Marina; Lucas, Peter J.F.; Samulski, Maurice

doi:10.1007/978-0-85729-130-1_17

Included in the following conference series:

International Conference on Innovative Techniques and Applications of Artificial Intelligence

Abstract

In this paper, we study the use of Bayesian networks to interpret breast X-ray images in the context of breast-cancer screening. In particular, we investigate the performance of a manually developed Bayesian network under various discretisation schemes to check whether the probabilistic parameters of the initial manual network, which uses continuous features, are optimal and correctly reflect reality. Classification performance was determined using ROC analysis. A few discretisation algorithms performed better than the continuous baseline: the entropy-based method of Fayyad and Irani performed best, but simpler algorithms also outperformed the baseline. Two simpler methods with only 3 bins per variable gave results similar to the continuous baseline. These results indicate that it is worthwhile to consider discretising continuous data when developing Bayesian networks, and they support the practical importance of probabilistic parameters in determining the network's performance.
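
As a rough illustration of the kind of comparison the abstract describes, the sketch below applies two simple unsupervised schemes (equal-width and equal-frequency binning, each with 3 bins) to a single made-up feature and compares the area under the ROC curve before and after discretisation. Everything here is an assumption for illustration: the function names, the toy data, and the use of the raw feature or bin index as the ranking score. The paper itself scores cases with a Bayesian network and also evaluates the supervised entropy-based method of Fayyad and Irani, neither of which is reproduced here.

```python
from bisect import bisect_right


def equal_width_cuts(values, bins=3):
    """Cut points that split [min, max] into `bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]


def equal_frequency_cuts(values, bins=3):
    """Cut points that place roughly the same number of samples in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // bins] for i in range(1, bins)]


def discretise(values, cuts):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(cuts, v) for v in values]


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a positive
    case is ranked above a negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one continuous feature with an outlier, binary labels.
feature = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 3.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]

width_bins = discretise(feature, equal_width_cuts(feature))
freq_bins = discretise(feature, equal_frequency_cuts(feature))

print("continuous AUC:     ", auc(feature, labels))
print("equal-width AUC:    ", auc(width_bins, labels))
print("equal-frequency AUC:", auc(freq_bins, labels))
```

On this toy data the single outlier pushes almost every sample into the first equal-width bin, whereas equal-frequency binning keeps the ranking information, so the choice of scheme alone changes the ROC result.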

References

Abraham, R., Simha, J.B., Iyengar, S.S.: A comparative analysis of discretization methods for medical datamining with naïve Bayesian classifier. In: Proc. of the Ninth International Conference on Information Technology, pp. 235–236 (2006)
Acid, S., de Campos, L.M., Fernandez-Luna, J.M., Rodriguez, S., Rodriguez, J.M., Salcedo, J.L.: A comparison of learning algorithms for Bayesian networks: a case study based on data from an emergency medical service. Artif. Intel. in Medicine 30(3), 215–232 (2004)
Bradley, A.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30(7), 1145–1159 (1997)
Burnside, E., Davis, J., Chhatwal, J., Alagoz, O., Lindstrom, M., Geller, B., Littenberg, B., Shaffer, K., Kahn Jr, C., Page, C.: Probabilistic computer model developed from clinical data in national mammography database format to classify mammographic findings. Radiology 251(3), 663–672 (2009)
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39(1), 1–38 (1977)
D’Orsi, C., Bassett, L., Berg, W., et al.: Breast Imaging Reporting and Data System: ACR BIRADS-Mammography (ed 4) (2003)
Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Proc. of the 12th ICML, pp. 194–202 (1995)
Druzdzel, M.J., Onisko, A.: Are Bayesian networks sensitive to precision of their parameters? In: Proc. of the International IIS08 Conference, Intelligent Information Systems XVI, pp. 35–44 (2008)
Fayyad, U., Irani, K.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proc. of the 13th IJCAI, pp. 1022–1027 (1993)
Ferreira, N., Velikova, M., Lucas, P.: Bayesian modelling of multi-view mammography. In: Proc. of the ICML Workshop on Machine Learning for Health-Care Applications (2008)
Flores, J.L., Inza, I., Larrañaga, P.: Wrapper discretization by means of estimation of distribution algorithms. Intelligent Data Analysis 11(5), 525–545 (2007)
Geurts, P., Wehenkel, L.: Investigation and reduction of discretization variance in decision tree induction. Lecture Notes In Computer Science 1810, 162–170 (2000)
Ismail, M.K., Ciesielski, V.: An empirical investigation of the impact of discretization on common data distributions. In: Proc. of the Third Int. Conf. on Hybrid Intelligent Systems: Design and Application of Hybrid Intelligent Systems, pp. 692–701 (2003)
Jensen, F., Nielsen, T.: Bayesian networks and decision graphs. Springer Verlag (2007)
Kahn, C., Roberts, L., Shaffer, K., Haddawy, P.: Construction of a Bayesian network for mammographic diagnosis of breast cancer. Comp. in Biol. and Medic. 27(1), 19–29 (1997)
Mizianty, M., Kurgan, L., Ogiela, M.: Comparative analysis of the impact of discretization on the classification with naïve Bayes and semi-naïve Bayes classifiers. In: Proc. of the Seventh International Conference on Machine Learning and Applications, pp. 823–828 (2008)
Murphy, K.: Bayesian network toolbox (BNT) (2007). http://people.cs.ubc.ca/~murphyk/Software/BNT/bnt.html
Pradhan, A., Henrion, M., Provan, G., del Favero, B., Huang, K.: The sensitivity of belief networks to imprecise probabilities: an experimental investigation. Artificial Intelligence 84(1-2), 357–357 (1996)
Radstake, N., Lucas, P.J.F., Velikova, M., Samulski, M.: Critiquing knowledge representation in medical image interpretation using structure learning. In: Proc. of the Second Workshop “Knowledge Representation for Health Care”, Lisbon, Portugal (2010)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. Morgan Kaufmann, San Francisco, CA, USA (2005)
Yang, Y., Webb, G.: Proportional k-interval discretization for naïve-Bayes classifiers. In: Machine Learning: ECML 2001, pp. 564–575. Springer (2001)

Author information

Authors and Affiliations

Radboud University Nijmegen, Institute for Computing and Information Sciences, Nijmegen, The Netherlands
Saskia Robben, Marina Velikova & Peter J.F. Lucas
Department of Radiology, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
Maurice Samulski

Corresponding authors

Correspondence to Saskia Robben, Marina Velikova, Peter J.F. Lucas or Maurice Samulski.

Editor information

Editors and Affiliations

Dept. Computer Science and Software Engineering, University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth, PO1 3HE, United Kingdom
Max Bramer
School of Computing & Mathematical Sciences, University of Greenwich, Park Row 30, London, SE10 9LS, United Kingdom
Miltos Petridis
Faculty of Technology, De Montfort University, The Gateway, Leicester, LE1 9BH, United Kingdom
Adrian Hopgood

Cite this paper

Robben, S., Velikova, M., Lucas, P.J., Samulski, M. (2011). Discretisation Does Affect the Performance of Bayesian Networks. In: Bramer, M., Petridis, M., Hopgood, A. (eds) Research and Development in Intelligent Systems XXVII. SGAI 2010. Springer, London. https://doi.org/10.1007/978-0-85729-130-1_17

DOI: https://doi.org/10.1007/978-0-85729-130-1_17
Published: 29 October 2010
Publisher Name: Springer, London
Print ISBN: 978-0-85729-129-5
Online ISBN: 978-0-85729-130-1
eBook Packages: Computer Science, Computer Science (R0)
