Cost-Sensitive Bayesian Network Learning Using Sampling

Nashnush, Eman; Vadera, Sunil

doi:10.1007/978-3-319-07692-8_44

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 287)

Abstract

A significant advance in recent years has been the development of cost-sensitive decision tree learners, recognising that real-world classification problems need to take account of the costs of misclassification and not just focus on accuracy. The literature contains well over 50 cost-sensitive decision tree induction algorithms, each with varying performance profiles. Obtaining good Bayesian networks can be challenging, and hence several algorithms have been proposed for learning their structure and parameters from data. However, most of these algorithms focus on learning Bayesian networks that aim to maximise the accuracy of classifications. Hence an obvious question is whether it is possible to develop cost-sensitive Bayesian networks, and whether they would perform better than cost-sensitive decision trees at minimising classification cost. This paper explores this question by developing a new Bayesian network learning algorithm based on changing the data distribution to reflect the costs of misclassification. The proposed method is explored by conducting experiments on over 20 data sets. The results show that this approach produces good results in comparison to more complex cost-sensitive decision tree algorithms.
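
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one common way to change a data distribution to reflect misclassification costs: resampling with replacement, with each example weighted by the cost of misclassifying its true class. The function names, the per-class cost dictionary, and the example costs are all assumptions made for illustration and are not taken from the paper. A Bayesian network learner (or any other classifier) could then be trained on the resampled data.

```python
import random
from collections import Counter


def expected_class_shares(labels, misclassification_cost):
    """Class proportions expected after cost-weighted resampling.

    Each class's share is proportional to (class count) x (class cost).
    `misclassification_cost` is a hypothetical dict mapping a class label
    to the cost of misclassifying an example of that class.
    """
    counts = Counter(labels)
    mass = {c: counts[c] * misclassification_cost[c] for c in counts}
    total = sum(mass.values())
    return {c: m / total for c, m in mass.items()}


def cost_proportionate_resample(examples, labels, misclassification_cost, seed=0):
    """Draw len(examples) items with replacement, weighted by class cost.

    This is an illustrative assumption about how sampling can encode costs,
    not the authors' algorithm.
    """
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    weights = [misclassification_cost[y] for y in labels]
    n = len(examples)
    chosen = rng.choices(range(n), weights=weights, k=n)
    return [examples[i] for i in chosen], [labels[i] for i in chosen]


if __name__ == "__main__":
    # Hypothetical imbalanced data: 900 cheap-to-miss and 100 costly-to-miss cases.
    X = list(range(1000))
    y = [0] * 900 + [1] * 100
    costs = {0: 1.0, 1: 9.0}
    print(expected_class_shares(y, costs))
    _, y_new = cost_proportionate_resample(X, y, costs)
    print(Counter(y_new))
```

With these assumed costs, the rare but expensive class is expected to make up about half of the resampled data, so a learner that maximises accuracy on the new distribution is pushed towards lower total cost on the original one.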

Author information

Authors and Affiliations

The School of Computing, Science and Engineering, Salford University, Manchester, UK
Eman Nashnush & Sunil Vadera

Corresponding author

Correspondence to Eman Nashnush.

Editor information

Editors and Affiliations

Department of Information System, Faculty of Comp. Sci. & Info. Tech., University of Malaya, Kuala Lumpur, Malaysia
Tutut Herawan
Faculty of Comp. Sci. and Info. Tech., Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
Rozaida Ghazali
Faculty of Comp. Sci. and Info. Tech., Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
Mustafa Mat Deris

About this paper

Cite this paper

Nashnush, E., Vadera, S. (2014). Cost-Sensitive Bayesian Network Learning Using Sampling. In: Herawan, T., Ghazali, R., Deris, M. (eds) Recent Advances on Soft Computing and Data Mining. Advances in Intelligent Systems and Computing, vol 287. Springer, Cham. https://doi.org/10.1007/978-3-319-07692-8_44

DOI: https://doi.org/10.1007/978-3-319-07692-8_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07691-1
Online ISBN: 978-3-319-07692-8
eBook Packages: Engineering, Engineering (R0)
