Cost-Sensitive Bayesian Network Learning Using Sampling

  • Conference paper
Recent Advances on Soft Computing and Data Mining

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 287)

Abstract

A significant advance in recent years has been the development of cost-sensitive decision tree learners, which recognise that real-world classification problems need to take account of the costs of misclassification rather than focus on accuracy alone. The literature contains well over 50 cost-sensitive decision tree induction algorithms, each with a different performance profile. Obtaining good Bayesian networks can be challenging, and hence several algorithms have been proposed for learning their structure and parameters from data. However, most of these algorithms aim to learn Bayesian networks that maximise classification accuracy. An obvious question is therefore whether it is possible to develop cost-sensitive Bayesian networks, and whether they would perform better than cost-sensitive decision trees at minimising classification cost. This paper explores this question by developing a new Bayesian network learning algorithm based on changing the data distribution to reflect the costs of misclassification. The proposed method is evaluated through experiments on over 20 data sets. The results show that this approach produces good results in comparison to more complex cost-sensitive decision tree algorithms.
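The abstract describes changing the data distribution to reflect misclassification costs but does not give the procedure. As a rough illustration of the general idea only, and not the authors' algorithm, the sketch below resamples a training set with replacement so that each example is drawn with probability proportional to the cost of misclassifying its class. The function name `cost_proportionate_resample`, the toy data, and the cost values are all hypothetical; any learner, including a Bayesian network learner, could then be trained on the resampled data.

```python
import random
from collections import Counter


def cost_proportionate_resample(rows, labels, class_cost, n_samples, seed=0):
    """Resample with replacement, weighting each example by its class cost.

    Illustrative helper (an assumption, not taken from the paper):
    class_cost maps a class label to the cost of misclassifying an
    example of that class.
    """
    rng = random.Random(seed)
    # One sampling weight per example, looked up from its class label.
    weights = [class_cost[y] for y in labels]
    # random.choices draws k indices with replacement, proportional to weights.
    idx = rng.choices(range(len(labels)), weights=weights, k=n_samples)
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Hypothetical toy data: 10 costly "pos" examples and 90 cheap "neg" examples.
rows = [[i] for i in range(100)]
labels = ["pos"] * 10 + ["neg"] * 90
class_cost = {"pos": 9.0, "neg": 1.0}  # assumed costs, for illustration only

new_rows, new_labels = cost_proportionate_resample(
    rows, labels, class_cost, n_samples=10000
)
counts = Counter(new_labels)
print(counts)
```

With these assumed costs the total sampling weight of each class is equal (10 × 9 and 90 × 1), so the resampled set is roughly balanced, and an accuracy-maximising learner trained on it is pushed towards avoiding the costly error.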


Author information

Corresponding author

Correspondence to Eman Nashnush.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Nashnush, E., Vadera, S. (2014). Cost-Sensitive Bayesian Network Learning Using Sampling. In: Herawan, T., Ghazali, R., Deris, M. (eds) Recent Advances on Soft Computing and Data Mining. Advances in Intelligent Systems and Computing, vol 287. Springer, Cham. https://doi.org/10.1007/978-3-319-07692-8_44

  • DOI: https://doi.org/10.1007/978-3-319-07692-8_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07691-1

  • Online ISBN: 978-3-319-07692-8

  • eBook Packages: Engineering, Engineering (R0)
