Cost Matters: A New Example-Dependent Cost-Sensitive Logistic Regression Model

  • Conference paper

Advances in Knowledge Discovery and Data Mining (PAKDD 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10234)

Abstract

Connectivity and automation are increasingly part of today’s cars. To provide automation, many gauges are integrated in cars to collect physical readings. In the automobile industry, the multiple datasets gathered this way can be used to predict whether a car will need a repair soon. This gives drivers and retailers the information they need to take action early. However, prediction in real use cases poses new challenges: misclassified instances do not all incur the same cost. For example, the cost of failing to predict a necessary tire change is usually higher than the cost of predicting a tire change although the car could still drive thousands of kilometers. To tackle this problem, we introduce a new example-dependent cost-sensitive prediction model that extends the well-established idea of logistic regression. Our model allows different costs for misclassified instances and yields predictions with lower overall cost. Our method consistently outperforms the state of the art in example-dependent cost-sensitive logistic regression on various datasets. Applying our method to vehicle data from a large European car manufacturer, we show cost savings of about 10%.

Notes

  1. Note that we do not have to explicitly distinguish between \({c_{i}^{FP}}\) and \({c_{i}^{FN}}\): if \(y_i=0\), then \({c_{i}^{FP}} = c_i\); if \(y_i=1\), then \({c_{i}^{FN}}=c_i\). For a single instance, \({c_{i}^{FP}}\) and \({c_{i}^{FN}}\) can never occur together.

  2. The case \(y_i=0\) is equivalent, only mirrored. W.l.o.g., we consider only \(y_i=1\) in the following.

  3. More precisely, the average loss for correctly classified instances would be \(2\cdot T_{log}\).

  4. https://goo.gl/U2Uwz2.

  5. http://www.kaggle.com/c/GiveMeSomeCredit/.

  6. Due to nondisclosure agreements, we unfortunately cannot provide more details on the dataset. The two other datasets studied in this work are publicly available.
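
The single-cost convention of Note 1 can be illustrated with a minimal sketch. It assumes that correctly classified instances incur no cost, which the notes above do not state explicitly; the function name `misclassification_costs` and the example values are hypothetical and not taken from the paper.

```python
def misclassification_costs(y_true, y_pred, costs):
    """Return (false-positive cost, false-negative cost) for binary labels.

    Each instance i carries a single cost c_i. Following Note 1, c_i is read
    as the false-positive cost when y_i = 0 and as the false-negative cost
    when y_i = 1. Assumption: correct predictions cost nothing.
    """
    if not (len(y_true) == len(y_pred) == len(costs)):
        raise ValueError("y_true, y_pred and costs must have the same length")

    fp_cost = 0.0
    fn_cost = 0.0
    for y, y_hat, c in zip(y_true, y_pred, costs):
        if y == 0 and y_hat == 1:
            fp_cost += c  # predicted positive, actually negative
        elif y == 1 and y_hat == 0:
            fn_cost += c  # predicted negative, actually positive
    return fp_cost, fn_cost
```

Because an instance has exactly one true label, only one of the two branches can ever apply to it, which is why a single \(c_i\) per instance suffices.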

Author information

Corresponding author

Correspondence to Nikou Günnemann.

A Appendix

$$\begin{aligned}&\frac{F_{a_i log^{b_i}}}{T_{a_i log^{b_i}}}=\frac{a_i \varGamma (b_i + 1, 0.6931)}{a_i (\varGamma (b_i +1)- \varGamma (b_i +1, 0.6931))} \mathop {=}\limits ^{!} c_i \cdot \frac{F_{log}}{T_{log}} \\&\Leftrightarrow \varGamma (b_i +1, 0.6931)\mathop {=}\limits ^{!} c_i \cdot \frac{F_{log}}{T_{log}} \cdot \varGamma (b_i +1)- c_i \cdot \frac{F_{log}}{T_{log}} \varGamma (b_i + 1, 0.6931) \\&\Leftrightarrow \frac{\varGamma (b_i +1)}{\varGamma (b_i +1, 0.6931)}\mathop {=}\limits ^{!} \frac{1+c_i \cdot \frac{F_{log}}{T_{log}}}{c_i \cdot \frac{F_{log}}{T_{log}}}=1+ \frac{T_{log}}{c_i \cdot F_{log}} \end{aligned}$$
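
The last line determines \(b_i\) only implicitly, so it has to be solved numerically. The following is a minimal sketch of one way to do so, not the authors' implementation. It rests on several assumptions: \(0.6931\) is read as \(\ln 2\), \(\varGamma (\cdot ,\cdot )\) as the upper incomplete gamma function, and \(F_{log}\), \(T_{log}\) as the areas under \(-\ln p\) for \(p \le 0.5\) and \(p \ge 0.5\), respectively. All function names are hypothetical, and plain bisection is our choice of root finder.

```python
import math

LN2 = math.log(2.0)  # 0.6931..., the loss -ln(p) at the threshold p = 0.5

# Assumed closed forms for the areas under -ln(p), case y = 1:
F_LOG = 0.5 + 0.5 * LN2  # p in (0, 0.5], misclassified side
T_LOG = 0.5 - 0.5 * LN2  # p in [0.5, 1), correctly classified side


def lower_incomplete_gamma(s, x, terms=200):
    """Power-series evaluation of the lower incomplete gamma function, s > 0."""
    term = 1.0 / s
    total = term
    for k in range(1, terms):
        term *= x / (s + k)
        total += term
    return (x ** s) * math.exp(-x) * total


def gamma_ratio(b):
    """Gamma(b + 1) / Gamma(b + 1, ln 2), the left-hand side of the last line."""
    s = b + 1.0
    full = math.gamma(s)
    upper = full - lower_incomplete_gamma(s, LN2)
    return full / upper


def solve_b(c, lo=-0.999, hi=60.0, iters=200):
    """Bisection for b such that gamma_ratio(b) = 1 + T_LOG / (c * F_LOG).

    The ratio decreases monotonically in b, so one root lies in (lo, hi)
    for moderate positive costs c (an assumption of this sketch).
    """
    target = 1.0 + T_LOG / (c * F_LOG)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gamma_ratio(mid) > target:
            lo = mid  # ratio still too large, so b must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions, `solve_b(1.0)` recovers \(b_i = 1\), i.e. the ordinary logistic loss, because \(\varGamma (2, \ln 2) = (1+\ln 2)/2\) coincides with \(F_{log}\). Costs above 1 yield larger exponents and costs below 1 smaller ones.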

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Günnemann, N., Pfeffer, J. (2017). Cost Matters: A New Example-Dependent Cost-Sensitive Logistic Regression Model. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10234. Springer, Cham. https://doi.org/10.1007/978-3-319-57454-7_17

  • DOI: https://doi.org/10.1007/978-3-319-57454-7_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57453-0

  • Online ISBN: 978-3-319-57454-7

  • eBook Packages: Computer Science, Computer Science (R0)
