Cost Matters: A New Example-Dependent Cost-Sensitive Logistic Regression Model

Günnemann, Nikou; Pfeffer, Jürgen

doi:10.1007/978-3-319-57454-7_17


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10234)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

Connectivity and automation are increasingly part of today's cars. To enable automation, many gauges are integrated in cars to collect physical readings. In the automobile industry, the resulting datasets can be used to predict whether a car will need a repair soon. Such predictions help drivers and retailers take action early. However, prediction in real use cases poses new challenges: misclassified instances do not all incur the same cost. For example, failing to predict a necessary tire change is usually more costly than predicting a tire change when the car could still drive thousands of kilometers. To tackle this problem, we introduce a new example-dependent cost-sensitive prediction model that extends the well-established idea of logistic regression. Our model allows a different cost for each misclassified instance and yields predictions with lower overall cost. Our method consistently outperforms the state of the art in example-dependent cost-sensitive logistic regression on various datasets. Applying our method to vehicle data from a large European car manufacturer, we show cost savings of about 10%.


Notes

1. Note that we do not have to explicitly distinguish between $c_i^{FP}$ and $c_i^{FN}$: if $y_i=0$, then $c_i^{FP} = c_i$; if $y_i=1$, then $c_i^{FN}=c_i$. For a single instance, $c_i^{FP}$ and $c_i^{FN}$ can never occur together.
2. The case $y_i=0$ is equivalent, only mirrored. W.l.o.g., we consider only $y_i=1$ in the following.
3. More precisely, the average loss for correctly classified instances would be $2\cdot T_{log}$.
4. https://goo.gl/U2Uwz2
5. http://www.kaggle.com/c/GiveMeSomeCredit/
6. Due to nondisclosure agreements, we unfortunately cannot provide more details on the dataset. The two other datasets studied in this work are publicly available.


Author information

Authors and Affiliations

Technical University of Munich, Munich, Germany
Nikou Günnemann & Jürgen Pfeffer


Corresponding author

Correspondence to Nikou Günnemann.

Editor information

Editors and Affiliations

Kangwon National University, Chuncheon, Korea (Republic of)
Jinho Kim
Seoul National University, Seoul, Korea (Republic of)
Kyuseok Shim
University of Technology Sydney, Sydney, New South Wales, Australia
Longbing Cao
KAIST, Daejeon, Korea (Republic of)
Jae-Gil Lee
University of New South Wales, Sydney, New South Wales, Australia
Xuemin Lin
Kangwon National University, Chuncheon, Korea (Republic of)
Yang-Sae Moon

A Appendix

$$\begin{aligned}&\frac{F_{a_i log^{b_i}}}{T_{a_i log^{b_i}}}=\frac{a_i \varGamma (b_i + 1, 0.6931)}{a_i \left( \varGamma (b_i +1)- \varGamma (b_i +1, 0.6931)\right) } \mathop {=}\limits ^{!} c_i \cdot \frac{F_{log}}{T_{log}} \\&\Leftrightarrow \varGamma (b_i +1, 0.6931)\mathop {=}\limits ^{!} c_i \cdot \frac{F_{log}}{T_{log}} \cdot \varGamma (b_i +1)- c_i \cdot \frac{F_{log}}{T_{log}} \cdot \varGamma (b_i + 1, 0.6931) \\&\Leftrightarrow \frac{\varGamma (b_i +1)}{\varGamma (b_i +1, 0.6931)}\mathop {=}\limits ^{!} \frac{1+c_i \cdot \frac{F_{log}}{T_{log}}}{c_i \cdot \frac{F_{log}}{T_{log}}}=1+ \frac{T_{log}}{c_i \cdot F_{log}} \end{aligned}$$
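The last line of the derivation has no closed-form solution for $b_i$, but it can be solved numerically. The following is a minimal sketch, not the authors' implementation. It assumes that 0.6931 stands for $\ln 2$, that $F_{log}$ and $T_{log}$ are the corresponding quantities for the standard log loss ($a_i = 1$, $b_i = 1$), so that $F_{log} = \varGamma(2, \ln 2)$ and $T_{log} = \varGamma(2) - \varGamma(2, \ln 2)$, and that $b_i$ lies in $[0, 100]$. The function names `upper_gamma_regularized` and `solve_b` are hypothetical. The condition is rewritten as $Q(b_i + 1, \ln 2) = k/(1+k)$ with $k = c_i \cdot F_{log}/T_{log}$, where $Q(s, x) = \varGamma(s, x)/\varGamma(s)$, and solved by bisection.

```python
import math

LN2 = math.log(2.0)  # assumed meaning of the constant 0.6931 in the appendix


def upper_gamma_regularized(s, x, max_terms=200):
    """Q(s, x) = Gamma(s, x) / Gamma(s) for s > 0, computed as 1 - P(s, x).

    P(s, x) uses the power series
    x^s e^(-x) / Gamma(s + 1) * sum_n x^n / ((s + 1) ... (s + n)).
    """
    total, term = 1.0, 1.0
    for n in range(1, max_terms):
        term *= x / (s + n)
        total += term
        if term < 1e-17 * total:
            break
    log_prefactor = s * math.log(x) - x - math.lgamma(s + 1.0)
    return 1.0 - math.exp(log_prefactor) * total


# Assumption: F_log and T_log refer to the plain log loss (a = 1, b = 1).
F_LOG = (1.0 + LN2) / 2.0  # Gamma(2, ln 2) = (1 + ln 2) * exp(-ln 2)
T_LOG = 1.0 - F_LOG        # Gamma(2) - Gamma(2, ln 2), with Gamma(2) = 1


def solve_b(c, lo=0.0, hi=100.0, tol=1e-12):
    """Bisection for b in Gamma(b+1) / Gamma(b+1, ln 2) = 1 + T_log / (c * F_log)."""
    k = c * F_LOG / T_LOG
    target = k / (1.0 + k)  # equivalent condition on Q(b + 1, ln 2)

    def f(b):
        return upper_gamma_regularized(b + 1.0, LN2) - target

    if f(lo) > 0.0 or f(hi) < 0.0:
        raise ValueError("no solution for b in the assumed search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:  # Q(b + 1, ln 2) increases with b
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions, a cost of $c_i = 1$ recovers $b_i = 1$, that is, the standard log loss, and larger costs yield larger exponents. The factor $a_i$ cancels in the ratio, so it is not determined by this condition.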

About this paper

Cite this paper

Günnemann, N., Pfeffer, J. (2017). Cost Matters: A New Example-Dependent Cost-Sensitive Logistic Regression Model. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10234. Springer, Cham. https://doi.org/10.1007/978-3-319-57454-7_17

DOI: https://doi.org/10.1007/978-3-319-57454-7_17
Published: 23 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57453-0
Online ISBN: 978-3-319-57454-7
eBook Packages: Computer Science, Computer Science (R0)
