Individualized Error Estimation for Classification and Regression Models

  • Conference paper
  • In: Challenges at the Interface of Data Analysis, Computer Science, and Optimization

Abstract

Estimating the error of classification and regression models is one of the most crucial tasks in machine learning. While the global error measures the overall quality of a model, local error estimates are even more interesting: on the one hand, they contribute to a better understanding of prediction models (where the model works well and where it does not); on the other hand, they provide powerful means to build successful ensembles that select the most appropriate model(s) for each region. In this paper we introduce an extremely localized error estimation, called individualized error estimation (IEE), which estimates the error of a prediction model \(M\) for each instance \(x\) individually. To solve this problem, we apply a meta model \(M^{*}\). We systematically investigate various combinations of elementary models \(M\) and meta models \(M^{*}\) on publicly available real-world data sets. Further, we illustrate the power of IEE in the context of time series classification: on 35 publicly available real-world time series data sets, we show that IEE can enhance state-of-the-art time series classification methods.
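To make the IEE scheme concrete, here is a minimal sketch in Python, assuming scikit-learn-style estimators. The concrete choices of \(M\) (a decision tree regressor), \(M^{*}\) (a k-NN regressor), and the data set are illustrative assumptions rather than the paper's prescribed setup: \(M\) is trained as usual, its absolute error is recorded for each held-out instance, and \(M^{*}\) is fit to predict that error for unseen instances.

    # Minimal IEE sketch (illustrative assumptions: the model and data
    # choices are not the paper's setup; the paper compares many (M, M*) pairs).
    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.neighbors import KNeighborsRegressor

    X, y = load_diabetes(return_X_y=True)
    X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

    # 1. Train the elementary model M on the training split.
    M = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_train, y_train)

    # 2. Observe M's error on each held-out instance individually.
    errors = np.abs(y_hold - M.predict(X_hold))

    # 3. Train the meta model M* to map an instance x to M's error on x.
    M_star = KNeighborsRegressor(n_neighbors=5).fit(X_hold, errors)

    # 4. For a new instance, M* yields an individualized error estimate.
    x_new = X_hold[:1]
    print("prediction of M:", M.predict(x_new)[0])
    print("estimated error of M on x:", M_star.predict(x_new)[0])

In an ensemble, such per-instance estimates can be compared across several elementary models so that, for each region of the instance space, the model with the lowest estimated error is selected.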

Notes

  1. Hubs are time series that appear most frequently as nearest neighbors of other time series. Denote the set of time series for which t is the nearest neighbor as \(N_t\). A hub t is a bad hub if its class label differs from the class labels of many time series in \(N_t\); see also Radovanovic et al. (2010) and the sketch below.
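For intuition, hub and bad-hub counts can be computed directly from a pairwise distance matrix. The sketch below is an illustrative reconstruction in the spirit of Radovanovic et al. (2010), not code from the paper; the dist matrix (e.g. DTW distances between all pairs of time series) and the labels array are assumed inputs.

    # Illustrative sketch: counting hub and bad-hub occurrences under 1-NN.
    import numpy as np

    def bad_hub_scores(dist, labels):
        n = len(labels)
        d = dist.astype(float).copy()
        np.fill_diagonal(d, np.inf)         # a series is not its own neighbor
        nn = d.argmin(axis=1)               # 1-nearest neighbor of each series
        hub = np.bincount(nn, minlength=n)  # |N_t|: how often t is a 1-NN
        bad = np.zeros(n, dtype=int)
        for i, t in enumerate(nn):
            if labels[i] != labels[t]:      # t misleads i: their labels disagree
                bad[t] += 1
        return hub, bad                     # t is a bad hub if bad[t] is large

A series with a large hub count and a large bad count is a bad hub: a frequent nearest neighbor whose label disagrees with many of the series it is closest to.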

References

  • Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: Experimental comparison of representations and distance measures. Proc VLDB Endowment 1(2):1542–1552

  • Domeniconi C, Gunopulos D (2001) Adaptive nearest neighbor classification using support vector machines. Adv NIPS 14:665–672

  • Domeniconi C, Peng J, Gunopulos D (2002) Locally adaptive metric nearest-neighbor classification. IEEE Trans Pattern Anal Machine Intell 24(9):1281–1285

  • Duffy N, Helmbold D (2002) Boosting methods for regression. Mach Learn 47:153–200

  • Frank A, Asuncion A (2010) UCI machine learning repository. Tech. rep., University of California, School of Information and Computer Sciences, Irvine, URL http://archive.ics.uci.edu/ml

  • Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: An update. SIGKDD Explor 11(1):10–18

  • Hastie T, Tibshirani R (1996) Discriminant adaptive nearest neighbor classification. IEEE Trans Pattern Anal Mach Intell 18(6):607–616

  • Jain AK, Dubes RC, Chen CC (1987) Bootstrap techniques for error estimation. IEEE Trans Pattern Anal Mach Intell 5(9):606–633

  • Molinaro AM, Simon R, Pfeiffer RM (2005) Prediction error estimation: a comparison of resampling methods. Bioinformatics 21(15):3301–3307

  • Radovanovic M, Nanopoulos A, Ivanovic M (2010) Time-series classification in many intrinsic dimensions. In: Proc. 10th SIAM International Conference on Data Mining, SIAM, pp 677–688

  • Tsuda K, Rätsch G, Mika S, Müller KR (2001) Learning to predict the leave-one-out error of kernel based classifiers. In: Proc. ICANN 2001, LNCS 2130, pp 331–338

  • Xi X, Keogh E, Shelton C, Wei L, Ratanamahatana CA (2006) Fast time series classification using numerosity reduction. In: Proc. 23rd Int'l. Conf. on Machine Learning, ACM, pp 1033–1040

Author information

Correspondence to Krisztian Buza.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Buza, K., Nanopoulos, A., Schmidt-Thieme, L. (2012). Individualized Error Estimation for Classification and Regression Models. In: Gaul, W., Geyer-Schulz, A., Schmidt-Thieme, L., Kunze, J. (eds) Challenges at the Interface of Data Analysis, Computer Science, and Optimization. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24466-7_19
