Abstract
A well-known category of classification error rate estimators is the so-called parametric error rate estimators. These estimators are typically expressed as functions of the training sample size, the dimensionality of the observation vector, and the Mahalanobis distance between the classes. However, all parametric classification error rate estimators are biased, and the main source of this bias is the estimate of the Mahalanobis distance. In this paper we propose a new Mahalanobis distance estimation method designed for use in parametric classification error rate estimators. Experiments with real-world and synthetic data sets show that the new estimator helps to reduce the bias of the most common parametric classification error rate estimators. Additionally, non-parametric classification error rate estimators, such as resubstitution, repeated 10-fold cross-validation, and leave-one-out, are outperformed (in terms of root-mean-square error) by parametric estimators that use the new estimates of the Mahalanobis distance.
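To make the role of the Mahalanobis distance concrete, the following minimal sketch computes the sample Mahalanobis distance between two classes and feeds it into the simplest parametric error estimate, the plug-in formula Φ(−D/2) for two Gaussian classes with a common covariance matrix and equal priors. It also applies the classical textbook bias correction of the squared distance. This is not the estimator proposed in the paper, whose details the abstract does not give; the function names, the synthetic data, and the choice of correction are assumptions made for illustration only.

```python
import math
import numpy as np


def sample_mahalanobis_sq(x1, x2):
    """Squared sample Mahalanobis distance using the pooled covariance estimate."""
    n1, n2 = len(x1), len(x2)
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    return float(diff @ np.linalg.solve(pooled, diff))


def debiased_mahalanobis_sq(d2, n1, n2, p):
    """Classical bias correction of the squared distance (assumed here for
    illustration; not the paper's new estimator). Truncated at zero."""
    n = n1 + n2 - 2
    corrected = (n - p - 1) / n * d2 - p * (n1 + n2) / (n1 * n2)
    return max(corrected, 0.0)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def plugin_error(d2):
    """Plug-in error estimate Phi(-D/2) for equal priors and common covariance."""
    return normal_cdf(-math.sqrt(max(d2, 0.0)) / 2.0)


# Hypothetical synthetic data: two 5-dimensional Gaussian classes, 20 samples each.
rng = np.random.default_rng(0)
n1, n2, p = 20, 20, 5
x1 = rng.normal(size=(n1, p))
x2 = rng.normal(size=(n2, p)) + 0.8  # second class shifted in every coordinate

d2_raw = sample_mahalanobis_sq(x1, x2)
d2_corrected = debiased_mahalanobis_sq(d2_raw, n1, n2, p)

print("raw D^2:", d2_raw, "plug-in error:", plugin_error(d2_raw))
print("corrected D^2:", d2_corrected, "plug-in error:", plugin_error(d2_corrected))
```

Because the raw sample distance tends to overstate the separation between the classes in small samples, the plug-in estimate based on it is optimistic; the corrected distance is never larger than the raw one, so the resulting error estimate is never smaller.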
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Gvardinskas, M. (2016). A New Estimator of the Mahalanobis Distance and its Application to Classification Error Rate Estimation. In: Dregvaite, G., Damasevicius, R. (eds) Information and Software Technologies. ICIST 2016. Communications in Computer and Information Science, vol 639. Springer, Cham. https://doi.org/10.1007/978-3-319-46254-7_25
DOI: https://doi.org/10.1007/978-3-319-46254-7_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46253-0
Online ISBN: 978-3-319-46254-7
eBook Packages: Computer Science, Computer Science (R0)