A New Estimator of the Mahalanobis Distance and its Application to Classification Error Rate Estimation

Gvardinskas, Mindaugas

doi:10.1007/978-3-319-46254-7_25

Conference paper
First Online: 22 September 2016

Part of the book series: Communications in Computer and Information Science (CCIS, volume 639)

Abstract

A well-known category of classification error rate estimators is the so-called parametric error rate estimators. These estimators are typically expressed as functions of the training sample size, the dimensionality of the observation vector, and the Mahalanobis distance between the classes. However, all parametric classification error rate estimators are biased, and the main source of this bias is the estimate of the Mahalanobis distance. In this paper, we propose a new Mahalanobis distance estimation method designed for use in parametric classification error rate estimators. Experiments with real-world and synthetic data sets show that the new estimator helps to reduce the bias of the most common parametric classification error rate estimators. Additionally, parametric estimators that use the new estimates of the Mahalanobis distance outperform non-parametric classification error rate estimators, such as resubstitution, repeated 10-fold cross-validation, and leave-one-out, in terms of root-mean-square error.
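To make the setting in the abstract concrete, the sketch below shows a conventional plug-in estimate of the squared Mahalanobis distance and the simplest parametric error rate estimate built on it. It is not the new estimator proposed in the paper, whose form is not given on this page. The function names are hypothetical, and the code assumes two Gaussian classes with a common covariance matrix and equal priors, for which the Bayes error is Φ(−δ/2). The third function applies the classical small-sample correction that follows from the noncentral F distribution of the plug-in statistic under the same Gaussian assumption.

```python
import math
import numpy as np

def plugin_mahalanobis_sq(x1, x2):
    """Plug-in squared Mahalanobis distance; rows of x1, x2 are observations."""
    n1, n2 = len(x1), len(x2)
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    # Pooled sample covariance with n1 + n2 - 2 degrees of freedom.
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    return float(diff @ np.linalg.solve(pooled, diff))

def debiased_mahalanobis_sq(d_sq, n1, n2, p):
    """Classical unbiased correction of the plug-in value (assumes Gaussian
    classes, common covariance, and n1 + n2 - 2 > p + 1). Can be negative."""
    nu = n1 + n2 - 2
    return (nu - p - 1) / nu * d_sq - p * (n1 + n2) / (n1 * n2)

def parametric_error(delta_sq):
    """Phi(-delta/2), with negative distance estimates truncated at zero."""
    delta = math.sqrt(max(delta_sq, 0.0))
    return 0.5 * math.erfc(delta / (2.0 * math.sqrt(2.0)))
```

With two samples of four 2-dimensional points each, `plugin_mahalanobis_sq` followed by `debiased_mahalanobis_sq(d_sq, 4, 4, 2)` shrinks the distance estimate, which in turn raises the value returned by `parametric_error`. This illustrates why the choice of distance estimate drives the bias of the error rate estimate.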

Author information

Authors and Affiliations

Department of System Analysis, Vytautas Magnus University, Vileikos Street 8, 44404, Kaunas, Lithuania
Mindaugas Gvardinskas

Corresponding author

Correspondence to Mindaugas Gvardinskas.

Editor information

Editors and Affiliations

Kaunas University of Technology, Kaunas, Lithuania
Giedre Dregvaite
Kaunas University of Technology, Kaunas, Lithuania
Robertas Damasevicius

Cite this paper

Gvardinskas, M. (2016). A New Estimator of the Mahalanobis Distance and its Application to Classification Error Rate Estimation. In: Dregvaite, G., Damasevicius, R. (eds) Information and Software Technologies. ICIST 2016. Communications in Computer and Information Science, vol 639. Springer, Cham. https://doi.org/10.1007/978-3-319-46254-7_25

DOI: https://doi.org/10.1007/978-3-319-46254-7_25
Published: 22 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46253-0
Online ISBN: 978-3-319-46254-7
eBook Packages: Computer Science, Computer Science (R0)