
A Theoretical Analysis of the Peaking Phenomenon in Classification

  • Amin Zollanvari
  • Alex Pappachen James
  • Reza Sameni

Abstract

In this work, we analytically study the peaking phenomenon in the context of linear discriminant analysis in the multivariate Gaussian model under the assumption of a common, known covariance matrix. The focus is on the finite-sample setting, where the sample size and the observation dimension are comparable. To study the phenomenon in this setting, we use an asymptotic technique in which the number of sample points is kept comparable in magnitude to the dimensionality of the observations. The analysis provides a more thorough picture of the phenomenon. In particular, it shows that as long as the Relative Cumulative Efficacy of an additional Feature set (RCEF) is greater (less) than the size of this set, the expected error of the classifier constructed using these additional features will be less (greater) than the expected error of the classifier constructed without them. Our result highlights the factors underlying the peaking phenomenon for the classifier used in this study and, at the same time, calls into question the classical wisdom surrounding the phenomenon.
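
To make the phenomenon concrete, the Python sketch below estimates by Monte Carlo the expected error of the known-covariance LDA rule as features are added. It is an illustration under hypothetical assumptions, not the asymptotic analysis or the RCEF criterion of this paper: Σ = I, equal priors, class means of +δ_j/2 and -δ_j/2 per feature with a geometrically decaying separation δ_j, and n = 10 training points per class are all choices made here. With Σ known, the rule assigns x to class 0 when W(x) = (x - (m0 + m1)/2)^T Σ^{-1} (m0 - m1) > 0, and its true error given the sample means m0, m1 has a closed form in the standard normal CDF Φ, so only the sample means need to be simulated.

```python
import math
import random


def phi(z):
    """Standard normal CDF, computed with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def true_error(m0, m1, mu0, mu1):
    """Exact error rate (equal priors) of the LDA rule with known Sigma = I.

    The rule built from sample means m0, m1 assigns x to class 0 when
    W(x) = (x - (m0 + m1) / 2)^T (m0 - m1) > 0. For x ~ N(mu_i, I),
    W(x) is Gaussian with mean (mu_i - mid)^T d and standard deviation ||d||.
    """
    d = [a - b for a, b in zip(m0, m1)]
    norm_d = math.sqrt(sum(v * v for v in d))
    mid = [(a + b) / 2.0 for a, b in zip(m0, m1)]
    s0 = sum((u - c) * v for u, c, v in zip(mu0, mid, d))
    s1 = sum((u - c) * v for u, c, v in zip(mu1, mid, d))
    e0 = phi(-s0 / norm_d)  # P(W(x) <= 0 | class 0)
    e1 = phi(s1 / norm_d)   # P(W(x) > 0 | class 1)
    return 0.5 * (e0 + e1)


def expected_error(p, n, deltas, trials, rng):
    """Monte Carlo estimate of E[true error] using the first p features
    and n training points per class (hypothetical setup)."""
    mu0 = [dl / 2.0 for dl in deltas[:p]]
    mu1 = [-dl / 2.0 for dl in deltas[:p]]
    # With Sigma = I, each sample-mean coordinate is N(mu_j, 1/n),
    # so the means are drawn directly instead of simulating n points.
    sd = 1.0 / math.sqrt(n)
    total = 0.0
    for _ in range(trials):
        m0 = [u + rng.gauss(0.0, sd) for u in mu0]
        m1 = [u + rng.gauss(0.0, sd) for u in mu1]
        total += true_error(m0, m1, mu0, mu1)
    return total / trials


def bayes_error(p, deltas):
    """Optimal error with the first p features: Phi(-Delta_p / 2),
    where Delta_p is the Mahalanobis distance between the class means."""
    return phi(-math.sqrt(sum(dl * dl for dl in deltas[:p])) / 2.0)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10  # hypothetical: training points per class
    deltas = [1.5 * 0.8 ** j for j in range(60)]  # hypothetical decaying separations
    for p in (1, 2, 5, 10, 20, 40, 60):
        print(p, round(bayes_error(p, deltas), 4),
              round(expected_error(p, n, deltas, 2000, rng), 4))
```

Under these assumptions the Bayes error decreases monotonically as p grows, while the estimated expected error of the designed rule typically first falls and then rises once the added features contribute little separation relative to the estimation noise they introduce. Where the minimum falls depends on n and on the assumed δ_j profile.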

Keywords

Peaking phenomenon; Linear discriminant analysis; Classification error rate; Multiple asymptotic analysis

Notes

Acknowledgements

This material is based in part upon work supported by the Nazarbayev University Faculty Development Competitive Research Grant, under award number SOE2018008.

Copyright information

© Classification Society of North America 2019

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Nazarbayev University, Nur-Sultan, Kazakhstan
  2. Department of Computer Science & Engineering and Information Technology, Shiraz University, Shiraz, Iran
