Classification with the pot–pot plot

Abstract

We propose a procedure for supervised classification that is based on potential functions. The potential of a class is defined as a kernel density estimate multiplied by the class’s prior probability. The method transforms the data to a potential–potential (pot–pot) plot, where each data point is mapped to a vector of potentials. Separation of the classes, as well as classification of new data points, is performed on this plot. For this, either the \(\alpha \)-procedure (\(\alpha \)-P) or k-nearest neighbors (k-NN) are employed. For data that are generated from continuous distributions, these classifiers prove to be strongly Bayes-consistent. The potentials depend on the kernel and its bandwidth used in the density estimate. We investigate several variants of bandwidth selection, including joint and separate pre-scaling and a bandwidth regression approach. The new method is applied to benchmark data from the literature, including simulated data sets as well as 50 sets of real data. It compares favorably to known classification methods such as LDA, QDA, max kernel density estimates, k-NN, and DD-plot classification using depth functions.
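To make the construction concrete, here is a minimal sketch of the pot–pot idea, not the authors' implementation: it assumes Gaussian kernels with a single fixed bandwidth via scikit-learn's KernelDensity, classifies on the pot–pot plot with k-NN (k = 10), and the function names and toy data are purely illustrative.

```python
# Minimal sketch (illustration only): a class's potential is its prior
# probability times a kernel density estimate; every point is mapped to
# the vector of class potentials (the pot-pot plot) and a k-NN classifier
# is trained on that plot.
import numpy as np
from sklearn.neighbors import KernelDensity, KNeighborsClassifier

def fit_pot_pot_knn(X, y, bandwidth=0.5, k=10):
    classes = np.unique(y)
    priors, kdes = [], []
    for c in classes:
        Xc = X[y == c]
        priors.append(len(Xc) / len(X))                          # class prior
        kdes.append(KernelDensity(bandwidth=bandwidth).fit(Xc))  # class KDE

    def potentials(Z):
        # column j holds prior_j * density_j(z) for every row z of Z
        return np.column_stack(
            [p * np.exp(kde.score_samples(Z)) for p, kde in zip(priors, kdes)]
        )

    knn = KNeighborsClassifier(n_neighbors=k).fit(potentials(X), y)
    return lambda Z: knn.predict(potentials(Z))

# Toy example: two Gaussian classes in the plane
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)
predict = fit_pot_pot_knn(X, y)
print(predict(np.array([[0.0, 0.0], [2.0, 2.0]])))  # typically prints [0 1]
```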

Change history

  • 12 September 2019

    Unfortunately, due to a technical error, the articles published in issues 60:2 and 60:3 received incorrect pagination. Please find here the corrected Tables of Contents. We apologize to the authors of the articles and the readers.

Notes

  1. \({\mathbf z}^T\) denotes the transpose of \({\mathbf z}\).

References

  • Aizerman MA, Braverman EM, Rozonoer LI (1970) The method of potential functions in the theory of machine learning. Nauka, Moscow

  • Chang C-C, Lin C-J (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2: 27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

  • Cuesta-Albertos JA, Febrero-Bande M, de la Fuente MO (2016) The DD\(^G\)-classifier in the functional setting. arXiv:1501.00372

  • Cuevas A, Febrero M, Fraiman R (2007) Robust estimation and classification for functional data via projection-based depth notions. Comput Stat 22(3):481–496

  • Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York

  • Duong T (2007) ks: kernel density estimation and kernel discriminant analysis for multivariate data in R. J Stat Softw 21:1–16

  • Dutta S, Chaudhuri P, Ghosh AK (2012) Classification using localized spatial depth with multiple localization. Mimeo, New York

  • Fraiman R, Meloche J (1999) Multivariate L-estimation. Test 8:255–317

  • Friedman JH (1997) On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Min Knowl Discov 1:55–77

  • Härdle W, Müller M, Sperlich S, Werwatz A (2004) Nonparametric and semiparametric models. Springer, New York

  • Lange T, Mosler K, Mozharovskyi P (2014) Fast nonparametric classification based on data depth. Stat Pap 55:49–69

  • Li J, Cuesta-Albertos JA, Liu RY (2012) DD-classifier: nonparametric classification procedure based on DD-plot. J Am Stat Assoc 107:737–753

  • Mosler K (2013) Depth statistics. In: Becker C, Fried R, Kuhnt S (eds) Robustness and complex data structures: Festschrift in Honour of Ursula Gather. Springer, Berlin, pp 17–34

  • Mozharovskyi P, Mosler K, Lange T (2015) Classifying real-world data with the \(DD\alpha \)-procedure. Adv Data Anal Classif 9:287–314

  • Paindaveine D, Van Bever G (2013) From depth to local depth: a focus on centrality. J Am Stat Assoc 108:1105–1119

  • Paindaveine D, Van Bever G (2015) Nonparametrically consistent depth-based classifiers. Bernoulli 21:62–82

  • Pokotylo O, Mozharovskyi P, Dyckerhoff R (2016) Depth and depth-based classification with R-package ddalpha. arXiv:1608.04109

  • Scott DW (1992) Multivariate density estimation: theory, practice, and visualization. Wiley, New York

  • Serfling R (2006) Depth functions in nonparametric multivariate inference. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol 72

  • Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London

  • Vencalek O (2014) New depth-based modification of the k-nearest neighbour method. SOP Trans Stat Anal 1:131–138

  • Wand MP, Jones MC (1993) Comparison of smoothing parameterizations in bivariate kernel density estimation. J Am Stat Assoc 88:520–528

  • Zuo Y, Serfling R (2000) General notions of statistical depth function. Ann Stat 28:461–482

Acknowledgements

We are grateful to Tatjana Lange and Pavlo Mozharovskyi for the active discussion of this paper. The work of Oleksii Pokotylo is supported by the Cologne Graduate School of Management, Economics and Social Sciences.

Author information

Corresponding author

Correspondence to Oleksii Pokotylo.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 253 KB)

Appendix

Fig. 6

Examples of bandwidths-to-errors plots (simulated data). (Color figure online)

See detailed description under Fig. 7.

Fig. 7

Examples of bandwidths-to-errors plots (real data). (Color figure online)

See the legend under Fig. 6.

The bandwidths-to-errors plots illustrate how the classification errors of diverse DD-classifiers depend on the selected kernel bandwidths for particular data sets, under joint (left column of panels) and separate (remaining panels) scaling.

In the joint scaling case the abscissa shows the logarithm of the bandwidth parameter, \(\log _{10}h^2\), and the ordinate the error rate. The rule-of-thumb (ROT) errors are shown as bold points.

In the separate scaling case the axes show the bandwidths \(\log _{10}h_i^2\) of the kernels of the first and the second class, respectively. The colors correspond to the classification errors achieved with these bandwidth combinations: red corresponds to the highest error rate, violet to the lowest, and the colors in between follow the rainbow order. The black points mark the rule-of-thumb (ROT) bandwidths.

We search for a relationship between the two bandwidth parameters using regression. First, we calculate the classification error rate at 25 bandwidth points that are divided into five sets orthogonal to the main diagonal. Then the minimum is found over each set, and a regression through these minima is computed to establish the relation between the bandwidths. The fitted relationship is used to estimate error rates along the regression line, iterating one bandwidth parameter and calculating the other from it; a sketch of this procedure is given below. For a comparison of the performance of this approach (pot–pot regressive separate) with joint and separate scaling, see Fig. 11.
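The following sketch illustrates one possible reading of this bandwidth-regression step; it is not the authors' code. The helper error_rate(h1, h2), assumed to return a (cross-validated) classification error for given per-class bandwidths, as well as the grid ranges and the 50-point line search, are illustrative assumptions.

```python
# Sketch of the bandwidth-regression idea (illustration only).
# All grid coordinates are on the log10(h_i^2) scale.
import numpy as np

def regress_bandwidths(error_rate,
                       diag=np.linspace(-2.0, 2.0, 5),
                       offsets=np.linspace(-1.0, 1.0, 5)):
    # 25 bandwidth points on five lines orthogonal to the main diagonal
    minima = []
    for t in diag:
        pts = [(t + d, t - d) for d in offsets]                  # 5 points per line
        errs = [error_rate(10 ** (a / 2), 10 ** (b / 2)) for a, b in pts]
        minima.append(pts[int(np.argmin(errs))])                 # best point per line

    # linear fit log10(h2^2) = alpha + beta * log10(h1^2) through the five minima
    x = np.array([p[0] for p in minima])
    z = np.array([p[1] for p in minima])
    beta, alpha = np.polyfit(x, z, 1)

    # iterate one bandwidth along the fitted line, computing the other from it
    path = [(a, alpha + beta * a) for a in np.linspace(x.min(), x.max(), 50)]
    errs = [error_rate(10 ** (a / 2), 10 ** (b / 2)) for a, b in path]
    a_best, b_best = path[int(np.argmin(errs))]
    return 10 ** (a_best / 2), 10 ** (b_best / 2)                # back to (h1, h2)
```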

Fig. 8

Efficiency of the methods. For the DD- and pot–pot classifiers, the errors of the \(\alpha \)-classifier are given

Fig. 9

Efficiency of the pot–pot classifiers

Fig. 10

Efficiency of DD-classifiers

Fig. 11

Efficiency of separate scaling with different bandwidth regressions over joint scaling

The minimum errors are reported in Table 2 for the simulated data and in Table 4 for the real data.

The index is the ratio of the error rate of the chosen classifier to that of the reference classifier. Here and in the following figures we take the Bayes risk as the reference for the simulated data and LDA for the real data. The index measures the relative efficiency of a classifier compared to the reference on a particular data set (a more efficient classifier has a smaller index). For each classifier a boxplot is built that illustrates the distribution of the efficiency index over all data sets.
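In formula form (our notation, with \(c\) denoting the classifier under consideration and \(D\) a data set):

\[ \mathrm{index}_c(D) \;=\; \frac{\mathrm{err}_c(D)}{\mathrm{err}_{\mathrm{ref}}(D)}\,, \]

where \(\mathrm{err}_{\mathrm{ref}}(D)\) is the Bayes risk of \(D\) for the simulated data and the LDA error rate on \(D\) for the real data.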

Table 1 Error rates (in %) of different classifiers for simulated data sets. Columns (6)–(10) and (13)–(15): \(\alpha \)-classifier in the DD- and pot–pot plots
Table 2 Error rates (in %) of the pot–pot classifiers for simulated data sets

The boxplots in Fig. 11 show the minimal errors obtained with the same bandwidth regressions as in the description of Fig. 6. The first and the last rows show the global minima for separate and joint scaling, respectively. The classifier using joint scaling is taken as the reference.

Tables 1 and 3 report the error rates obtained with the different methods. The methods are grouped by type, and the best values within each group are marked black. The best classifiers for a particular data set are additionally underlined.

In the tables we use the following abbreviations for the data depths: HS for halfspace, Mah for Mahalanobis, Proj for projection and Spat for spatial. Dknn abbreviates the depth-based k-NN of Paindaveine and Van Bever (2015).

Table 3 Error rates (in %) of different classifiers for real data sets. Columns (7)–(11) and (14)–(16): \(\alpha \)-classifier in the DD- and pot–pot plots
Table 4 Error rates (in %) of the pot–pot classifiers for real data sets
Fig. 12

Calculation time for a single data point under various depth functions and the potential, on a logarithmic time scale

Fig. 13

Performance of the KDE and the pot–pot classifiers in multidimensional spaces

Cite this article

Pokotylo, O., Mosler, K. Classification with the pot–pot plot. Stat Papers 60, 903–931 (2019). https://doi.org/10.1007/s00362-016-0854-8
