Nonparametric “anti-Bayesian” quantile-based pattern classification

Mahmoudi, Fatemeh; Razmkhah, Mostafa; Oommen, B. John

doi:10.1007/s10044-020-00903-7

Theoretical advances
Published: 23 June 2020

Pattern Analysis and Applications, volume 24, pages 75–87 (2021)

Abstract

Parametric and nonparametric pattern recognition have been studied for almost a century based on a Bayesian paradigm, which is, in turn, founded on the principles of Bayes' theorem. It is well known that the accuracy of the Bayes classifier cannot be exceeded. Typically, this reduces to comparing the testing sample to the mean or median of the respective distributions. Recently, Oommen and his co-authors have presented a pioneering and non-intuitive paradigm, namely that of achieving the classification by comparing the testing sample with another descriptor, which could also be quite distant from the mean. This paradigm has been termed “anti-Bayesian,” and it essentially uses the quantiles of the distributions to achieve the pattern recognition. Such classifiers attain the optimal Bayesian accuracy for symmetric distributions even though they operate with a non-intuitive philosophy. While this paradigm has been applied in a number of domains (briefly explained in the body of this paper), its application to nonparametric domains has been limited. This paper explains, in detail, how such quantile-based classification can be extended to the nonparametric world, using both traditional and kernel-based strategies. The paper analyzes the methodology of such nonparametric schemes and their robustness. From a fundamental perspective, the paper utilizes the so-called large sample theory to derive strong asymptotic results that pertain to the equivalence between the parametric and nonparametric schemes for large samples. Apart from the new theoretical results, the paper also presents experimental results demonstrating their power. These results pertain to artificial data sets and also involve a real-life breast cancer data set obtained from the University Hospital Centre of Coimbra. The experimental results clearly confirm the power of the proposed “anti-Bayesian” procedure, especially when approached from a nonparametric perspective.
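
The quantile-based decision rule outlined in the abstract can be illustrated with a minimal sketch for two one-dimensional classes. This is not the authors' implementation. The function names, the quantile level p = 2/3, the rule-of-thumb kernel bandwidth, and the use of medians to decide which class lies to the left are all assumptions made for illustration. The sketch compares a test point with an upper quantile of the left class and a lower quantile of the right class, estimated either from the plain sample or from a Gaussian-kernel-smoothed distribution function, and it assumes the classes are separated well enough that the left quantile falls below the right one.

```python
import math
import numpy as np

# Vectorized standard normal CDF built from math.erf (avoids a SciPy dependency).
_erf = np.vectorize(math.erf)


def _normal_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def empirical_quantile(sample, p):
    """Plain sample quantile (a 'traditional' nonparametric estimate)."""
    return float(np.quantile(np.asarray(sample, dtype=float), p))


def kernel_quantile(sample, p, bandwidth=None):
    """Quantile of a Gaussian-kernel-smoothed CDF, located by bisection."""
    x = np.asarray(sample, dtype=float)
    # Rule-of-thumb bandwidth: an illustrative assumption, not the paper's choice.
    h = bandwidth if bandwidth is not None else 1.06 * x.std(ddof=1) * x.size ** (-0.2)
    lo, hi = x.min() - 10.0 * h, x.max() + 10.0 * h
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        # The smoothed CDF is the average of kernel CDFs centred at the data points.
        if np.mean(_normal_cdf((mid - x) / h)) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def anti_bayesian_classify(x, sample_a, sample_b, p=2.0 / 3.0, quantile=empirical_quantile):
    """Return 0 if x is assigned to sample_a's class, 1 for sample_b's class."""
    # Hypothetical convention: the class with the smaller median is the 'left' class.
    a_is_left = np.median(sample_a) <= np.median(sample_b)
    left, right = (sample_a, sample_b) if a_is_left else (sample_b, sample_a)
    q_left = quantile(left, p)          # upper quantile of the left class
    q_right = quantile(right, 1.0 - p)  # lower quantile of the right class
    nearer_left = abs(x - q_left) <= abs(x - q_right)
    return 0 if nearer_left == a_is_left else 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, 500)   # synthetic class 0
    b = rng.normal(3.0, 1.0, 500)   # synthetic class 1
    print(anti_bayesian_classify(1.0, a, b))
    print(anti_bayesian_classify(2.2, a, b, quantile=kernel_quantile))
```

For two symmetric classes with equal spread, the midpoint between the two quantiles coincides with the midpoint between the class centres, which is why a rule of this kind can match the Bayesian boundary even though it never looks at the means.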

Notes

Over the last century, tens of thousands of papers have described the art and science of Bayesian classification for a myriad of distributions and applications. In this paper, we do not attempt a survey of the field.
We are very grateful to the anonymous referee of the previous version of the paper, who requested this.
Initially, the authors of [31] stated that the classification was based on the order statistics of the distribution, and this was later rectified [33].
Without going into too many details, we refer the reader to [5], which is a key reference in this field.
To be fair to the authors of [12, 22, 34, 35], one must credit them with achieving their nonparametric results by using the “anti-Bayesian” paradigm in multiple dimensions, rather than in a single dimension, as we have done here.
The proof of the theorem is omitted, since it is found in the literature. Also, we refer the interested reader to [6] for more information about the various types of convergence.
It is pertinent to mention that the accuracy of any classifier can never exceed that of a Bayesian classifier. Remarkably, we have been able to attain an accuracy quite close to the optimal, even though we have worked in a counterintuitive manner and made no assumption about the underlying distribution.
For more details about outliers in statistical analysis, we refer the reader to [8, 16, 25, 27].
The data may be obtained from the UCI Repository of Machine Learning databases at archive.ics.uci.edu/ml.

References

[1] Ahsanullah M, Nevzorov VB (2005) Order statistics: examples and exercises. Nova Publishers, Hauppauge
[2] Aitkin M, Wilson GT (1980) Mixture models, outliers, and the EM algorithm. Technometrics 22(3):325–331
[3] Alfred M (2001) Stochastic ordering of multivariate normal distributions. Ann Inst Stat Math 53(3):567–575
[4] Altman N, Léger C (1995) Bandwidth selection for kernel distribution function estimation. J Stat Plan Inference 46(2):195–214
[5] Arnold BC, Balakrishnan N, Nagaraja HN (2008) A first course in order statistics. SIAM, Philadelphia
[6] Athreya KB, Lahiri SN (2006) Measure theory and probability theory. Springer, Berlin
[7] Azzalini A (1981) A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika 68(1):326–328
[8] Barnett V, Lewis T (1978) Outliers in statistical data. Wiley, Hoboken
[9] Binder DA (1978) Bayesian cluster analysis. Biometrika 65(1):31–38
[10] Bishop CM (2006) Pattern recognition and machine learning (information science and statistics). Springer, Berlin, Heidelberg
[11] David HA, Nagaraja HN (2004) Order statistics. Wiley, Hoboken
[12] Hammer H, Yazidi A, Oommen BJ (2017) “Anti-Bayesian” flat and hierarchical clustering using symmetric quantiloids. Inf Sci 418–419:495–512
[13] Hawkins DM (1980) Identification of outliers. Chapman and Hall, London
[14] Hollander M, Wolfe DA, Chicken E (2013) Nonparametric statistical methods. Wiley, Hoboken
[15] Hu L (2015) A note on order statistics-based parametric pattern classification. Pattern Recognit 48(1):43–49
[16] Huber PJ (2011) Robust statistics. In: International encyclopedia of statistical science. pp 1248–1251
[17] Kothari CR (2004) Research methodology: methods and techniques. New Age International, New Delhi
[18] Leech NL, Onwuegbuzie AJ (2002) A call for greater use of nonparametric statistics. In: Mid-south educational research association annual meeting
[19] Meegen CV, Schnackenberg S, Ligges U (2019) Unequal priors in linear discriminant analysis. J Classif. https://doi.org/10.1007/s00357-019-09336-2
[20] Nguyen-Trang T, Vo-Van T (2017) A new approach for determining the prior probabilities in the classification problem by Bayesian method. Adv Data Anal Classif 11:629–643
[21] Oommen BJ, Thomas A (2014) Optimal order statistics-based “anti-Bayesian” parametric pattern classification for the exponential family. Pattern Recognit 47:40–55
[22] Oommen BJ, Khoury R, Schmidt A (2015) Text classification using novel “anti-Bayesian” techniques. In: Nunez M, Nguyen N, Camacho D, Trawinski B (eds) Computational collective intelligence. Lecture notes in computer science, vol 9329. pp 1–15
[23] Patrício M, Pereira J, Crisóstomo J, Matafome P, Gomes M, Seiça R, Caramelo F (2018) Using resistin, glucose, age and BMI to predict the presence of breast cancer. BMC Cancer 18:29
[24] Rosenblatt M (1956) Remarks on some nonparametric estimates of a density function. Ann Math Stat 27(3):832–837
[25] Rousseeuw PJ, Leroy AM (2005) Robust regression and outlier detection. Wiley, Hoboken
[26] Santhanam V, Morariu VI, Harwood D, Davis LS (2016) A non-parametric approach to extending generic binary classifiers for multi-classification. Pattern Recognit 58:149–158
[27] Scott DW (2004) Partial mixture estimation and outlier detection in data and regression. In: Hubert M, Pison G, Struyf A, Van Aelst S (eds) Theory and applications of recent robust methods. Statistics for industry and technology. pp 297–306
[28] Serfling RJ (2009) Approximation theorems of mathematical statistics. Wiley, Hoboken
[29] Shaked M, Shanthikumar JG (2007) Stochastic orders. Springer, Berlin
[30] Thomas A, Oommen BJ (2012) Optimal “anti-Bayesian” parametric pattern classification for the exponential family using order statistics criteria. In: Alvarez L, Mejail M, Gomez L, Jacobo J (eds) Progress in pattern recognition, image analysis, computer vision, and applications. CIARP 2012. Lecture notes in computer science, vol 7441. pp 1–13
[31] Thomas A, Oommen BJ (2013) The fundamental theory of optimal “anti-Bayesian” parametric pattern classification using order statistics criteria. Pattern Recognit 46(1):376–388
[32] Thomas A, Oommen BJ (2013) Order statistics-based parametric classification for multi-dimensional distributions. Pattern Recognit 46(12):3472–3482
[33] Thomas A, Oommen BJ (2014) Corrigendum to three papers that deal with “anti-Bayesian” pattern recognition. Pattern Recognit 47(6):2301–2302
[34] Thomas A, Oommen BJ (2013) Ultimate order statistics-based prototype reduction schemes. In: Cranefield S, Nayak A (eds) AI 2013: Advances in artificial intelligence. AI 2013. Lecture notes in computer science, vol 8272. pp 421–433
[35] Thomas A, Oommen BJ (2013) A novel Border Identification algorithm based on an “anti-Bayesian” paradigm. In: Wilson R, Hancock E, Bors A, Smith W (eds) Computer analysis of images and patterns. CAIP 2013. Lecture notes in computer science, vol 8047. pp 196–203

Acknowledgements

We are very grateful to the anonymous referees of the previous version of the paper, who suggested various modifications and changes. Their suggestions have greatly enhanced the quality of this present version.

Author information

Authors and Affiliations

Faculty of Mathematical Sciences, Ferdowsi University of Mashhad, P. O. Box 1159, Mashhad, 91775, Iran
Fatemeh Mahmoudi & Mostafa Razmkhah
School of Computer Science, Carleton University, Ottawa, K1S 5B6, Canada
B. John Oommen
University of Agder, Grimstad, Norway
B. John Oommen

Corresponding author

Correspondence to B. John Oommen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mahmoudi, F., Razmkhah, M. & Oommen, B.J. Nonparametric “anti-Bayesian” quantile-based pattern classification. Pattern Anal Applic 24, 75–87 (2021). https://doi.org/10.1007/s10044-020-00903-7

Received: 20 January 2020
Accepted: 10 June 2020
Published: 23 June 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s10044-020-00903-7
