Sparse Bayesian variable selection in kernel probit model for analyzing high-dimensional data

  • Aijun Yang (corresponding author)
  • Yuzhu Tian
  • Yunxian Li
  • Jinguan Lin
Original paper


In this paper, we develop a sparse Bayesian variable selection method for the kernel probit model for high-dimensional data classification. In particular, we assign a correlation prior distribution to the model size and a sparse prior distribution to the regression parameters. MCMC-based algorithms are outlined for generating samples from the posterior distributions. Simulation and real-data studies show that, in terms of both variable selection accuracy and classification accuracy, the proposed method outperforms five other Bayesian methods that either omit the correlation term from the prior or involve only a single shrinkage parameter.
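The abstract only outlines the MCMC computation. As a rough illustration, the sketch below implements the latent-variable Gibbs sampler of Albert and Chib (1993), a standard building block for probit samplers, applied to a kernel probit model P(y_i = 1) = Φ(Σ_j K_γ(x_i, x_j) β_j). Every specific here is our assumption, not the paper's: the Gaussian kernel restricted to variables with γ_k = 1 (bandwidth `rho`), the plain N(0, τ² I) prior on β standing in for the paper's sparse prior, and a fixed selection vector γ, so neither the correlation prior on the model nor a γ update step is shown. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf() and inv_cdf()


def selected_gaussian_kernel(x1, x2, gamma, rho=1.0):
    """Hypothetical kernel: Gaussian distance over variables with gamma[k] == 1 only."""
    d2 = sum(g * (a - b) ** 2 for a, b, g in zip(x1, x2, gamma))
    return math.exp(-d2 / rho)


def kernel_matrix(X, gamma, rho=1.0):
    return [[selected_gaussian_kernel(a, b, gamma, rho) for b in X] for a in X]


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x


def solve_lower_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def truncated_normal(mean, positive, rng):
    """Inverse-CDF draw from N(mean, 1) restricted to z > 0 (positive) or z <= 0."""
    p0 = STD_NORMAL.cdf(-mean)  # P(z <= 0)
    u = rng.random()
    p = p0 + u * (1.0 - p0) if positive else u * p0
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # inv_cdf needs 0 < p < 1
    return mean + STD_NORMAL.inv_cdf(p)


def gibbs_kernel_probit(X, y, gamma, n_iter=200, tau2=10.0, rho=1.0, seed=0):
    """Albert-Chib Gibbs sampler for beta with gamma held fixed (sketch)."""
    rng = random.Random(seed)
    K = kernel_matrix(X, gamma, rho)
    n = len(y)
    # Posterior precision A = K^T K + I / tau2; constant while gamma is fixed.
    A = [[sum(K[k][i] * K[k][j] for k in range(n)) + (1.0 / tau2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(A)
    beta = [0.0] * n
    draws = []
    for _ in range(n_iter):
        # 1) z_i | beta ~ N(K_i beta, 1), truncated to the sign implied by y_i.
        mean = [sum(K[i][j] * beta[j] for j in range(n)) for i in range(n)]
        z = [truncated_normal(m, yi == 1, rng) for m, yi in zip(mean, y)]
        # 2) beta | z ~ N(A^{-1} K^T z, A^{-1}).
        Ktz = [sum(K[k][i] * z[k] for k in range(n)) for i in range(n)]
        mu = solve_lower_t(L, solve_lower(L, Ktz))
        noise = solve_lower_t(L, [rng.gauss(0.0, 1.0) for _ in range(n)])  # ~ N(0, A^{-1})
        beta = [m + v for m, v in zip(mu, noise)]
        draws.append(beta)
    return K, draws


def posterior_probs(K, draws, burn_in=50):
    """Average of Phi(K_i beta) over post-burn-in draws, for each training point i."""
    kept = draws[burn_in:]
    return [
        sum(STD_NORMAL.cdf(sum(k * b for k, b in zip(row, beta))) for beta in kept) / len(kept)
        for row in K
    ]


if __name__ == "__main__":
    # Toy data: only the first variable separates the classes.
    X = [[0.0, 5.0], [0.2, -3.0], [0.4, 1.0], [2.0, 4.0], [2.2, -2.0], [2.4, 0.5]]
    y = [0, 0, 0, 1, 1, 1]
    K, draws = gibbs_kernel_probit(X, y, gamma=[1, 0])
    print([round(p, 2) for p in posterior_probs(K, draws)])
```

Sampling the truncated normal by inverting Φ keeps the sketch within the standard library, and the Cholesky factor of A is computed once because A does not change while γ is held fixed. A full sampler in the spirit of the abstract would also update γ (under the correlation prior) and any shrinkage parameters of the sparse prior, which would require refactoring A whenever γ changes.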


Keywords: Variable selection · Correlation prior · Sparse prior · Kernel probit model · Classification



The authors gratefully acknowledge financial support from the Humanities and Social Science Foundation of the Ministry of Education of China (18YJC910001), the Natural Science Foundation of China (11501294, 11501167, 11571073), the University Philosophy and Social Science Research Project of Jiangsu Province (2018SJA0130) and the Jiangsu Qinglan Project (2017).

Compliance with ethical standards

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Aijun Yang (1), corresponding author
  • Yuzhu Tian (2)
  • Yunxian Li (3)
  • Jinguan Lin (4)
  1. College of Economics and Management, Nanjing Forestry University, Nanjing, China
  2. School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, China
  3. School of Finance, Yunnan University of Finance and Economics, Kunming, China
  4. School of Statistics and Mathematics, Nanjing Audit University, Nanjing, China
