Robust variable selection for finite mixture regression models

  • Qingguo Tang
  • R. J. Karunamuni


Finite mixture regression (FMR) models are frequently used in statistical modeling, often with many covariates of low significance. Variable selection techniques can be employed to identify the covariates with little influence on the response. This paper studies the problem of variable selection in FMR models. Penalized likelihood-based approaches are sensitive to data contamination, and their efficiency may be significantly reduced when the model is slightly misspecified. We propose a new robust variable selection procedure for FMR models. The proposed method is based on minimum-distance techniques, which seem to have some automatic robustness to model misspecification. We show that the proposed estimator possesses variable selection consistency and the oracle property. The finite-sample breakdown point of the estimator is established to demonstrate its robustness. We examine the small-sample and robustness properties of the estimator in a Monte Carlo study and also analyze a real data set.
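The robustness of minimum-distance estimation under contamination can be illustrated in a far simpler setting than an FMR model. The sketch below is not the procedure proposed in the paper: it is a toy one-dimensional example, assuming a normal location model with known unit variance, a Gaussian kernel density estimate, and a grid search for the location that minimizes the Hellinger distance to that estimate. The function names (`kde`, `normal_pdf`, `mhd_location`), the bandwidth, the grid sizes, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def kde(grid, data, bandwidth):
    # Gaussian kernel density estimate evaluated at each grid point.
    z = (grid[:, None] - data[None, :]) / bandwidth
    norm = len(data) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

def normal_pdf(grid, mu):
    # Density of N(mu, 1); the unit variance is an assumption of this toy model.
    return np.exp(-0.5 * (grid - mu) ** 2) / np.sqrt(2.0 * np.pi)

def mhd_location(data, bandwidth=0.5):
    # Minimizing the Hellinger distance between the KDE and N(mu, 1) is
    # equivalent to maximizing the affinity: integral of sqrt(fhat * f_mu).
    grid = np.linspace(data.min() - 3.0, data.max() + 3.0, 2001)
    step = grid[1] - grid[0]
    sqrt_fhat = np.sqrt(kde(grid, data, bandwidth))
    center = np.median(data)
    candidates = np.linspace(center - 5.0, center + 5.0, 1001)
    affinity = [
        np.sum(sqrt_fhat * np.sqrt(normal_pdf(grid, mu))) * step
        for mu in candidates
    ]
    return candidates[int(np.argmax(affinity))]

# Simulated data: 90% from N(0, 1), 10% contamination centered at 10.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=180)
outliers = rng.normal(10.0, 1.0, size=20)
data = np.concatenate([clean, outliers])

mean_est = data.mean()          # likelihood-based estimate, pulled toward outliers
mhd_est = mhd_location(data)    # minimum Hellinger distance estimate

print(f"sample mean: {mean_est:.3f}")
print(f"minimum Hellinger distance estimate: {mhd_est:.3f}")
```

In this setup the sample mean is dragged toward the contaminating cluster, while the minimum-distance estimate stays close to the center of the main component, because the model density places almost no mass where the outliers lie.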


Finite mixture regression models; Variable selection; Minimum-distance methods



We wish to thank the Chief Editor, Professor Kenji Fukumizu, an Associate Editor, and two reviewers for their helpful comments and suggestions that led to substantial improvements in this paper. Q. Tang’s research was supported in part by the National Social Science Foundation of China (16BTJ019) and Jiangsu Natural Science Foundation of China (BK20151481). R.J. Karunamuni’s research was supported by a grant from the Natural Sciences and Engineering Research Council of Canada.



Copyright information

© The Institute of Statistical Mathematics, Tokyo 2017

Authors and Affiliations

  1. School of Economics and Management, Nanjing University of Science and Technology, Nanjing, China
  2. Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Canada
