A DCA Based Algorithm for Feature Selection in Model-Based Clustering

  • Viet Anh Nguyen
  • Hoai An Le Thi
  • Hoai Minh Le
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12033)


Gaussian Mixture Models (GMM) are a model-based clustering approach that has been used in many applications thanks to its flexibility and effectiveness. However, on high-dimensional data, GMM-based clustering loses its advantages due to over-parameterization and noisy features. To deal with this issue, we incorporate feature selection into GMM clustering. For the first time, a nonconvex sparsity-inducing regularization is considered for feature selection in GMM clustering. The resulting optimization problem is nonconvex, and we develop a DCA (Difference of Convex functions Algorithm) to solve it. Numerical experiments on several benchmark and synthetic datasets illustrate the efficiency of our algorithm and its superiority over an EM method for solving GMM clustering with \(l_1\) regularization.


Model-based clustering · Gaussian Mixture Models · Variable selection · Non-convex regularization · DC programming · DCA
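To give a concrete feel for the DCA scheme the abstract refers to, the sketch below applies it to a simpler problem than the paper's GMM formulation: sparse least squares with a capped-\(l_1\) penalty, a standard nonconvex sparsity-inducing regularizer. The penalty \(\min(|x_i|,\theta)\) admits the DC split \(\lambda|x_i| - \lambda\max(|x_i|-\theta,0)\); DCA linearizes the concave part and solves each convex subproblem by ISTA. This is a minimal illustration under these assumptions, not the authors' algorithm, and all names here are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (coordinatewise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def dca_capped_l1(A, b, lam=0.01, theta=0.5, outer=30, inner=200):
    """DCA sketch for  min_x 0.5||Ax - b||^2 + lam * sum_i min(|x_i|, theta).

    DC decomposition of the objective as g - h with both parts convex:
      g(x) = 0.5||Ax - b||^2 + lam*||x||_1
      h(x) = lam * sum_i max(|x_i| - theta, 0)
    Each outer iteration takes y in the subdifferential of h at the current
    point and solves the convex subproblem  min_x g(x) - <y, x>  by ISTA.
    """
    n = A.shape[1]
    x = np.zeros(n)
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth gradient
    for _ in range(outer):
        # Subgradient of h: lam*sign(x_i) where |x_i| exceeds the cap, else 0.
        y = lam * np.sign(x) * (np.abs(x) > theta)
        # ISTA on the convex subproblem g(x) - <y, x>.
        for _ in range(inner):
            grad = A.T @ (A @ x - b) - y
            x = soft_threshold(x - grad / L, lam / L)
    return x
```

On noiseless synthetic data with a few large nonzero coefficients, the linearized term cancels the \(l_1\) shrinkage on coordinates above the cap, so the capped penalty debiases the large entries while still driving the rest to zero; this debiasing effect is the usual motivation for nonconvex regularizers over plain \(l_1\).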



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Computer Science and Application Department, LGIPM, University of Lorraine, Metz, France
