Clustering via finite nonparametric ICA mixture models

  • Xiaotian Zhu
  • David R. Hunter (corresponding author)
Regular Article


We propose a novel extension of nonparametric multivariate finite mixture models by dropping the standard conditional independence assumption and incorporating the independent component analysis (ICA) structure instead. This innovation extends nonparametric mixture model estimation methods to situations in which conditional independence, a necessary assumption for the unique identifiability of the parameters in such models, is clearly violated. We formulate an objective function in terms of penalized smoothed Kullback–Leibler distance and introduce the nonlinear smoothed majorization-minimization independent component analysis algorithm for optimizing this function and estimating the model parameters. Our algorithm does not require any labeled observations a priori; it may be used for fully unsupervised clustering problems in a multivariate setting. We have implemented a practical version of this algorithm, which utilizes the FastICA algorithm, in the R package icamix. We illustrate this new methodology using several applications in unsupervised learning and image processing.
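
As a rough sketch of the structure described above, in notation assumed here for illustration rather than quoted from the article, an m-component ICA mixture density on r-dimensional data can be written with mixing weights \lambda_j, invertible mixing matrices A_j, and univariate densities f_{jk}:

```latex
% Illustrative notation only (assumed, not taken from the article):
%   \lambda_j : mixing weight of component j, with \sum_j \lambda_j = 1
%   A_j       : invertible r x r mixing matrix of component j
%   f_{jk}    : univariate density of the k-th independent coordinate in component j
\[
  g(x) \;=\; \sum_{j=1}^{m} \lambda_j \,\lvert \det A_j \rvert^{-1}
             \prod_{k=1}^{r} f_{jk}\!\bigl( (A_j^{-1} x)_k \bigr),
  \qquad x \in \mathbb{R}^{r}.
\]
```

Setting every A_j to the identity matrix recovers the conditional independence model, in which each component density is the product of its marginal densities.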


Keywords

Independent component analysis, Kernel density estimation, Nonparametric estimation, Penalized smoothed likelihood, Unsupervised learning

Mathematics Subject Classification

62H30 62G07 


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Natera Inc., San Carlos, USA
  2. Department of Statistics, Pennsylvania State University, University Park, USA
