Artificial Intelligence Review, Volume 52, Issue 3, pp 1739–1779

A survey of feature selection methods for Gaussian mixture models and hidden Markov models

  • Stephen Adams
  • Peter A. Beling


Feature selection is the process of reducing the number of collected features to a relevant subset and is often used to combat the curse of dimensionality. This paper reviews the literature on feature selection techniques specifically designed for Gaussian mixture models (GMMs) and hidden Markov models (HMMs), two common parametric latent variable models. The primary contribution of this work is the collection and grouping of feature selection methods specifically designed for GMMs and for HMMs. An additional contribution lies in outlining the connections between these two groups of feature selection methods. Often, feature selection methods for GMMs and HMMs are treated as separate topics. In this survey, we propose that methods developed for one model can be adapted to the other. Further, we find that the number of feature selection methods for GMMs outweighs the number of methods for HMMs, and that the proportion of HMM methods that require supervised data is larger than the corresponding proportion of GMM methods. We conclude that further research into unsupervised feature selection methods for HMMs is required and that established methods for GMMs could be adapted to HMMs.

Feature selection is also referred to as dimensionality reduction, variable selection, attribute selection, and variable subset reduction. In this paper, we make a distinction between dimensionality reduction and feature selection. Dimensionality reduction, which we do not consider, is any process that reduces the number of features used in a model and can include methods that transform features in order to reduce the dimensionality. Feature selection, by contrast, is a specific form of dimensionality reduction that eliminates features as inputs to the model. The primary difference is that dimensionality reduction can still require the collection of all the data sources in order to transform and reduce the feature set, while feature selection eliminates the need to collect the irrelevant data sources.
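The distinction drawn above can be made concrete with a minimal sketch. Assuming scikit-learn's `GaussianMixture` and `PCA` as stand-ins (this is an illustration of the general distinction, not one of the surveyed methods), dimensionality reduction still consumes every collected feature before compressing it, whereas feature selection keeps only a subset of the original columns, so the discarded sources never need to be collected:

```python
# Illustration: dimensionality reduction (PCA) transforms all collected
# features, while feature selection simply keeps a subset of them.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two well-separated clusters live in features 0-1; features 2-4 are noise.
informative = np.vstack([rng.normal(0, 1, (100, 2)),
                         rng.normal(6, 1, (100, 2))])
noise = rng.normal(0, 1, (200, 3))
X = np.hstack([informative, noise])

# Dimensionality reduction: a transform of ALL five features into two
# components -- every data source must still be collected.
X_pca = PCA(n_components=2).fit_transform(X)

# Feature selection: keep a subset of the original features -- the noise
# columns 2-4 are never needed and need not be collected at all.
selected = [0, 1]
X_sel = X[:, selected]

# A two-component GMM clusters the selected features.
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X_sel)
```

With this separation the GMM recovers the two clusters from the selected pair of features alone, which is the practical benefit of selection over transformation emphasized in the survey.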


Keywords: Feature selection · Gaussian mixture model · Hidden Markov model



Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  1. University of Virginia, Charlottesville, USA
