Hybrid Subspace Mixture Models for Prediction and Anomaly Detection in High Dimensions

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10604)

Abstract

Robust learning of mixture models in high dimensions remains an open challenge, especially in the current big data era. This paper investigates twelve variants of hybrid mixture models that combine G-means clustering, Gaussian mixture models, and Student t-distribution mixture models for high-dimensional predictive modeling and anomaly detection. High-dimensional data are first reduced to a lower-dimensional subspace using whitened principal component analysis. For real-time data processing in batch mode, a technique based on the Gram-Schmidt orthogonalization process is proposed and demonstrated; it updates the reduced dimensions so that they remain relevant to the task objectives. In addition, a model-adaptation technique is proposed and demonstrated for incremental learning on big data by statistically matching the mixture components’ mean and variance vectors; the adapted parameters are computed as a weighted average that takes into account the sample sizes of the new and older statistics, with a parameter that scales down the influence of the older statistics in each iterative computation. The hybrid models’ performance is evaluated using simulation and empirical studies. Results show that simple hybrid models without the Expectation-Maximization training step can achieve high performance in high dimensions, comparable to that of the more sophisticated models. For unsupervised anomaly detection, the hybrid models achieve a detection rate \(\gtrsim 90\%\) with injected anomalies from \(1\%\) to \(60\%\) using the KDD Cup 1999 network intrusion dataset.
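
The weighted-average adaptation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact update formula, so the moment-matching form below is an assumption, and the names `ComponentStats`, `adapt_component`, and `decay` are hypothetical and not taken from the authors' Matlab scripts. The sketch merges the per-dimension mean and variance of one mixture component with the statistics of a new batch, weighting each side by its sample size after the older count has been scaled down by `decay`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComponentStats:
    """Per-dimension statistics of one mixture component (hypothetical container)."""
    n: float            # (effective) number of samples behind the statistics
    mean: List[float]   # mean vector
    var: List[float]    # variance vector (diagonal covariance)


def adapt_component(old: ComponentStats, new: ComponentStats,
                    decay: float = 0.9) -> ComponentStats:
    """Merge old and new statistics by a sample-size-weighted average.

    Assumption: `decay` in [0, 1] scales down the older sample count, so
    older statistics lose influence with every update.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    if not (len(old.mean) == len(new.mean) == len(old.var) == len(new.var)):
        raise ValueError("old and new statistics must have equal dimension")

    n_old = decay * old.n          # discounted weight of the older statistics
    total = n_old + new.n
    if total <= 0:
        raise ValueError("no samples to adapt from")
    w_old, w_new = n_old / total, new.n / total

    mean, var = [], []
    for m_o, v_o, m_n, v_n in zip(old.mean, old.var, new.mean, new.var):
        m = w_old * m_o + w_new * m_n
        # Match second moments: E[x^2] = var + mean^2 on each side.
        second = w_old * (v_o + m_o ** 2) + w_new * (v_n + m_n ** 2)
        mean.append(m)
        var.append(max(second - m ** 2, 0.0))  # guard against round-off
    return ComponentStats(total, mean, var)


old = ComponentStats(n=100, mean=[0.0], var=[1.0])
new = ComponentStats(n=100, mean=[2.0], var=[1.0])
merged = adapt_component(old, new, decay=1.0)
print(merged.mean, merged.var)
```

With `decay=1.0` the update reduces to ordinary pooling of the two sets of statistics; with `decay=0.0` the older statistics are discarded and the new batch is used as is. Values in between trade stability against responsiveness to drift.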

Notes

  1. For reproducibility, the Matlab scripts to run the simulation and experimental studies in this paper are obtainable from https://github.com/jennbing/hybrid-models.

Corresponding author

Correspondence to Jenn-Bing Ong.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ong, JB., Ng, WK. (2017). Hybrid Subspace Mixture Models for Prediction and Anomaly Detection in High Dimensions. In: Cong, G., Peng, WC., Zhang, W., Li, C., Sun, A. (eds) Advanced Data Mining and Applications. ADMA 2017. Lecture Notes in Computer Science, vol 10604. Springer, Cham. https://doi.org/10.1007/978-3-319-69179-4_23

  • DOI: https://doi.org/10.1007/978-3-319-69179-4_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69178-7

  • Online ISBN: 978-3-319-69179-4

  • eBook Packages: Computer Science, Computer Science (R0)
