Anomaly Detection for Categorical Observations Using Latent Gaussian Process

  • Fengmao Lv
  • Guowu Yang
  • Jinzhao Wu (Email author)
  • Chuan Liu
  • Yuhong Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10638)

Abstract

Anomaly detection is an important problem in many applications, ranging from medical informatics to network security. Various distribution-based techniques have been proposed to tackle it; they learn the probability distribution of conventional behavior and treat observations with low density as anomalies. For categorical observations, multinomial or Dirichlet compound multinomial distributions have been adopted as statistical models of conventional samples. However, when faced with a small-scale data set of multivariate categorical samples, these models suffer from the curse of dimensionality and fail to capture the statistical properties of conventional behavior, since only a small proportion of the possible categorical configurations appear in the training data. The categorical latent Gaussian process, a Bayesian non-parametric technique, can model small-scale categorical data by using Gaussian processes to learn a continuous latent space for multivariate categorical samples. On this basis, we propose an anomaly detection technique for multivariate categorical observations in which the categorical latent Gaussian process captures the probability distribution of conventional categorical samples. Experimental results on categorical data show that our method effectively detects anomalous categorical observations and achieves better detection performance than other anomaly detection techniques.
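
The sparsity problem described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it fits a Dirichlet-smoothed multinomial over whole configurations, which is only one plausible reading of the multinomial baselines the abstract mentions. The function name `fit_joint_multinomial`, the synthetic data, and all parameter values are assumptions made for illustration.

```python
import itertools
import math
import random
from collections import Counter


def fit_joint_multinomial(samples, n_categories, alpha=1.0):
    """Fit a Dirichlet-smoothed multinomial over whole configurations."""
    n_features = len(samples[0])
    n_configs = n_categories ** n_features  # size of the joint outcome space
    counts = Counter(tuple(s) for s in samples)
    denom = len(samples) + alpha * n_configs

    def log_density(x):
        # Posterior predictive probability under a symmetric Dirichlet prior.
        return math.log((counts[tuple(x)] + alpha) / denom)

    return log_density, n_configs


rng = random.Random(0)
n_features, n_categories, n_train = 10, 3, 100

# Hypothetical conventional behavior: each feature is usually category 0.
train = [
    [0 if rng.random() < 0.8 else rng.randrange(1, n_categories)
     for _ in range(n_features)]
    for _ in range(n_train)
]

log_density, n_configs = fit_joint_multinomial(train, n_categories)
seen = {tuple(s) for s in train}

# Unseen but plausible: exactly two features deviate from category 0.
plausible = next(
    c for c in itertools.product(range(n_categories), repeat=n_features)
    if c not in seen and sum(v != 0 for v in c) == 2
)
# Unseen and implausible: every feature deviates from category 0.
implausible = next(
    c for c in itertools.product(range(1, n_categories), repeat=n_features)
    if c not in seen
)

print(f"configurations seen: {len(seen)} of {n_configs}")
print(f"plausible   {plausible}: {log_density(plausible):.3f}")
print(f"implausible {implausible}: {log_density(implausible):.3f}")
```

With 100 training samples and 3^10 possible configurations, almost every configuration is unseen, and this model assigns all unseen configurations the same density. It therefore cannot rank a mildly unusual sample above a highly unusual one, which is the kind of failure that motivates sharing statistical strength through a continuous latent space.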

Keywords

Anomaly detection; Categorical data; Bayesian non-parametric model; Gaussian process; Data-efficient learning

Notes

Acknowledgments

This work was supported by the National Natural Science Foundation of China under grants No. 61572109, No. 11461006, and No. 61402080. The authors would like to thank the anonymous reviewers for their helpful and constructive comments.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Fengmao Lv (1)
  • Guowu Yang (1)
  • Jinzhao Wu (2), Email author
  • Chuan Liu (1)
  • Yuhong Yang (3)
  1. The Big Data Center, The School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, People’s Republic of China
  2. Guangxi Key Laboratory of Hybrid Computation and IC Design Analysis, Guangxi University for Nationalities, Nanning, People’s Republic of China
  3. The School of Statistics, University of Minnesota, Minneapolis, USA
