An Approach to Detect the Internet Water Army via Dirichlet Process Mixture Model Based GSP Algorithm

  • Conference paper
Applications and Techniques in Information Security (ATIS 2014)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 490)

Abstract

The Internet Water Army (IWA) poses a serious threat to cyber security, and recognizing IWA accounts accurately has become a challenging research problem. Most existing work uses behavioral analysis to distinguish IWA from non-IWA users. These approaches fall into two main categories: direct computation methods and training-based learning methods. Direct computation methods rely mainly on a crawler and build a multidimensional feature vector to detect IWA. However, they ignore behavior rules that depend on time sequence and judge user behavior by the feature vector alone, so the results are not very accurate and the recognition rate needs improvement. The second category relies mainly on clustering. Clustering approaches, however, require the number of clusters to be fixed in advance, and an unsuitable number leads directly to overfitting or underfitting of the model. In this paper we propose a sequential pattern approach based on the Dirichlet process mixture model (DPMM) for IWA identification. First, we analyze the behavior of potential IWA users and derive a feature vector of user behavior. Second, we use the DPMM to obtain effective and accurate clustering results. Finally, we use sequential pattern mining algorithms to detect water army accounts. Our clustering results on datasets from the Tianya forum are very good.
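
The final step of the pipeline, sequential pattern mining, can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, level-wise search in the spirit of GSP, and the function names, the `min_support` threshold, and the toy action labels are all assumptions made for illustration. In the paper's setting the sequence items would presumably be derived from the DPMM clustering stage rather than written by hand.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is enforced.
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Fraction of sequences that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def frequent_sequences(sequences, min_support, max_len=3):
    """Level-wise search: keep frequent k-patterns, extend them to length k+1.

    Simplification (assumption): candidates are built by appending every
    frequent single item to every frequent k-pattern. Real GSP uses a
    tighter join-and-prune step, but this version finds the same patterns.
    """
    items = sorted({x for s in sequences for x in s})
    frequent = {}
    current = [(i,) for i in items]
    for _ in range(max_len):
        survivors = []
        for p in current:
            sup = support(p, sequences)
            if sup >= min_support:
                frequent[p] = sup
                survivors.append(p)
        singles = [p[0] for p in frequent if len(p) == 1]
        current = [p + (i,) for p in survivors for i in singles]
        if not current:
            break
    return frequent


# Hypothetical per-user action sequences (toy data, not from the paper).
sessions = [
    ("login", "post", "reply", "logout"),
    ("login", "post", "post", "logout"),
    ("login", "reply", "logout"),
    ("post", "login", "post"),
]
patterns = frequent_sequences(sessions, min_support=0.75)
for pattern, sup in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(pattern, sup)
```

Patterns that recur across many accounts at a high support level are the kind of time-ordered behavior rule that a feature vector alone would miss.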

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, D., Li, Q., Hu, Y., Niu, W., Tan, J., Guo, L. (2014). An Approach to Detect the Internet Water Army via Dirichlet Process Mixture Model Based GSP Algorithm. In: Batten, L., Li, G., Niu, W., Warren, M. (eds) Applications and Techniques in Information Security. ATIS 2014. Communications in Computer and Information Science, vol 490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45670-5_9

  • DOI: https://doi.org/10.1007/978-3-662-45670-5_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45669-9

  • Online ISBN: 978-3-662-45670-5

  • eBook Packages: Computer Science, Computer Science (R0)