Abstract
The Internet water army (IWA) usually refers to hidden paid posters and collusive spammers, who already pose a serious threat to cyber security. Many researchers have begun to study how to identify the IWA effectively. Currently, most data mining efforts to distinguish IWA from non-IWA rely on classification algorithms, including Bayesian networks, SVM, and KNN. However, Bayesian networks require a strong conditional independence assumption and KNN has high computation costs, which may limit the effectiveness of these approaches in real industrial applications. Hence, deep neural-network-based approaches to IWA identification are gradually becoming an emerging and promising direction. Unfortunately, one main problem remains: how to balance deep learning against computation costs in a hierarchical architecture. More specifically, combining learning-level heuristic training design with computing-level concurrent computation is a challenging issue. In this paper, we propose a collaborative hierarchical approach based on the deep belief network (DBN) for IWA identification. First, a DBN-based collaborative model with a hierarchical classifying mechanism is built. Then, targeting the Hadoop platform, Downpour stochastic gradient descent (Downpour SGD) is exploited for DBN pre-training. Finally, a dynamic workflow is designed to manage the whole learning-based classifying process. The experimental evaluation shows the validity of our approach.
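As an illustration only, the pre-training idea named in the abstract can be sketched for a single DBN layer. The sketch below is not the authors' implementation: it assumes a binary restricted Boltzmann machine trained with one-step contrastive divergence (CD-1), and it simulates Downpour SGD sequentially in one process rather than on Hadoop. All names (`ParameterServer`, `downpour_pretrain`, `fetch_every`, the learning rate, and the batch size) are hypothetical choices made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_gradient(W, b_vis, b_hid, v0, rng):
    """Approximate log-likelihood gradient of a binary RBM via CD-1."""
    h0_prob = sigmoid(v0 @ W + b_hid)                  # positive phase
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    v1_prob = sigmoid(h0 @ W.T + b_vis)                # one Gibbs step back
    h1_prob = sigmoid(v1_prob @ W + b_hid)             # negative phase
    n = v0.shape[0]
    dW = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    db_vis = (v0 - v1_prob).mean(axis=0)
    db_hid = (h0_prob - h1_prob).mean(axis=0)
    return dW, db_vis, db_hid


class ParameterServer:
    """Holds the shared parameters (assumed stand-in for a real server)."""

    def __init__(self, n_vis, n_hid, rng):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.b_vis = np.zeros(n_vis)
        self.b_hid = np.zeros(n_hid)

    def pull(self):
        # Workers receive a copy, so their view can become stale.
        return self.W.copy(), self.b_vis.copy(), self.b_hid.copy()

    def push(self, grads, lr):
        dW, db_vis, db_hid = grads
        self.W += lr * dW
        self.b_vis += lr * db_vis
        self.b_hid += lr * db_hid


def downpour_pretrain(data, n_hid, n_workers=4, steps=200,
                      fetch_every=5, batch=16, lr=0.1, seed=0):
    """Sequential simulation of Downpour-style updates on data shards."""
    rng = np.random.default_rng(seed)
    server = ParameterServer(data.shape[1], n_hid, rng)
    shards = np.array_split(data, n_workers)           # one shard per worker
    replicas = [server.pull() for _ in range(n_workers)]
    for step in range(steps):
        for w, shard in enumerate(shards):
            if step % fetch_every == 0:                # refresh stale replica
                replicas[w] = server.pull()
            idx = rng.integers(0, len(shard), size=batch)
            grads = cd1_gradient(*replicas[w], shard[idx], rng)
            server.push(grads, lr)                     # no global barrier
    return server


if __name__ == "__main__":
    # Toy binary feature matrix; real IWA features are not given in the abstract.
    toy = (np.random.default_rng(1).random((40, 6)) < 0.5).astype(float)
    trained = downpour_pretrain(toy, n_hid=3, steps=20)
    print(trained.W.shape)
```

In this sketch each worker computes gradients from a possibly outdated parameter copy and pushes them without waiting for the others, which is the property that lets Downpour-style training trade some gradient staleness for concurrency.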
Copyright information
© 2014 IFIP International Federation for Information Processing
About this paper
Cite this paper
Sun, W., Zhao, W., Niu, W., Chang, L. (2014). A DBN-Based Classifying Approach to Discover the Internet Water Army. In: Shi, Z., Wu, Z., Leake, D., Sattler, U. (eds) Intelligent Information Processing VII. IIP 2014. IFIP Advances in Information and Communication Technology, vol 432. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44980-6_9
DOI: https://doi.org/10.1007/978-3-662-44980-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44979-0
Online ISBN: 978-3-662-44980-6
eBook Packages: Computer Science, Computer Science (R0)