Early Detection of Promotion Campaigns in Community Question Answering

Li, Xin; Liu, Yiqun; Zhang, Min; Ma, Shaoping

doi:10.1007/978-981-10-2993-6_15

Part of the book series: Communications in Computer and Information Science (CCIS, volume 669)

Included in the following conference series:

Chinese National Conference on Social Media Processing

Abstract

As with many social media websites, Community Question Answering (CQA) portals have become a target for spammers who disseminate promotional information. Previous work mainly focuses on identifying low-quality answers or detecting spam in question-answer (QA) pairs. However, these approaches suffer from long delays: they all rely on information about answers or answerers, so by the time they can act, the questions have already been displayed on the website for some time and have attracted user traffic. In fact, spammers on CQA platforms also act as questioners and embed promotional information in their questions. If such questions can be detected as early as possible, they will not appear on the website or affect legitimate users. In this paper, we design a framework for early detection of promotion campaigns in CQA based only on question information and the questioner's profile. First, we propose a novel sampling method for identifying questions that contain promotional information; these make up the positive dataset. We also sample an unlabeled dataset of unsolved questions during a certain period of time. Then, we compare the characteristics of question information and user profiles between the two datasets; these characteristics are also used as features in the learning process. Finally, we apply and compare several PU (Positive and Unlabeled examples) learning algorithms to find positive examples in the unlabeled dataset. Our approach needs no answer-side information, which means that it can detect spamming activity as soon as a question is posted. Experimental results on about 0.7 million questions from a popular Chinese CQA portal indicate that our approach detects questions related to promotion campaigns as effectively as, but more efficiently than, state-of-the-art QA-pair-level detection methods.
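To make the PU learning setting concrete, here is a minimal Python sketch of a generic two-step PU scheme. It is not the authors' method: the abstract does not say which PU algorithms or features were used, so the two numeric features, the `negative_fraction` parameter, and the nearest-centroid classifier are all assumptions chosen for illustration. Step 1 picks "reliable negatives" from the unlabeled set; step 2 repeatedly separates positives from negatives until the labels stop changing.

```python
from math import sqrt


def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pu_two_step(positive, unlabeled, negative_fraction=0.5, max_iter=10):
    """Return one 0/1 label per unlabeled row (1 = predicted promotional).

    Illustrative assumption: a nearest-centroid classifier stands in for
    whatever classifier a real PU learning algorithm would use.
    """
    pos_center = centroid(positive)

    # Step 1: the unlabeled rows farthest from the positive centroid
    # are taken as reliable negatives.
    ranked = sorted(unlabeled, key=lambda r: distance(r, pos_center), reverse=True)
    k = max(1, int(len(ranked) * negative_fraction))
    negatives = ranked[:k]

    # Step 2: label every unlabeled row by its nearer centroid, then
    # rebuild the negative set from the rows labeled 0, until stable.
    labels = []
    for _ in range(max_iter):
        neg_center = centroid(negatives)
        new_labels = [
            1 if distance(r, pos_center) < distance(r, neg_center) else 0
            for r in unlabeled
        ]
        if new_labels == labels:
            break
        labels = new_labels
        new_negatives = [r for r, lab in zip(unlabeled, labels) if lab == 0]
        if not new_negatives:
            break
        negatives = new_negatives
    return labels


# Hypothetical features per question: [contact strings in the text,
# share of the questioner's past questions that were flagged].
positive = [[5.0, 0.9], [6.0, 0.8], [5.5, 1.0]]
unlabeled = [[5.2, 0.9], [0.1, 0.0], [0.3, 0.1], [5.8, 0.95], [0.2, 0.05], [0.0, 0.1]]
predicted = pu_two_step(positive, unlabeled)
```

On this toy data the first and fourth unlabeled questions sit close to the known positives and are the ones flagged. Only question-side and questioner-side features are used, which matches the early-detection setting described in the abstract; a practical system would replace the centroid rule with a stronger classifier.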

This work was supported by the Natural Science Foundation of China (61672311, 61622208, 61532011, 61472206) and the National Key Basic Research Program (2015CB358700).

Author information

Authors and Affiliations

State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Xin Li, Yiqun Liu, Min Zhang & Shaoping Ma

Corresponding author

Correspondence to Xin Li.

Editor information

Editors and Affiliations

Beijing Language and Culture University, Beijing, China
Yuming Li
Jiangxi Normal University, Nanchang, China
Guoxiong Xiang
Dalian University of Technology, Dalian, China
Hongfei Lin
Jiangxi Normal University, Nanchang, China
Mingwen Wang

About this paper

Cite this paper

Li, X., Liu, Y., Zhang, M., Ma, S. (2016). Early Detection of Promotion Campaigns in Community Question Answering. In: Li, Y., Xiang, G., Lin, H., Wang, M. (eds) Social Media Processing. SMP 2016. Communications in Computer and Information Science, vol 669. Springer, Singapore. https://doi.org/10.1007/978-981-10-2993-6_15

DOI: https://doi.org/10.1007/978-981-10-2993-6_15
Published: 19 October 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2992-9
Online ISBN: 978-981-10-2993-6
eBook Packages: Computer Science, Computer Science (R0)
