Early Detection of Promotion Campaigns in Community Question Answering

  • Conference paper
Social Media Processing (SMP 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 669)

Abstract

As with many social media websites, Community Question Answering (CQA) portals have become a target for spammers who disseminate promotional information. Previous work mainly focuses on identifying low-quality answers or detecting spam in question-answer (QA) pairs. However, these methods suffer from a long delay: they all rely on information about answers or answerers, by which time the questions have already been displayed on the website for some time and have attracted user traffic. In fact, spammers on CQA platforms also act as questioners and embed promotional information in their questions. If such questions can be detected as early as possible, they need not appear on the website or affect legitimate users. In this paper, we design a framework for early detection of promotion campaigns in CQA based only on question information and questioner profiles. First, we propose a novel sampling method for identifying questions that contain promotional information; these form the positive dataset. We also sample an unlabeled dataset of unsolved questions from a certain period of time. Then, we compare question and user-profile characteristics between the two datasets; these characteristics also serve as features in the learning process. Finally, we apply and compare several PU (Positive and Unlabeled examples) learning algorithms to find positive examples in the unlabeled dataset. Our approach needs no answer-side information, so it can detect spamming activity as soon as a question is posted. Experimental results on about 0.7 million questions from a popular Chinese CQA portal indicate that our approach detects questions related to promotion campaigns as effectively as, but more efficiently than, state-of-the-art QA-pair-level detection methods.

This work was supported by Natural Science Foundation (61672311, 61622208, 61532011, 61472206) of China and National Key Basic Research Program (2015CB358700).
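
The abstract describes learning from a positive set of promotional questions and an unlabeled set of unsolved questions. As a rough illustration of what a PU learner does with such data, here is a minimal sketch of a generic two-step, Rocchio-style heuristic. It is not the paper's method: the abstract does not say which PU algorithms were compared, and the function name `two_step_pu` and the two-dimensional feature vectors are hypothetical stand-ins for the question and questioner-profile features.

```python
import math


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def two_step_pu(positive, unlabeled):
    """Flag unlabeled rows that look positive (hypothetical helper).

    Step 1: treat every unlabeled row as a provisional negative and keep
            the rows that lie closer to the unlabeled centroid than to the
            positive centroid as "reliable negatives".
    Step 2: label each unlabeled row by whichever prototype is nearer:
            the positive centroid or the reliable-negative centroid.
    Returns one boolean per unlabeled row (True = flagged as positive).
    """
    pos_c = centroid(positive)
    unl_c = centroid(unlabeled)

    # Step 1: extract reliable negatives from the unlabeled set.
    reliable_neg = [
        u for u in unlabeled if math.dist(u, unl_c) < math.dist(u, pos_c)
    ]
    if not reliable_neg:
        raise ValueError("no reliable negatives found; cannot build a classifier")

    # Step 2: nearest-prototype classification of the unlabeled rows.
    neg_c = centroid(reliable_neg)
    return [math.dist(u, pos_c) < math.dist(u, neg_c) for u in unlabeled]


if __name__ == "__main__":
    # Made-up feature vectors, for illustration only.
    positive = [[0.9, 0.8], [0.8, 0.9], [1.0, 0.7]]
    unlabeled = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.85, 0.85], [0.15, 0.15]]
    print(two_step_pu(positive, unlabeled))
```

In this toy input only the fourth unlabeled row sits near the positive examples, so it is the only one flagged. A real system would replace the centroid comparison in step 2 with a trained classifier, but the flow is the same: no answer-side features are required at any point.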

Notes

  1. http://wenwen.sogou.com/z/q627682313.htm.

  2. http://wenwen.sogou.com/z/q625841351.htm.

  3. http://www.sandaha.com/.

Author information

Corresponding author

Correspondence to Xin Li.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Li, X., Liu, Y., Zhang, M., Ma, S. (2016). Early Detection of Promotion Campaigns in Community Question Answering. In: Li, Y., Xiang, G., Lin, H., Wang, M. (eds) Social Media Processing. SMP 2016. Communications in Computer and Information Science, vol 669. Springer, Singapore. https://doi.org/10.1007/978-981-10-2993-6_15

  • DOI: https://doi.org/10.1007/978-981-10-2993-6_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2992-9

  • Online ISBN: 978-981-10-2993-6

  • eBook Packages: Computer Science, Computer Science (R0)
