Identifying ground truth in opinion spam: an empirical survey based on review psychology

Abstract

Because of the harm it causes, opinion spam, especially spam involving untruthful reviews, has attracted much attention over the last decade. However, the lack of annotations, i.e., the ground truth problem, remains the key challenge. Labeling is difficult because spammers deliberately forge their reviews, which even field experts cannot distinguish from authentic ones. Given the clear intention of spammers, i.e., to promote or demote an item's reputation, there is an opportunity to label such reviews by considering crowd psychology. To date, several studies have applied, verified, and presented helpful evidence, including prior, empirical, heuristic, and simulative pseudo truths. In this paper, after investigating the diverse motives of both authentic and deceptive reviewers, we survey state-of-the-art ground truths by considering two classical roles, i.e., crowdsourcing spammers and expert spammers. For each role, we highlight several topics related to spam attacks, either with or without disguise, and possible outliers. Comparative analyses lead to some interesting conclusions: 1) data on professional spammers are more challenging to collect and less reliable than data on crowdsourcing spammers; 2) most linguistic evidence is less reliable than behavioral footprints; 3) abnormal activities are as trustworthy as spamming objectives, while hardly needing any extra support, such as the user profile; and 4) the most reliable facts that require acceptable effort are deviation, burstiness, grouped spamming, deviation over the threshold, review distribution, opinion proportion, and spam cost. Moreover, we introduce several promising directions for future research. Overall, this survey may shed light on new angles for understanding review spam and for improving the performance of anti-spam platforms.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under grants 61702320, 61801285, and 61802247; the Shanghai Municipal Commission of Economy and Informatization under grant 201701014; and the Shanghai Pudong Science, Technology and Economy Commission under grant PKJ2019-Y03.

Author information

Corresponding author

Correspondence to Jiandun Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, J., Wang, X., Yang, L. et al. Identifying ground truth in opinion spam: an empirical survey based on review psychology. Appl Intell (2020). https://doi.org/10.1007/s10489-020-01764-7

Keywords

  • Opinion spam
  • Review spam
  • Ground truths
  • Review psychology
  • Crowdsourcing spammer
  • Expert spammer