Opinion spam detection framework using hybrid classification scheme

  • Muhammad Zubair Asghar
  • Asmat Ullah
  • Shakeel Ahmad
  • Aurangzeb Khan
Methodologies and Application


With the advent of social networking sites, opinion-mining applications have attracted the interest of online users who consult review sites to inform their purchase decisions. However, because of the growing trend of posting spam (fake) reviews to promote target products or defame competitors' brands, opinion spam detection and classification has emerged as a pressing issue in the opinion mining and sentiment analysis community. We investigate opinion spam detection using different combinations of entities, features, and their sentiment scores. We enrich the feature set of a baseline spam detection method with spam-related features (Opinion Spam, Opinion Spammer, Item Spam). Using a dataset of Amazon reviews with sentences labeled for spam detection, we evaluate the role of spamicity-related features in detecting and classifying spam (fake) clues and distinguishing them from genuine reviews. For this purpose, we introduce a rule-based feature weighting scheme and propose a method for tagging each review sentence as spam or non-spam. Experimental results show that spam-related features improve spam detection in review sentences posted on product review sites. Adding the revised feature weighting scheme increased accuracy from 93 to 96%. Furthermore, a hybrid set of features is shown to improve opinion spam detection in terms of precision, recall, and F-measure. This work shows that combining spam-related features with a rule-based weighting scheme can improve the performance of even a baseline spam detection method. This improvement can benefit opinion spam detection systems, given the growing interest of individuals and companies in separating fake (spam) from genuine (non-spam) product reviews.
The outcome of this work provides insight into spam-related features and feature weighting and will assist in developing more advanced applications for opinion spam detection. Previous state-of-the-art studies in this field used fewer spamicity-related features and less efficient feature weighting schemes. In contrast, we provide a revised feature selection and a revised feature weighting scheme with a normalized spamicity score computation technique. Our contribution is therefore novel to the field, as it provides a significant improvement over the compared methods.
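The abstract names a rule-based feature weighting scheme and a normalized spamicity score but does not give the rules or weights. The following is only a minimal sketch of how such a pipeline might look, assuming a weighted sum over the three feature groups, min-max normalization across sentences, and a fixed cut-off. The weights in `FEATURE_WEIGHTS`, the threshold of 0.5, and all function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical weights for the three feature groups named in the abstract.
# The paper's actual rules and weights are not given in this text.
FEATURE_WEIGHTS: Dict[str, float] = {
    "opinion_spam": 0.5,     # content-level cues in the review sentence
    "opinion_spammer": 0.3,  # reviewer-level cues
    "item_spam": 0.2,        # product-level cues
}


def raw_spamicity(features: Dict[str, float]) -> float:
    """Weighted sum of feature values; a missing feature counts as 0."""
    return sum(w * features.get(name, 0.0) for name, w in FEATURE_WEIGHTS.items())


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; if all scores are equal, return zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def tag_sentences(rows: List[Dict[str, float]], threshold: float = 0.5) -> List[str]:
    """Tag each sentence 'spam' when its normalized score reaches the threshold."""
    normalized = min_max_normalize([raw_spamicity(r) for r in rows])
    return ["spam" if s >= threshold else "non-spam" for s in normalized]


if __name__ == "__main__":
    # Made-up feature values for three review sentences, for illustration only.
    rows = [
        {"opinion_spam": 1.0, "opinion_spammer": 1.0, "item_spam": 1.0},
        {"opinion_spam": 0.0},
        {"opinion_spam": 0.5},
    ]
    print(tag_sentences(rows))
```

Min-max normalization makes the score relative to the batch being scored, so the same sentence can be tagged differently in a different batch. A scheme that must score sentences one at a time would need fixed bounds instead.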


Opinion spam, Spammer, Spam detection, Fake reviews


Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary material

500_2019_4107_MOESM1_ESM.rar (136 kb)
Supplementary material 1 (RAR 135 kb)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Institute of Computing and Information Technology, Gomal University, D.I.Khan, Pakistan
  2. Faculty of Computing and Information Technology at Rabigh (FCITR), King Abdul Aziz University (KAU), Jeddah, Kingdom of Saudi Arabia
  3. Department of Computer Science, University of Science and Technology, Bannu, Pakistan
