Evolutionary Intelligence, Volume 12, Issue 2, pp 147–164

Spam review detection using spiral cuckoo search clustering method

  • Avinash Chandra Pandey
  • Dharmveer Singh Rajpoot
Research Paper

Abstract

Online reviews now play an important role in customers’ decisions. From buying a shirt on an e-commerce site to dining in a restaurant, online reviews have become a basis for selection. Because people are busy and lack the time to examine the details of products and services, dependence on online reviews has grown. Exploiting this reliance, some people and organizations deliberately generate spam reviews to promote or demote the reputation of a person, product, or organization. It is impossible to tell by the naked eye whether a review is spam or ham, and it is impractical to classify all reviews manually. Therefore, a spiral cuckoo search based clustering method is introduced to discover spam reviews. The proposed method combines the strengths of cuckoo search with the Fermat spiral to resolve the convergence issue of the cuckoo search method. Its efficiency has been tested on four spam datasets and one Twitter spammer dataset. To validate its efficacy, the proposed clustering method is compared with six existing clustering methods: particle swarm optimization, differential evolution, genetic algorithm, cuckoo search, K-means, and improved cuckoo search. The experimental results and statistical analysis indicate that the proposed method outperforms the existing methods.
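
To make the idea concrete, the following is a minimal illustrative sketch and not the authors' algorithm, whose update rules are not given in this abstract. It assumes that candidate cluster centroids are perturbed along Fermat's spiral (r = a·√θ), that cluster quality is measured by the sum of squared distances to the nearest centroid, and that poor candidates are occasionally abandoned in the spirit of cuckoo search. The function names, the 2-D feature space, the linear step decay, and the parameters `n_nests`, `iters`, and `pa` are all hypothetical choices made for illustration.

```python
import math
import random


def fermat_spiral_point(theta, a=1.0):
    """Return the (x, y) point on Fermat's spiral r = a * sqrt(theta), theta >= 0."""
    r = a * math.sqrt(theta)
    return (r * math.cos(theta), r * math.sin(theta))


def sse(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids)
    return total


def spiral_search_clustering(points, k, n_nests=8, iters=100, pa=0.25, seed=0):
    """Toy population search over k 2-D centroids (assumed scheme, illustration only)."""
    rng = random.Random(seed)
    # Each nest is one candidate solution: k centroids drawn from the data.
    nests = [[rng.choice(points) for _ in range(k)] for _ in range(n_nests)]
    fitness = [sse(points, nest) for nest in nests]
    for t in range(iters):
        best_idx = min(range(n_nests), key=lambda i: fitness[i])
        scale = 1.0 - t / iters  # assumed: spiral size decays linearly over iterations
        for i in range(n_nests):
            # Propose new centroids by offsetting the best nest along the spiral.
            candidate = []
            for bx, by in nests[best_idx]:
                dx, dy = fermat_spiral_point(rng.uniform(0.0, 4.0 * math.pi), a=scale)
                candidate.append((bx + dx, by + dy))
            cand_fit = sse(points, candidate)
            if cand_fit < fitness[i]:
                # Greedy replacement: keep the proposal only if it improves nest i.
                nests[i], fitness[i] = candidate, cand_fit
            elif i != best_idx and rng.random() < pa:
                # Abandon a non-best nest with probability pa and re-seed it from the data.
                nests[i] = [rng.choice(points) for _ in range(k)]
                fitness[i] = sse(points, nests[i])
    best_idx = min(range(n_nests), key=lambda i: fitness[i])
    return nests[best_idx], fitness[best_idx]


if __name__ == "__main__":
    # Two hypothetical 2-D feature clusters standing in for "ham" and "spam" reviews.
    data = [(0.1, 0.2), (0.0, -0.1), (0.3, 0.1), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
    centroids, score = spiral_search_clustering(data, k=2, seed=1)
    print(centroids, score)
```

In this sketch the best nest is never abandoned, so the best objective value cannot get worse from one iteration to the next. Real review data would first need to be converted into numeric feature vectors, a step this example does not cover.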

Keywords

Data clustering, Cuckoo search, Metaheuristic method, Spam detection, Fermat spiral

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Jaypee Institute of Information Technology, Noida, India
