Swarm intelligence based optimal feature selection for enhanced predictive sentiment accuracy on twitter

  • Akshi Kumar
  • Arunima Jaiswal
Article

Abstract

Micro-blog content is generally associated with considerable uncertainty, primarily because it contains noisy, heterogeneous, structured or unstructured data that may be high-dimensional, ambiguous, vague or imprecise. This makes feature engineering for sentiment prediction arduous and challenging. Population-based meta-heuristics, especially nature-inspired ones, have been proposed in various studies for feature selection because they can probabilistically accept a less optimal solution and thereby avoid becoming trapped in local optima. This research demonstrates the use of two such swarm intelligence algorithms, binary grey wolf and binary moth flame, for feature optimization to improve sentiment classification accuracy. The study is conducted on tweets from two benchmark Twitter corpora (SemEval 2016 and SemEval 2017), which are first analyzed using the conventional term frequency-inverse document frequency (TF-IDF) statistical weighting filter for feature extraction and subsequently using the swarm-based algorithms. The features are used to train five baseline classifiers: Naïve Bayes, support vector machine, k-nearest neighbor, multilayer perceptron and decision tree. The results validate that population-based meta-heuristic feature subset selection outperforms the baseline supervised learning algorithms. For the binary grey wolf algorithm, an average accuracy improvement of 9.4% is observed with an approximately 20.5% average reduction in features. For the binary moth flame algorithm, an average accuracy improvement of 10.6% is observed with an approximately 40% average reduction in features. The highest accuracy, 76.5%, is achieved by the support vector machine with the binary grey wolf optimizer on the SemEval 2016 benchmark dataset.
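The abstract describes a wrapper pipeline: TF-IDF weighting produces the feature space, and a binary swarm search then looks for a feature subset that a classifier scores well. The following is a minimal pure-Python sketch of that idea for the grey wolf case, under stated assumptions. The TF-IDF variant (count / length times log(N / df)), the leave-one-out 1-nearest-neighbour scorer, the fitness weighting (0.99 on error rate, 0.01 on fraction of features kept), the sigmoid transfer function and every function name here are illustrative choices, not the authors' implementation; the paper's five classifiers and parameter settings are not reproduced.

```python
import math
import random


def tfidf(docs):
    """Assumed TF-IDF variant: tf = count / doc length, idf = log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n = len(tokenized)
    df = {t: sum(1 for doc in tokenized if t in doc) for t in vocab}
    matrix = [[doc.count(t) / len(doc) * math.log(n / df[t]) for t in vocab]
              for doc in tokenized]
    return vocab, matrix


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of 1-nearest-neighbour on the features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        best_dist, best_label = float("inf"), None
        for k, other in enumerate(X):
            if k == i:
                continue
            dist = sum((row[j] - other[j]) ** 2 for j in cols)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, alpha=0.99):
    """Lower is better: weighted error rate plus weighted fraction of features kept."""
    error = 1.0 - loo_1nn_accuracy(X, y, mask)
    return alpha * error + (1 - alpha) * sum(mask) / len(mask)


def to_bit(x, rng):
    """S-shaped (sigmoid) transfer: turn a continuous step into a 0/1 feature flag."""
    z = max(-60.0, min(60.0, 10.0 * (x - 0.5)))  # clamp so exp() stays finite
    return 1 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0


def binary_gwo(X, y, n_wolves=6, n_iter=15, seed=0):
    """Return (best_fitness, best_mask) from a sigmoid-binarised grey wolf search."""
    rng = random.Random(seed)
    dim = len(X[0])
    wolves = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_wolves)]
    # alpha, beta, delta: the three best (fitness, mask) pairs seen so far
    leaders = sorted(((fitness(X, y, w), w) for w in wolves), key=lambda p: p[0])[:3]
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for idx, wolf in enumerate(wolves):
            new = []
            for d in range(dim):
                total = 0.0
                for _, lead in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    total += lead[d] - A * abs(C * lead[d] - wolf[d])
                new.append(to_bit(total / len(leaders), rng))  # mean of X1, X2, X3
            wolves[idx] = new
        scored = [(fitness(X, y, w), w) for w in wolves]
        leaders = sorted(leaders + scored, key=lambda p: p[0])[:3]
    return leaders[0]


if __name__ == "__main__":
    # Hypothetical toy tweets, not SemEval data
    tweets = ["great phone love it", "love this great camera",
              "awful battery hate it", "hate this awful screen",
              "love the screen", "hate the camera"]
    labels = [1, 1, 0, 0, 1, 0]
    vocab, X = tfidf(tweets)
    best_fit, mask = binary_gwo(X, labels)
    kept = [t for t, m in zip(vocab, mask) if m]
    print(f"kept {len(kept)}/{len(vocab)} terms, fitness={best_fit:.3f}: {kept}")
```

Each wolf is a 0/1 mask over the vocabulary; the three best masks found so far pull the others toward them, and the shrinking coefficient `a` shifts the search from exploration to exploitation. A binary moth flame variant could keep the same mask encoding and fitness function and replace only the position-update rule.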

Keywords

Binary grey wolf; Binary moth flame; Swarm intelligence; Meta-heuristic; Sentiment; Twitter

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science & Engineering, Delhi Technological University, Delhi, India