An Improved Feature Selection Based on Naive Bayes with Kernel Density Estimator for Opinion Mining


Opinion mining has gained much attention in recent years due to the rapid growth of social media. It is the task of analyzing customer reviews to support decision-making by classifying the reviews as positive or negative. These text reviews are high-dimensional, which leads to the curse of dimensionality. To handle the high dimensionality of text data, an improved gain ratio is proposed to select the highest-ranked features. A Naïve Bayes classifier with a kernel density function is used to evaluate the feature set. The Naïve Bayes classifier with kernel density estimation is a nonparametric classifier that computes the probability density function using a kernel estimator. This classifier produces higher accuracy on various benchmark datasets.
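The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the "improved" gain ratio, so the code below uses the standard gain ratio (information gain divided by split information) on discrete features, and it assumes a Gaussian kernel with a fixed, hand-picked bandwidth for the density estimate. The names `gain_ratio`, `gaussian_kde`, `KDENaiveBayes` and the toy data are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Standard gain ratio (assumption: not the paper's improved variant)."""
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    # Expected class entropy after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(feature)
    if split_info == 0.0:  # constant feature carries no information
        return 0.0
    return (entropy(labels) - conditional) / split_info


def gaussian_kde(x, samples, bandwidth):
    """Kernel density estimate at x from 1-D samples (Gaussian kernel assumed)."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


class KDENaiveBayes:
    """Naive Bayes whose per-feature class-conditional densities come from KDE."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth  # hypothetical default, not taken from the paper

    def fit(self, rows, labels):
        counts = Counter(labels)
        self.priors = {c: k / len(labels) for c, k in counts.items()}
        self.columns = {}
        for c in counts:
            class_rows = [r for r, y in zip(rows, labels) if y == c]
            # Store the training values of each feature, per class.
            self.columns[c] = [list(col) for col in zip(*class_rows)]
        return self

    def predict(self, row):
        def log_posterior(c):
            score = math.log(self.priors[c])
            for x, samples in zip(row, self.columns[c]):
                # Tiny constant guards against log(0) far from all samples.
                score += math.log(gaussian_kde(x, samples, self.bandwidth) + 1e-300)
            return score

        return max(self.priors, key=log_posterior)


# Toy data: two binary term-presence features for ranking,
# and two continuous term weights for classification.
labels = ["pos", "pos", "neg", "neg"]
informative = [1, 1, 0, 0]
uninformative = [1, 0, 1, 0]
weights = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]

model = KDENaiveBayes().fit(weights, labels)
print(gain_ratio(informative, labels), gain_ratio(uninformative, labels))
print(model.predict([0.85, 0.15]), model.predict([0.15, 0.8]))
```

In this sketch, features would be ranked by `gain_ratio` and only the top-ranked ones passed to the classifier. The product of one-dimensional kernel estimates reflects the Naïve Bayes independence assumption; bandwidth selection, which strongly affects a kernel estimator, is left out.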




Author information



Corresponding author

Correspondence to Raja Rajeswari Sethuraman.


Cite this article

Sethuraman, R.R., Athisayam, J.S.K. An Improved Feature Selection Based on Naive Bayes with Kernel Density Estimator for Opinion Mining. Arab J Sci Eng (2021).

Keywords


  • Opinion mining
  • Feature selection
  • Filter method
  • Naïve Bayes
  • Kernel density estimation