Abstract
Opinion mining has gained much attention in recent years due to the rapid growth of social media. It is the task of analyzing customer reviews to support decision-making by classifying each review as positive or negative. Text reviews are high-dimensional, which leads to the curse of dimensionality. To handle this high dimensionality, an improved gain ratio is proposed to select the highest-ranked features. A Naïve Bayes classifier with a kernel density function is used to evaluate the selected feature set. The Naïve Bayes classifier with kernel density estimation is a nonparametric classifier that computes the probability density function using a kernel estimator. This classifier produces higher accuracy on various benchmark datasets.
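The two ideas in the abstract can be illustrated with a minimal sketch. It assumes the standard gain ratio (information gain divided by split information), not the improved variant the paper proposes, and a Gaussian kernel with a fixed bandwidth; the abstract specifies neither. The names `gain_ratio`, `KernelNaiveBayes`, and `bandwidth`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Standard gain ratio (assumption: not the paper's improved variant)."""
    total = len(labels)
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(feature_values)
    if split_info == 0.0:  # a constant feature cannot split the data
        return 0.0
    return (entropy(labels) - conditional) / split_info


class KernelNaiveBayes:
    """Naive Bayes with per-feature Gaussian kernel density estimates."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth  # assumed fixed smoothing width h

    def fit(self, rows, labels):
        self.samples = defaultdict(list)
        for row, label in zip(rows, labels):
            self.samples[label].append(row)
        self.priors = {c: len(r) / len(rows) for c, r in self.samples.items()}
        return self

    def _log_density(self, x, points):
        # Average of Gaussian kernels centred on the training points.
        h = self.bandwidth
        kernel_sum = sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points)
        density = kernel_sum / (len(points) * h * math.sqrt(2 * math.pi))
        return math.log(max(density, 1e-300))  # floor avoids log(0)

    def predict(self, row):
        scores = {}
        for c, class_rows in self.samples.items():
            # Log prior plus the sum of per-feature log densities.
            score = math.log(self.priors[c])
            for j, x in enumerate(row):
                score += self._log_density(x, [r[j] for r in class_rows])
            scores[c] = score
        return max(scores, key=scores.get)


# Toy data: each row holds two hypothetical term weights for one review.
rows = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.0], [0.1, 0.9], [0.2, 0.8], [0.0, 0.7]]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
model = KernelNaiveBayes(bandwidth=0.3).fit(rows, labels)
prediction = model.predict([0.85, 0.15])
```

In a pipeline of the kind the abstract describes, features would first be ranked by a score such as `gain_ratio`, and only the top-ranked ones passed to the classifier. The bandwidth controls how smooth each estimated density is and would normally be tuned rather than fixed.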
Cite this article
Sethuraman, R.R., Athisayam, J.S.K. An Improved Feature Selection Based on Naive Bayes with Kernel Density Estimator for Opinion Mining. Arab J Sci Eng 46, 4059–4071 (2021). https://doi.org/10.1007/s13369-021-05381-5
DOI: https://doi.org/10.1007/s13369-021-05381-5