Abstract
When online public opinion datasets are imbalanced, a classifier tends to sacrifice accuracy on the minority classes to achieve the best overall performance. To address this problem, this paper proposes a multi-class text classification algorithm for online public opinion based on random forest and cost-sensitive learning. The algorithm uses Naïve Bayes to construct the cost matrix and selects splits at the decision tree nodes using a Gini index weighted by misclassification cost. In comparative experiments, the classifier improved performance by 3% overall and by 8% on the minority classes, which can alleviate the data imbalance problem to some extent.
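The abstract gives no formulas, so the following Python sketch shows one plausible reading under stated assumptions; it is not the authors' implementation. It assumes (1) the cost matrix is derived from a Naïve Bayes model's confusion rates, using the hypothetical rule C[i][j] = 1 + (share of class-i samples that Naïve Bayes labels j) for i ≠ j and C[i][i] = 0, and (2) node impurity is the cost-weighted Gini index Σ_{i≠j} C[i][j]·p_i·p_j, a generalization of the Gini index described for CART that reduces to the ordinary 1 − Σ p_i² when every off-diagonal cost is 1. The Naïve Bayes predictions are taken as input rather than computed, and the function names and toy data are hypothetical.

```python
from collections import Counter


def cost_matrix_from_nb(y_true, y_pred_nb, n_classes):
    """Build a misclassification cost matrix from Naive Bayes predictions.

    Hypothetical rule (an assumption, not the paper's formula):
    C[i][j] = 1 + fraction of class-i samples that Naive Bayes labelled j,
    for i != j; correct predictions cost nothing (C[i][i] = 0).
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred_nb):
        counts[t][p] += 1
    cost = [[0.0] * n_classes for _ in range(n_classes)]
    for i in range(n_classes):
        row_total = sum(counts[i])
        for j in range(n_classes):
            if i != j:
                rate = counts[i][j] / row_total if row_total else 0.0
                cost[i][j] = 1.0 + rate
    return cost


def cost_gini(labels, cost):
    """Cost-weighted Gini impurity: sum over i != j of C[i][j] * p_i * p_j."""
    n = len(labels)
    if n == 0:
        return 0.0
    freq = Counter(labels)
    k = len(cost)
    p = [freq.get(c, 0) / n for c in range(k)]
    return sum(cost[i][j] * p[i] * p[j]
               for i in range(k) for j in range(k) if i != j)


def best_threshold(values, labels, cost):
    """Pick the threshold on one numeric feature (e.g. a term weight) that
    minimises the size-weighted cost-Gini of the two child nodes.
    Returns (score, threshold); threshold is None if no valid split exists."""
    n = len(labels)
    best = (float("inf"), None)
    for thr in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        if not left or not right:
            continue  # skip splits that leave one child empty
        score = (len(left) / n * cost_gini(left, cost)
                 + len(right) / n * cost_gini(right, cost))
        if score < best[0]:
            best = (score, thr)
    return best


if __name__ == "__main__":
    # Toy 3-class data (made up); class 2 is the minority class.
    y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2]
    y_nb = [0, 0, 0, 1, 1, 1, 0, 0, 2]  # pretend Naive Bayes predictions
    C = cost_matrix_from_nb(y_true, y_nb, n_classes=3)
    print("cost matrix:", C)
    term_weight = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    print("best split:", best_threshold(term_weight, y_true, C))
```

In a random forest, each tree would be grown on a bootstrap sample, and at every node a split search such as `best_threshold` would run only over a random subset of features. Under this assumed cost rule, nodes that mix classes the Naïve Bayes model often confuses get a higher impurity, which is one way to push splits toward separating the harder (often minority) classes.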
Acknowledgements
This work is supported by the National Key Research and Development Plan (Grant No. 2017YFC0820603), BUPT’s Graduate Education Reform Project (2018Y003), and the Project of the Chinese Society of Academic Degrees and Graduate Education (2017Y0502).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Dang, X., Wu, X., Xie, X., Zhang, T. (2019). An Improved Multi-classification Algorithm for Imbalanced Online Public Opinion Data. In: Sun, X., Pan, Z., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2019. Lecture Notes in Computer Science, vol 11635. Springer, Cham. https://doi.org/10.1007/978-3-030-24268-8_6
DOI: https://doi.org/10.1007/978-3-030-24268-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24267-1
Online ISBN: 978-3-030-24268-8
eBook Packages: Computer Science, Computer Science (R0)