An Improved Multi-classification Algorithm for Imbalanced Online Public Opinion Data

Dang, Xige; Wu, Xu; Xie, Xiaqing; Zhang, Tianle

doi:10.1007/978-3-030-24268-8_6

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11635)

Included in the following conference series:

International Conference on Artificial Intelligence and Security

Abstract

When online public opinion datasets are imbalanced, a classifier tends to sacrifice accuracy on the minority classes in order to achieve the best overall performance. To address this problem, this paper proposes a multi-classification algorithm for online public opinion text based on random forest and cost-sensitive learning. The algorithm uses Naïve Bayes to construct the cost matrix and uses a Gini index that incorporates misclassification cost to select decision tree nodes. In comparative experiments, the classifier improves performance by 3% overall and by 8% on minority classes, which alleviates the data imbalance problem to some extent.
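The abstract does not state the exact form of the Gini index with misclassification cost, so the following is only a minimal sketch of one common formulation, not the authors' method. It assumes a cost-weighted impurity, the sum over class pairs i ≠ j of cost(i, j) · p_i · p_j, which reduces to the standard Gini index when every cost is 1. The names `cost_gini`, `split_score`, and `best_threshold`, and the dictionary-based cost matrix, are hypothetical; how the paper derives costs from Naïve Bayes is not shown here.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical cost matrix: (true_class, predicted_class) -> misclassification cost.
# Missing pairs default to a cost of 1.0 (an assumption of this sketch).
Cost = Dict[Tuple[str, str], float]


def cost_gini(labels: Sequence[str], cost: Cost) -> float:
    """Cost-weighted Gini impurity: sum over i != j of cost[i, j] * p_i * p_j."""
    n = len(labels)
    if n == 0:
        return 0.0
    probs = {c: k / n for c, k in Counter(labels).items()}
    return sum(
        cost.get((i, j), 1.0) * p_i * p_j
        for i, p_i in probs.items()
        for j, p_j in probs.items()
        if i != j
    )


def split_score(left: Sequence[str], right: Sequence[str], cost: Cost) -> float:
    """Size-weighted average impurity of the two child nodes."""
    n = len(left) + len(right)
    return (len(left) * cost_gini(left, cost) + len(right) * cost_gini(right, cost)) / n


def best_threshold(
    values: Sequence[float], labels: Sequence[str], cost: Cost
) -> Optional[Tuple[float, float]]:
    """Return (threshold, score) with the lowest impurity, or None if no split exists."""
    best = None
    # The largest value is skipped because it would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = split_score(left, right, cost)
        if best is None or score < best[1]:
            best = (t, score)
    return best
```

Under this formulation, raising the cost of misclassifying a minority class raises the impurity of any node that mixes it with other classes, so splits that isolate the minority class score better. In a random forest, each tree would apply such a criterion to a bootstrap sample and a random subset of features.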

Acknowledgements

This work is supported by the National Key Research and Development Plan (Grant No. 2017YFC0820603), BUPT’s Graduate education reform project (2018Y003) and the Project of Chinese Society of Academic degrees and graduate education (2017Y0502).

Author information

Authors and Affiliations

Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing, China
Xige Dang, Xu Wu & Xiaqing Xie
School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
Xige Dang, Xu Wu & Xiaqing Xie
Beijing University of Posts and Telecommunications Library, Beijing, China
Xu Wu
Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
Tianle Zhang

Corresponding author

Correspondence to Xu Wu.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Xingming Sun
Nanjing University of Information Science and Technology, Nanjing, China
Zhaoqing Pan
Purdue University, West Lafayette, IN, USA
Elisa Bertino

Cite this paper

Dang, X., Wu, X., Xie, X., Zhang, T. (2019). An Improved Multi-classification Algorithm for Imbalanced Online Public Opinion Data. In: Sun, X., Pan, Z., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2019. Lecture Notes in Computer Science, vol 11635. Springer, Cham. https://doi.org/10.1007/978-3-030-24268-8_6

DOI: https://doi.org/10.1007/978-3-030-24268-8_6
Published: 11 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24267-1
Online ISBN: 978-3-030-24268-8
eBook Packages: Computer Science, Computer Science (R0)
