
An Improved Multi-classification Algorithm for Imbalanced Online Public Opinion Data

  • Conference paper
Artificial Intelligence and Security (ICAIS 2019)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11635)


Abstract

When online public opinion datasets are imbalanced, a classifier tends to sacrifice accuracy on the minority classes in order to maximize overall performance. To address this problem, this paper proposes a multi-class classification algorithm for online public opinion text based on random forests and cost-sensitive learning. The algorithm uses Naïve Bayes to construct the cost matrix and selects decision tree split nodes using a Gini index that incorporates misclassification cost. In comparative experiments, the classifier improves performance by 3% overall and by 8% on minority classes, which mitigates the data imbalance problem to some extent.
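To make the idea of a Gini index with misclassification cost concrete, here is a minimal Python sketch. It is not the authors' formulation, which the abstract does not spell out. It assumes one common cost-weighted variant, the sum over class pairs i ≠ j of cost[i][j] · p_i · p_j, which reduces to the standard Gini impurity when every off-diagonal cost is 1. The function names and the `COST` matrix are hypothetical; in the paper the cost matrix is derived with Naïve Bayes, whereas here it is simply written by hand.

```python
from collections import Counter


def cost_gini(labels, cost):
    """Cost-weighted Gini impurity of one node (assumed variant).

    cost[i][j] is the assumed cost of predicting class j when the
    true class is i. With all off-diagonal costs equal to 1 this is
    the ordinary Gini impurity 1 - sum(p_i ** 2).
    """
    n = len(labels)
    if n == 0:
        return 0.0
    probs = {c: k / n for c, k in Counter(labels).items()}
    return sum(
        cost[i][j] * probs[i] * probs[j]
        for i in probs
        for j in probs
        if i != j
    )


def split_impurity(left, right, cost):
    """Size-weighted impurity of a candidate split into two children."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * cost_gini(left, cost) + (
        len(right) / n
    ) * cost_gini(right, cost)


# Hypothetical hand-written cost matrix: misclassifying a minority
# example is assumed to be five times as costly as the reverse error.
COST = {
    "major": {"major": 0.0, "minor": 1.0},
    "minor": {"major": 5.0, "minor": 0.0},
}
```

A tree builder would evaluate `split_impurity` for each candidate split and keep the one with the lowest value, so splits that leave minority examples mixed in with majority examples are penalized more heavily than under the unweighted Gini index.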



Acknowledgements

This work is supported by the National Key Research and Development Plan (Grant No. 2017YFC0820603), BUPT’s Graduate Education Reform Project (2018Y003), and the Project of the Chinese Society of Academic Degrees and Graduate Education (2017Y0502).

Author information


Corresponding author

Correspondence to Xu Wu.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Dang, X., Wu, X., Xie, X., Zhang, T. (2019). An Improved Multi-classification Algorithm for Imbalanced Online Public Opinion Data. In: Sun, X., Pan, Z., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2019. Lecture Notes in Computer Science, vol 11635. Springer, Cham. https://doi.org/10.1007/978-3-030-24268-8_6


  • DOI: https://doi.org/10.1007/978-3-030-24268-8_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24267-1

  • Online ISBN: 978-3-030-24268-8

  • eBook Packages: Computer Science, Computer Science (R0)
