Utilizing DTRS for Imbalanced Text Classification

Zhou, Bing; Yao, Yiyu; Liu, Qingzhong

doi:10.1007/978-3-319-47160-0_20


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9920)

Included in the following conference series:

International Joint Conference on Rough Sets


Abstract

Imbalanced data classification is one of the challenging problems in data mining and machine learning research. Traditional classification algorithms are often biased towards the majority class when learning from imbalanced data. Much work has been done to address this problem, including data re-sampling, algorithm modification, and cost-sensitive learning. However, most approaches focus on only one of these techniques. This paper proposes to combine algorithm modification and cost-sensitive learning based on the decision-theoretic rough set (DTRS) model. In particular, we use the naive Bayes classifier as the base classifier and modify it for imbalanced learning. For cost-sensitive learning, we adopt the systematic method from DTRS to derive the required thresholds that have the minimum decision cost. Our experimental results on three well-known text classification databases show that the unified DTRS approach provides similar performance on balanced class distributions, outperforms the naive Bayes classifier on imbalanced datasets, and is competitive with other imbalanced learning classifiers.
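The abstract does not state the threshold formulas, so the following is only an illustrative sketch of how DTRS thresholds are commonly derived in the general DTRS literature, not the authors' implementation. It assumes six loss values (accept, defer, or reject a document that does or does not belong to the target class) and a posterior probability supplied by some base classifier such as naive Bayes. The names `Losses`, `dtrs_thresholds`, and `three_way_decision`, and all numeric loss values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Losses:
    """Hypothetical loss values for a three-way decision."""
    accept_pos: float  # accept a document that is in the class
    defer_pos: float   # defer a document that is in the class
    reject_pos: float  # reject a document that is in the class
    accept_neg: float  # accept a document that is not in the class
    defer_neg: float   # defer a document that is not in the class
    reject_neg: float  # reject a document that is not in the class


def dtrs_thresholds(l: Losses) -> tuple[float, float]:
    """Return (alpha, beta) that minimise expected decision cost."""
    # Usual DTRS assumptions: correct decisions cost least, wrong ones most.
    if not (l.accept_pos <= l.defer_pos < l.reject_pos):
        raise ValueError("need accept_pos <= defer_pos < reject_pos")
    if not (l.reject_neg <= l.defer_neg < l.accept_neg):
        raise ValueError("need reject_neg <= defer_neg < accept_neg")
    a_num = l.accept_neg - l.defer_neg
    b_num = l.defer_neg - l.reject_neg
    alpha = a_num / (a_num + (l.defer_pos - l.accept_pos))
    beta = b_num / (b_num + (l.reject_pos - l.defer_pos))
    if not alpha > beta:
        raise ValueError("losses do not give alpha > beta")
    return alpha, beta


def three_way_decision(posterior: float, alpha: float, beta: float) -> str:
    """Map a posterior probability to accept / reject / defer."""
    if posterior >= alpha:
        return "accept"
    if posterior <= beta:
        return "reject"
    return "defer"


if __name__ == "__main__":
    # Made-up costs: missing a minority-class document is the worst error.
    losses = Losses(accept_pos=0, defer_pos=2, reject_pos=10,
                    accept_neg=4, defer_neg=1, reject_neg=0)
    alpha, beta = dtrs_thresholds(losses)
    for p in (0.7, 0.3, 0.05):
        print(p, three_way_decision(p, alpha, beta))
```

Raising the cost of rejecting a true member of the class lowers beta, so fewer minority-class documents are rejected outright. This is one way cost information can offset a bias towards the majority class.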



Author information

Authors and Affiliations

Department of Computer Science, Sam Houston State University, Huntsville, TX, 77341, USA
Bing Zhou & Qingzhong Liu
Department of Computer Science, University of Regina, Regina, SK, S4S 0A2, Canada
Yiyu Yao


Corresponding author

Correspondence to Bing Zhou.

Editor information

Editors and Affiliations

Universidad Católica del Norte, Antofagasta, Chile
Víctor Flores
University of Campinas, Campinas, SP, Brazil
Fernando Gomide
University of Warsaw, Warsaw, Poland
Andrzej Janusz
Universidad Católica del Norte, Antofagasta, Chile
Claudio Meneses
Tongji University, Shanghai, China
Duoqian Miao
University of Applied Sciences, Munich, Germany
Georg Peters
University of Warsaw, Warsaw, Poland
Dominik Ślęzak
Chongqing University of Posts and Telecommunications, Chongqing, China
Guoyin Wang
Universidad de Chile, Santiago, Chile
Richard Weber
University of Regina, Regina, Saskatchewan, Canada
Yiyu Yao


About this paper

Cite this paper

Zhou, B., Yao, Y., Liu, Q. (2016). Utilizing DTRS for Imbalanced Text Classification. In: Flores, V., et al. (eds.) Rough Sets. IJCRS 2016. Lecture Notes in Computer Science, vol 9920. Springer, Cham. https://doi.org/10.1007/978-3-319-47160-0_20


DOI: https://doi.org/10.1007/978-3-319-47160-0_20
Published: 29 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47159-4
Online ISBN: 978-3-319-47160-0
eBook Packages: Computer Science, Computer Science (R0)
