Abstract
Imbalanced data classification is one of the challenging problems in data mining and machine learning research. Traditional classification algorithms are often biased towards the majority class when learning from imbalanced data. Many methods have been proposed to address this problem, including data re-sampling, algorithm modification, and cost-sensitive learning. However, most of them focus on only one of these techniques. This paper proposes combining algorithm modification and cost-sensitive learning based on the decision-theoretic rough set (DTRS) model. In particular, we use the naive Bayes classifier as the base classifier and modify it for imbalanced learning. For cost-sensitive learning, we adopt the systematic method from DTRS to derive the required thresholds that minimize decision cost. Our experimental results on three well-known text classification databases show that the unified DTRS provides similar performance on balanced class distributions, outperforms the naive Bayes classifier on imbalanced datasets, and is competitive with other imbalanced learning classifiers.
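To illustrate how thresholds can be derived from decision costs, the following is a minimal sketch based on the threshold formulas commonly stated for the DTRS model, not the chapter's own implementation. The function names and the loss values in the example are hypothetical; the abstract does not give the losses used in the experiments or the details of the modified naive Bayes classifier. Each loss is named by action (P accept, B defer, N reject) and then by true state (P in the class, N not in the class).

```python
def dtrs_thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Return (alpha, beta, gamma) from six loss values.

    Assumes the usual ordering l_pp <= l_bp < l_np and l_nn <= l_bn < l_pn,
    so every denominator below is positive.
    """
    # Accept when P(C|x) >= alpha (accepting beats deferring).
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    # Reject when P(C|x) <= beta (rejecting beats deferring).
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    # Two-way cut-off: accept when P(C|x) >= gamma, otherwise reject.
    gamma = (l_pn - l_nn) / ((l_pn - l_nn) + (l_np - l_pp))
    return alpha, beta, gamma


def three_way_decide(prob, alpha, beta):
    """Map a posterior probability P(C|x) to accept, reject, or defer."""
    if prob >= alpha:
        return "accept"
    if prob <= beta:
        return "reject"
    return "defer"


# Hypothetical losses: missing a minority-class document (l_np = 6)
# costs more than a false alarm (l_pn = 4).
alpha, beta, gamma = dtrs_thresholds(
    l_pp=0.0, l_bp=1.0, l_np=6.0, l_pn=4.0, l_bn=1.0, l_nn=0.0
)
decisions = [three_way_decide(p, alpha, beta) for p in (0.8, 0.5, 0.1)]
```

With these illustrative losses the two-way cut-off gamma falls below 0.5, so a document is assigned to the minority class at a lower posterior probability than under a cost-blind rule. In such a setup the posterior would come from a probabilistic classifier such as naive Bayes.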
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Zhou, B., Yao, Y., Liu, Q. (2016). Utilizing DTRS for Imbalanced Text Classification. In: Flores, V., et al. Rough Sets. IJCRS 2016. Lecture Notes in Computer Science, vol 9920. Springer, Cham. https://doi.org/10.1007/978-3-319-47160-0_20
DOI: https://doi.org/10.1007/978-3-319-47160-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47159-4
Online ISBN: 978-3-319-47160-0
eBook Packages: Computer Science, Computer Science (R0)