Improving Reliability of Unbalanced Text Mining by Reducing Performance Bias

  • Ling Zhuang
  • Min Gan
  • Honghua Dai
Conference paper


Class imbalance in textual data is an important factor affecting the reliability of text mining. On imbalanced textual data, conventional classifiers tend to exhibit a strong performance bias: high accuracy on the majority class but very low accuracy on the minority classes. An extreme strategy for unbalanced learning is to discard the majority instances and apply one-class classification to the minority class. However, this can easily introduce the opposite bias, raising accuracy on the minority classes at the expense of the majority.

This chapter investigates approaches that reduce these two types of performance bias and improve the reliability of the discovered classification rules. Experimental results show that the inexact field learning method and parameter-optimized one-class classifiers achieve more balanced performance than the standard approaches.
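The performance bias described in the abstract can be made concrete with a small sketch (illustrative numbers, not from the chapter): on a 95/5 class split, a classifier that always predicts the majority class scores high overall accuracy, while per-class and balanced accuracy expose the bias.

```python
# Sketch of how overall accuracy hides performance bias on
# imbalanced data (hypothetical 95/5 split, plain Python).

def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each class label."""
    classes = sorted(set(y_true))
    acc = {}
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        acc[c] = sum(y_pred[i] == y_true[i] for i in idx) / len(idx)
    return acc

# 95 majority ("neg") and 5 minority ("pos") test documents.
y_true = ["neg"] * 95 + ["pos"] * 5
# A biased classifier that always predicts the majority class.
y_pred = ["neg"] * 100

overall = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)
per_class = per_class_accuracy(y_true, y_pred)
balanced = sum(per_class.values()) / len(per_class)

print(overall)    # 0.95 -- looks strong
print(per_class)  # {'neg': 1.0, 'pos': 0.0} -- minority ignored
print(balanced)   # 0.5 -- balanced accuracy exposes the bias
```

The symmetric failure mode holds for the one-class extreme: a model that accepts everything as the minority class scores 1.0 on the minority and 0.0 on the majority, again averaging to 0.5 balanced accuracy.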


Keywords: Minority Class · Class Imbalance · Performance Bias · Improve Reliability · High Accuracy Rate
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

School of Information Technology, Deakin University, Melbourne, Australia
