Abstract
The recently developed Conformal Predictor (CP) provides calibrated confidence for its predictions, which is beyond the capacity of traditional predictors. However, CP works well on balanced data and fails on imbalanced data. To handle this problem, the Local Clustering Conformal Predictor (LCCP), which plugs a two-level partition into the CP framework, is proposed. In the first-level partition, the whole imbalanced training dataset is partitioned into class-taxonomy data subsets. In the second-level partition, the majority class examples are further partitioned into cluster-taxonomy data subsets by a clustering method. To predict a new instance, LCCP selects the nearest cluster and combines it with the minority class examples to build a re-balanced training set. LCCP is designed not only to provide valid confidence for its predictions but also to significantly improve prediction efficiency. The experimental results show that LCCP outperforms CP on imbalanced data classification.
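The two-level partition described in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering method or the nonconformity measure, so this example assumes a plain k-means for the majority class and a 1-nearest-neighbour distance ratio as the nonconformity score, on a binary problem. All function names (`kmeans`, `nn_nonconformity`, `cp_p_values`, `lccp_p_values`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def kmeans(X, k, n_iter=25, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns (centers, assignment)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        assign = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    assign = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    return centers, assign

def nn_nonconformity(X, y, i):
    """1-NN score (assumed): nearest same-class distance over nearest other-class distance."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf  # exclude the example itself
    return d[y == y[i]].min() / (d[y != y[i]].min() + 1e-12)

def cp_p_values(X_train, y_train, x_new, labels):
    """Transductive CP: try each candidate label and compute its p-value."""
    p = {}
    for lab in labels:
        X = np.vstack([X_train, x_new])
        y = np.append(y_train, lab)
        alphas = np.array([nn_nonconformity(X, y, i) for i in range(len(y))])
        # Fraction of examples at least as nonconforming as the new one.
        p[lab] = np.mean(alphas >= alphas[-1])
    return p

def lccp_p_values(X, y, x_new, majority=0, minority=1, k=3):
    # First level: split the training data by class.
    X_maj, X_min = X[y == majority], X[y == minority]
    # Second level: cluster the majority class only.
    centers, assign = kmeans(X_maj, k)
    dist = np.linalg.norm(centers - x_new, axis=1)
    dist[np.bincount(assign, minlength=k) == 0] = np.inf  # skip empty clusters
    nearest = dist.argmin()
    # Re-balanced training set: nearest majority cluster plus all minority examples.
    X_cluster = X_maj[assign == nearest]
    X_local = np.vstack([X_cluster, X_min])
    y_local = np.array([majority] * len(X_cluster) + [minority] * len(X_min))
    return cp_p_values(X_local, y_local, x_new, [majority, minority])

# Synthetic imbalanced data: 90 majority examples in three blobs, 10 minority examples.
rng = np.random.default_rng(1)
blobs = [rng.normal(c, 0.5, size=(30, 2)) for c in ([0, 0], [10, 0], [0, 10])]
X_minority = rng.normal([3, 3], 0.3, size=(10, 2))
X = np.vstack(blobs + [X_minority])
y = np.array([0] * 90 + [1] * 10)

p = lccp_p_values(X, y, np.array([3.0, 3.0]))
print(p)  # one p-value per candidate label
```

In this sketch the new instance sits in the middle of the minority examples, so the minority label would be expected to receive the larger p-value. Restricting the majority class to its nearest cluster also shrinks the training set passed to CP, which is one plausible reading of how the local partition could reduce the cost of each prediction.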
Copyright information
© 2013 IFIP International Federation for Information Processing
About this paper
Cite this paper
Wang, H., Chen, Y., Chen, Z., Yang, F. (2013). Local Clustering Conformal Predictor for Imbalanced Data Classification. In: Papadopoulos, H., Andreou, A.S., Iliadis, L., Maglogiannis, I. (eds) Artificial Intelligence Applications and Innovations. AIAI 2013. IFIP Advances in Information and Communication Technology, vol 412. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41142-7_43
DOI: https://doi.org/10.1007/978-3-642-41142-7_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41141-0
Online ISBN: 978-3-642-41142-7
eBook Packages: Computer Science, Computer Science (R0)