Abstract
In this paper we propose a novel combined approach to the imbalanced data problem, applied to predicting post-operative life expectancy in lung cancer patients. The solution combines undersampling techniques with a cost-sensitive SVM (Support Vector Machine). First, we eliminate non-informative examples by applying Tomek links together with one-sided selection. Second, we train a cost-sensitive SVM whose penalty costs are calculated from the cardinalities of the minority and majority classes. We evaluate the solution by comparing its performance with other SVM-based approaches that deal with imbalanced data. The experimental evaluation was performed on real-life data from the postoperative risk management domain.
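The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names, and the inverse-frequency cost formula are all assumptions made for illustration, since the abstract states only that the costs depend on the class cardinalities. The condensing step of one-sided selection is also omitted; only the Tomek-link cleaning is shown.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def drop_majority_in_links(points, labels, majority):
    """Remove the majority-class member of every Tomek link."""
    to_drop = {k for pair in tomek_links(points, labels)
               for k in pair if labels[k] == majority}
    keep = [k for k in range(len(points)) if k not in to_drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


def class_costs(labels):
    """Assumed cost rule: N / (number_of_classes * N_c), so rarer classes cost more."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * m) for c, m in counts.items()}


# Hypothetical toy data: label 0 is the majority class, label 1 the minority.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (3.0, 3.0), (3.2, 3.0), (4.0, 4.0)]
labels = [0, 0, 0, 0, 0, 1, 1]

links = tomek_links(points, labels)            # [(4, 5)]
clean_points, clean_labels = drop_majority_in_links(points, labels, majority=0)
costs = class_costs(clean_labels)              # {0: 0.75, 1: 1.5}
print(links, clean_labels, costs)
```

In this toy set the majority point at (3.0, 3.0) and the minority point at (3.2, 3.0) are each other's nearest neighbours, so they form a Tomek link and the majority point is removed. The resulting costs could then be supplied as per-class penalties to an SVM trainer that supports them, for example through the `class_weight` parameter of scikit-learn's `SVC`.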
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Zięba, M., Świątek, J., Lubicz, M. (2014). Cost Sensitive SVM with Non-informative Examples Elimination for Imbalanced Postoperative Risk Management Problem. In: Swiątek, J., Grzech, A., Swiątek, P., Tomczak, J. (eds) Advances in Systems Science. Advances in Intelligent Systems and Computing, vol 240. Springer, Cham. https://doi.org/10.1007/978-3-319-01857-7_29
DOI: https://doi.org/10.1007/978-3-319-01857-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01856-0
Online ISBN: 978-3-319-01857-7
eBook Packages: Engineering, Engineering (R0)