Abstract
Many industrial tasks involve the classification of unbalanced datasets, in which rare patterns of interest for the particular application must be detected among a much larger number of other patterns. Since data unbalance strongly affects the performance of standard classifiers, several ad hoc methods have been developed. This work reviews the main techniques for handling class unbalance, then describes three neural-network-based methods developed by the authors and tests them on industrial case studies.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Vannucci, M., Colla, V. (2018). Advanced Neural Networks Systems for Unbalanced Industrial Datasets. In: Esposito, A., Faundez-Zanuy, M., Morabito, F., Pasero, E. (eds) Multidisciplinary Approaches to Neural Computing. Smart Innovation, Systems and Technologies, vol 69. Springer, Cham. https://doi.org/10.1007/978-3-319-56904-8_18
DOI: https://doi.org/10.1007/978-3-319-56904-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56903-1
Online ISBN: 978-3-319-56904-8
eBook Packages: Engineering, Engineering (R0)