Managing Imbalanced Data Sets in Multi-label Problems: A Case Study with the SMOTE Algorithm

Giraldo-Forero, Andrés Felipe; Jaramillo-Garzón, Jorge Alberto; Ruiz-Muñoz, José Francisco; Castellanos-Domínguez, César Germán

doi:10.1007/978-3-642-41822-8_42

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8258)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition

Abstract

Multi-label learning has become an increasingly active area in the machine learning community, since a wide variety of real-world problems are naturally multi-labeled. However, it is not uncommon to find disparities in the number of samples of each class, which poses an additional challenge for the learning algorithm. SMOTE is an oversampling technique that has been successfully applied to balance single-label data sets, but it has not been used in multi-label frameworks so far. In this work, several strategies for generating synthetic samples to balance data sets when training multi-label algorithms are proposed and compared. Results show that a correct selection of seed samples for oversampling improves the classification performance of multi-label algorithms. Uniform-generation oversampling provides an efficient methodology for a wide range of real-world problems.
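
The abstract does not spell out the seed-selection strategies compared in the chapter, so the following is only a point of reference: a minimal sketch of the classic SMOTE interpolation step (Chawla et al., 2002) applied to one label of a multi-label data set under a binary-relevance-style view, where the seed samples are those annotated with the target label. Each synthetic sample is x + gap * (neighbour - x), with gap drawn uniformly from [0, 1) and the neighbour taken from the k nearest seeds. The function name `smote_for_label`, drawing seeds at random (the original SMOTE instead cycles through every minority sample a fixed number of times), and the rule that a synthetic sample copies its seed's full label set are all assumptions made for illustration, not the authors' method.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_for_label(
    X: Sequence[Vector],
    Y: Sequence[Sequence[int]],
    label: int,
    n_synthetic: int,
    k: int = 5,
    seed: int = 0,
) -> Tuple[List[Vector], List[List[int]]]:
    """Hypothetical helper: SMOTE-style oversampling for a single label.

    Seeds are the samples with Y[i][label] == 1. Each synthetic point lies on
    the segment between a randomly drawn seed and one of its k nearest
    neighbours among the other seeds. ASSUMPTION: the synthetic sample
    inherits the seed's full label vector.
    """
    rng = random.Random(seed)
    seeds = [i for i, y in enumerate(Y) if y[label] == 1]
    if len(seeds) < 2:
        raise ValueError("need at least two seed samples to interpolate")
    k = min(k, len(seeds) - 1)
    if k < 1:
        raise ValueError("k must be at least 1")

    new_X: List[Vector] = []
    new_Y: List[List[int]] = []
    for _ in range(n_synthetic):
        i = rng.choice(seeds)
        # k nearest neighbours of seed i, searched among the other seeds only
        neighbours = sorted(
            (j for j in seeds if j != i), key=lambda j: _euclidean(X[i], X[j])
        )[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        new_X.append([xi + gap * (xj - xi) for xi, xj in zip(X[i], X[j])])
        new_Y.append(list(Y[i]))  # assumption: copy the seed's label set
    return new_X, new_Y


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    Y = [[1, 0], [1, 1], [1, 0], [0, 1]]
    synth_X, synth_Y = smote_for_label(X, Y, label=0, n_synthetic=4, k=2)
    for x, y in zip(synth_X, synth_Y):
        print(x, y)
```

On the toy set in the `__main__` block, label 0 is carried by the first three samples, so every synthetic point falls on an edge of the triangle they span. Because the label vector is copied from the seed, oversampling one label may also add samples to labels that co-occur with it, which illustrates why the choice of seeds matters in the multi-label setting.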

Similar content being viewed by others

A First Approach to Deal with Imbalance in Multi-label Datasets

Synthetic Oversampling of Multi-label Data Based on Local Label Distribution

Natural-neighborhood based, label-specific undersampling for imbalanced, multi-label data (Article, 30 March 2024)

Author information

Authors and Affiliations

Signal Processing and Recognition Group, Universidad Nacional de Colombia, Campus la Nubia, Km 7 vía al Magdalena, Manizales, Colombia
Andrés Felipe Giraldo-Forero, Jorge Alberto Jaramillo-Garzón, José Francisco Ruiz-Muñoz & César Germán Castellanos-Domínguez
Grupo de Máquinas Inteligentes y Reconocimiento de Patrones - MIRP, Instituto Tecnológico Metropolitano, Cll 54A No 30-01, Medellín, Colombia
Jorge Alberto Jaramillo-Garzón

Editor information

Editors and Affiliations

Advanced Technologies Application Center (CENATAV), 7a A#21406 esq. 214 y 216, Rpto. Siboney, Playa, C.P. 12200, La Habana, Cuba
José Ruiz-Shulcloper
National Research Council (CNR), Institute of Cybernetics “E. Caianiello”, Via Campi Flegrei 34, 80078, Pozzuoli, Naples, Italy
Gabriella Sanniti di Baja

About this paper

Cite this paper

Giraldo-Forero, A.F., Jaramillo-Garzón, J.A., Ruiz-Muñoz, J.F., Castellanos-Domínguez, C.G. (2013). Managing Imbalanced Data Sets in Multi-label Problems: A Case Study with the SMOTE Algorithm. In: Ruiz-Shulcloper, J., Sanniti di Baja, G. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2013. Lecture Notes in Computer Science, vol 8258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41822-8_42

DOI: https://doi.org/10.1007/978-3-642-41822-8_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41821-1
Online ISBN: 978-3-642-41822-8
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition