Mixed Data Balancing through Compact Sets Based Instance Selection

Yenny Villuendas-Rey; María Matilde García-Lorenzo

doi:10.1007/978-3-642-41822-8_32

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8258)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition

Abstract

Learning from datasets with an imbalanced class distribution is an important problem in Pattern Recognition. This paper introduces a novel data-balancing algorithm based on compact set clustering of the majority class. The proposed algorithm can handle mixed and incomplete data, and it works with arbitrary dissimilarity functions. Numerical experiments on repository databases show that the proposed method performs well according to the area under the ROC curve and the imbalance ratio.
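
The imbalance ratio named in the abstract is commonly defined as the number of majority-class instances divided by the number of minority-class instances. The abstract does not state the exact definition the paper uses, so the following is only a minimal sketch under that common definition. The function name `imbalance_ratio` and the toy labels are illustrative and do not come from the paper.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count.

    Assumes the common definition of the imbalance ratio; the paper's
    own definition is not given in the abstract.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


# Toy labels (hypothetical): 9 majority instances and 3 minority instances.
labels = ["neg"] * 9 + ["pos"] * 3
print(imbalance_ratio(labels))  # 3.0
```

Under this definition a value of 1.0 means the classes are balanced, so a method that selects instances from the majority class moves the ratio toward 1.0.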

Author information

Authors and Affiliations

Department of Computer Science, University of Ciego de Ávila, Carr. A Morón km 9 ½, Cuba
Yenny Villuendas-Rey
Department of Computer Science, Universidad Central Marta Abreu of Las Villas, Carr. A Camajuaní, km 5 ½, Cuba
María Matilde García-Lorenzo

Editor information

Editors and Affiliations

Advanced Technologies Application Center (CENATAV), 7a A#21406 esq. 214 y 216, Rpto. Siboney, Playa., C.P. 12200, La Habana, Cuba
José Ruiz-Shulcloper
National Research Council (CNR), Institute of Cybernetics “E. Caianiello”, Via Campi Flegrei 34, 80078, Pozzuoli, Naples, Italy
Gabriella Sanniti di Baja

Cite this paper

Villuendas-Rey, Y., Matilde García-Lorenzo, M. (2013). Mixed Data Balancing through Compact Sets Based Instance Selection. In: Ruiz-Shulcloper, J., Sanniti di Baja, G. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2013. Lecture Notes in Computer Science, vol 8258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41822-8_32

DOI: https://doi.org/10.1007/978-3-642-41822-8_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41821-1
Online ISBN: 978-3-642-41822-8
eBook Packages: Computer Science, Computer Science (R0)
