Clustering Methodology in Mixed Data Sets

González León, Jacobo Gerardo; Mata Rivera, Miguel Félix

doi:10.1007/978-3-030-33229-7_13

Conference paper
First Online: 24 October 2019

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1053)

Abstract

One of the most challenging tasks in data analysis is finding clusters in mixed data sets, because such data sets contain both numerical and categorical variables and lack a labeled variable to guide the analysis. The resulting clusters can summarize all the variables of a data set in a single variable, which makes information easier to find than producing a separate summary for each variable. This thesis research proposes a clustering methodology for mixed data sets that yields better results than state-of-the-art methods.
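The abstract does not spell out the proposed methodology, so the sketch below is not the authors' method. It illustrates one widely used baseline for the problem the abstract describes: the Gower dissimilarity, which averages range-scaled numeric differences and 0/1 categorical mismatches, followed by a simple k-medoids loop. The returned cluster label is a single variable that summarizes every column of a record, which is the idea in the abstract's second sentence. The toy records, the equal weighting of variables, and the function names (`column_ranges`, `gower_distance`, `k_medoids`) are hypothetical, chosen only for illustration.

```python
"""Sketch: Gower dissimilarity + k-medoids on a small mixed-type data set.

Illustrative baseline only; NOT the methodology proposed in the paper.
"""
from typing import List, Optional, Sequence, Tuple

Record = Sequence  # hypothetical: each record mixes numbers and strings


def column_ranges(rows: List[Record]) -> List[Optional[float]]:
    """Range (max - min) for numeric columns, None for categorical ones."""
    ranges: List[Optional[float]] = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        if all(isinstance(v, (int, float)) for v in col):
            ranges.append(float(max(col) - min(col)))
        else:
            ranges.append(None)
    return ranges


def gower_distance(a: Record, b: Record, ranges: List[Optional[float]]) -> float:
    """Average per-variable dissimilarity in [0, 1] (equal weights assumed)."""
    total = 0.0
    for j, rng in enumerate(ranges):
        if rng is None:  # categorical: simple mismatch
            total += 0.0 if a[j] == b[j] else 1.0
        elif rng > 0:  # numeric: absolute difference scaled by the column range
            total += abs(a[j] - b[j]) / rng
    return total / len(ranges)


def k_medoids(rows: List[Record], k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating (Voronoi-iteration) k-medoids on a precomputed Gower matrix."""
    ranges = column_ranges(rows)
    n = len(rows)
    d = [[gower_distance(rows[i], rows[j], ranges) for j in range(n)] for i in range(n)]

    # Deterministic farthest-first initialisation, starting from record 0.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))

    def assign(meds: List[int]) -> List[int]:
        # Each record joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: d[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(d[m][o] for o in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


if __name__ == "__main__":
    # Hypothetical toy records: (age, colour, subscribed)
    records = [
        (25.0, "red", "yes"), (27.0, "red", "yes"), (26.0, "red", "no"),
        (60.0, "blue", "no"), (62.0, "blue", "no"), (61.0, "blue", "yes"),
    ]
    labels, medoids = k_medoids(records, k=2)
    print(labels, medoids)  # prints [0, 0, 0, 1, 1, 1] [0, 3]
```

k-medoids is used here rather than k-means because a medoid is an actual record, so the cluster representative stays meaningful even when some columns are categorical and cannot be averaged.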

Author information

Authors and Affiliations

Instituto Politécnico Nacional, Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Avenida Instituto Politécnico Nacional No. 2580, Col Barrio la Laguna Ticomán, Gustavo A. Madero, 07340, Mexico City, Mexico
Jacobo Gerardo González León & Miguel Félix Mata Rivera

Corresponding authors

Correspondence to Jacobo Gerardo González León or Miguel Félix Mata Rivera.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Miguel Felix Mata-Rivera
Instituto Politécnico Nacional, Mexico City, Mexico
Roberto Zagal-Flores
Universidad Mayor, Santiago, Chile
Cristian Barría-Huidobro

About this paper

Cite this paper

González León, J.G., Mata Rivera, M.F. (2019). Clustering Methodology in Mixed Data Sets. In: Mata-Rivera, M., Zagal-Flores, R., Barría-Huidobro, C. (eds) Telematics and Computing. WITCOM 2019. Communications in Computer and Information Science, vol 1053. Springer, Cham. https://doi.org/10.1007/978-3-030-33229-7_13

DOI: https://doi.org/10.1007/978-3-030-33229-7_13
Published: 24 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33228-0
Online ISBN: 978-3-030-33229-7
eBook Packages: Computer Science, Computer Science (R0)