Abstract
One of the most challenging tasks in data analysis is finding clusters in mixed data sets, which contain both numerical and categorical variables and lack a label variable to serve as a guide. Such clusters can summarize all the variables of a data set in a single variable, making information easier to find than with a separate summary for each variable. This research proposes a methodology for clustering mixed data sets that yields better results than the state-of-the-art methods.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
González León, J.G., Mata Rivera, M.F. (2019). Clustering Methodology in Mixed Data Sets. In: Mata-Rivera, M., Zagal-Flores, R., Barría-Huidobro, C. (eds) Telematics and Computing. WITCOM 2019. Communications in Computer and Information Science, vol 1053. Springer, Cham. https://doi.org/10.1007/978-3-030-33229-7_13
DOI: https://doi.org/10.1007/978-3-030-33229-7_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33228-0
Online ISBN: 978-3-030-33229-7
eBook Packages: Computer Science (R0)