Abstract
Similarity measures are important in data mining techniques such as clustering, nearest-neighbor classification, and outlier detection [1][4]. Many similarity measures have been proposed. For numeric data, many of them are based on the Minkowski distance. Similarity measures for categorical data have also been studied for a long time, but they still have many open issues. The main issue is understanding the relationship between categorical attribute values: for categorical data, the notion of similarity is not as clear as it is for numeric data. In this paper, we propose a new approach to understanding the relationships between categorical values. The approach uses an artificial neural network to extract significant features for computing the distance between two categorical data objects.
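To illustrate the contrast the abstract draws, the following minimal sketch compares a Minkowski distance on numeric vectors with a simple matching (overlap) similarity on categorical objects. It is not the method proposed in the paper; the function names and the sample objects are assumptions made for illustration only.

```python
def minkowski_distance(x, y, p=2):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def overlap_similarity(x, y):
    """Fraction of attributes on which two categorical objects agree.

    Illustrative baseline only (hypothetical helper, not the paper's measure):
    every mismatch counts the same, so it cannot express that some
    categorical values are closer to each other than others.
    """
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Numeric data: distance follows directly from the values.
numeric_a, numeric_b = [0.0, 0.0], [3.0, 4.0]
euclidean = minkowski_distance(numeric_a, numeric_b, p=2)
manhattan = minkowski_distance(numeric_a, numeric_b, p=1)

# Categorical data: only equality of values is available.
cat_a = ("red", "small", "round")
cat_b = ("red", "large", "round")
matching = overlap_similarity(cat_a, cat_b)
```

With plain matching, "small" versus "large" is treated exactly like any other mismatch, which is the kind of missing relationship between categorical values that the abstract identifies as the main issue.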
References
Boriah, S., Chandola, V., Kumar, V.: Similarity Measures for Categorical Data: A Comparative Evaluation. In: ACM Computing Surveys (CSUR), pp. 243–254 (2008)
Gershenson, C.: Artificial Neural Networks for Beginners (2003)
Hornik, K.: Multilayer Feedforward Networks are Universal Approximators. Neural Networks 2, 359–366 (1989)
Kelil, A., Wang, S.: SCS: A New Similarity Measure for Categorical Sequences. In: 2008 Eighth IEEE International Conference on Data Mining, pp. 343–352 (2008)
Li, X., Hwang, M.Y., Kim, H., Park, K.S., Bae, K.H., Ryu, K.H.: Extracting Method of Significant Features from Categorical Data. In: International Symposium on Remote Sensing (2010)
Ahmad, A., Dey, L.: A Method to Compute Distance between two Categorical Values of Same Attribute in Unsupervised Learning for Categorical Data Set. Pattern Recognition Letters 28, 110–118 (2007)
Sneath, P.H.A., Sokal, R.R.: Numerical Taxonomy: The Principles and Practice of Numerical Classification (1973)
Metzler, D., Dumais, S., Meek, C.: Similarity Measures for Short Segments of Text. LNCS, pp. 16–27 (2007)
Yi, W.: Artificial Neural Networks (2005)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, C.H., Li, X., Lee, Y.K., Pok, G., Ryu, K.H. (2011). A New Approach for Calculating Similarity of Categorical Data. In: Lee, G., Howard, D., Ślęzak, D. (eds) Convergence and Hybrid Information Technology. ICHIT 2011. Communications in Computer and Information Science, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24106-2_74
DOI: https://doi.org/10.1007/978-3-642-24106-2_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24105-5
Online ISBN: 978-3-642-24106-2
eBook Packages: Computer Science, Computer Science (R0)