A New Approach for Calculating Similarity of Categorical Data

Jin, Cheng Hao; Li, Xun; Lee, Yang Koo; Pok, Gouchol; Ryu, Keun Ho

doi:10.1007/978-3-642-24106-2_74


Part of the book series: Communications in Computer and Information Science (CCIS, volume 206)

Included in the following conference series:

International Conference on Hybrid Information Technology


Abstract

Similarity measures are very important in data mining techniques such as clustering, nearest-neighbor classification, and outlier detection [1][4]. Many similarity measures have been proposed. For numeric data, there are many measures based on the Minkowski distance. Although similarity measures for categorical data have been studied for a long time, many issues remain. The main issue is understanding the relationship between categorical attribute values. For categorical data, the notion of similarity is not as clear as it is for numeric data. In this paper, we propose a new approach to understanding the relationship between categorical data values. The approach uses an artificial neural network to extract significant features for computing the distance between two categorical data objects.
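To illustrate the contrast the abstract draws, the following minimal sketch compares a Minkowski distance on numeric vectors with the simple overlap (matching) measure often used as a baseline for categorical data. It is not the neural-network approach proposed in the paper, whose details are not given here. The function names `minkowski_distance` and `overlap_similarity` and the sample attribute values are assumptions chosen for illustration.

```python
from typing import Sequence


def minkowski_distance(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p between two numeric vectors.

    p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def overlap_similarity(x: Sequence[str], y: Sequence[str]) -> float:
    """Fraction of attributes on which two categorical objects agree.

    This baseline only checks equality, so it cannot express that some
    mismatching values are closer to each other than others.
    """
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    if not x:
        raise ValueError("objects must have at least one attribute")
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


# Hypothetical sample objects (colour, size, shape), for illustration only.
obj_a = ["red", "small", "round"]
obj_b = ["red", "medium", "round"]
obj_c = ["red", "large", "round"]

euclidean = minkowski_distance([0.0, 0.0], [3.0, 4.0], p=2)
manhattan = minkowski_distance([0.0, 0.0], [3.0, 4.0], p=1)
sim_ab = overlap_similarity(obj_a, obj_b)
sim_ac = overlap_similarity(obj_a, obj_c)
```

Here `sim_ab` and `sim_ac` come out equal, even though one might expect "small" to be closer to "medium" than to "large". This is the kind of relationship between categorical attribute values that a plain matching measure cannot capture.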


References

1. Boriah, S., Chandola, V., Kumar, V.: Similarity Measures for Categorical Data: A Comparative Evaluation. In: ACM Computing Surveys (CSUR), pp. 243–254 (2008)
2. Gershenson, C.: Artificial Neural Networks for Beginners (2003)
3. Hornik, K.: Multilayer Feedforward Networks are Universal Approximators. Neural Networks 2, 359–366 (1989)
4. Kelil, A., Wang, S.: SCS: A New Similarity Measure for Categorical Sequences. In: 2008 Eighth IEEE International Conference on Data Mining, pp. 343–352 (2008)
5. Li, X., Hwang, M.Y., Kim, H., Park, K.S., Bae, K.H., Ryu, K.H.: Extracting Method of Significant Features from Categorical Data. In: International Symposium on Remote Sensing (2010)
6. Ahmad, A., Dey, L.: A Method to Compute Distance between Two Categorical Values of Same Attribute in Unsupervised Learning for Categorical Data Set. Pattern Recognition Letters 28, 110–118 (2007)
7. Sneath, P.H.A., Sokal, R.R.: Numerical Taxonomy: The Principles and Practice of Numerical Classification (1973)
8. Metzler, D., Dumais, S., Meek, C.: Similarity Measures for Short Segments of Text. LNCS, pp. 16–27 (2007)
9. Yi, W.: Artificial Neural Networks (2005)


Author information

Authors and Affiliations

Department of Computer Education, Chungbuk National University, Cheongju, Republic of Korea
Cheng Hao Jin, Xun Li, Yang Koo Lee & Keun Ho Ryu
Department of Computer Science, Yanbian University of Science and Technology, Yanji, China
Gouchol Pok


Editor information

Editors and Affiliations

90301, 3F, Computer Engineering Department, Hannam University, 70 Hannamro, Daedeuk-gu, Daejeon, Korea
Geuk Lee
QinetiQ Company Fellow, Howard Science Limited, 24 Sunrise, WR14 2NJ, Malvern, United Kingdom
Daniel Howard
Institute of Mathematics, University of Warsaw, ul. Banacha 2, 02-097, Warsaw, Poland
Dominik Ślęzak


About this paper

Cite this paper

Jin, C.H., Li, X., Lee, Y.K., Pok, G., Ryu, K.H. (2011). A New Approach for Calculating Similarity of Categorical Data. In: Lee, G., Howard, D., Ślęzak, D. (eds) Convergence and Hybrid Information Technology. ICHIT 2011. Communications in Computer and Information Science, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24106-2_74


DOI: https://doi.org/10.1007/978-3-642-24106-2_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24105-5
Online ISBN: 978-3-642-24106-2
eBook Packages: Computer Science, Computer Science (R0)
