Measuring the Similarity for Heterogenous Data: An Ordered Probability-Based Approach

Le, SiQuang; Ho, TuBao

doi:10.1007/978-3-540-30214-8_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3245)

Included in the following conference series:

International Conference on Discovery Science

Abstract

In this paper we propose a solution to the problem of measuring similarity for heterogeneous data. The key idea is to treat the similarity of a given attribute-value pair as the probability of randomly picking a value pair that is less similar than, or equally similar to, the given pair, in terms of order relations defined appropriately for each data type. The similarities of attribute-value pairs are then integrated into similarities between data objects using a statistical method. Applying our method in combination with distance-based clustering to real data shows its merit.
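The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the two order relations (absolute gap for numeric values, equality for nominal values), the empirical estimate over all value pairs, and the use of Fisher's chi-square combination as the "statistical method" are all assumptions made here for illustration.

```python
from itertools import product
from math import exp, factorial, log


def pair_similarity(values, a, b, less_or_equal_similar):
    """Empirical probability that a value pair (x, y) drawn uniformly from
    `values` x `values` is less similar than, or equally similar to, (a, b)."""
    pairs = list(product(values, repeat=2))
    hits = sum(1 for p in pairs if less_or_equal_similar(p, (a, b)))
    return hits / len(pairs)


def numeric_order(p, q):
    """Assumed order for numeric data: p is no more similar than q
    when the gap between its two values is at least as large."""
    return abs(p[0] - p[1]) >= abs(q[0] - q[1])


def nominal_order(p, q):
    """Assumed order for nominal data: an unequal pair is less similar
    than an equal pair; pairs of the same kind are equally similar."""
    return (p[0] == p[1]) <= (q[0] == q[1])


def fisher_combine(probs):
    """Combine per-attribute probabilities with Fisher's method (an assumed
    choice). Uses the closed form of the chi-square survival function
    with 2k degrees of freedom, where k = len(probs)."""
    if any(p <= 0.0 for p in probs):
        return 0.0
    half_stat = -sum(log(p) for p in probs)  # half of -2 * sum(ln p)
    k = len(probs)
    return exp(-half_stat) * sum(half_stat**i / factorial(i) for i in range(k))


# Hypothetical two-attribute data set: (age, colour).
ages = [20, 25, 30, 40]
colours = ["red", "green", "blue", "red"]

s_age = pair_similarity(ages, 20, 25, numeric_order)
s_colour = pair_similarity(colours, "red", "green", nominal_order)
object_similarity = fisher_combine([s_age, s_colour])
print(s_age, s_colour, object_similarity)
```

Under these assumptions every per-attribute similarity lies in (0, 1] regardless of data type, which is what makes the values comparable across numeric and nominal attributes before they are combined into an object-level score.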


Author information

Authors and Affiliations

Japan Advanced Institute of Science and Technology, Tatsunokuchi, Ishikawa, 923-1292, Japan
SiQuang Le & TuBao Ho

Editor information

Editors and Affiliations

Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, 744 Motooka, Nishi, 819-0395, Fukuoka, Japan
Einoshin Suzuki
Kyushu University, 6–10–1 Hakozaki Higashi-ku, 812–8581, Fukuoka, Japan
Setsuo Arikawa

About this paper

Cite this paper

Le, S., Ho, T. (2004). Measuring the Similarity for Heterogenous Data: An Ordered Probability-Based Approach. In: Suzuki, E., Arikawa, S. (eds) Discovery Science. DS 2004. Lecture Notes in Computer Science, vol 3245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30214-8_10

DOI: https://doi.org/10.1007/978-3-540-30214-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23357-2
Online ISBN: 978-3-540-30214-8
eBook Packages: Springer Book Archive
