Measuring the Similarity for Heterogenous Data: An Ordered Probability-Based Approach

  • Conference paper
Discovery Science (DS 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3245)

Abstract

In this paper we propose a method for measuring similarity in heterogeneous data. The key idea is to define the similarity of a given attribute-value pair as the probability of randomly picking a value pair that is less similar than, or equally similar to, the given pair, where the comparison is made in terms of order relations defined appropriately for each data type. The similarities of attribute-value pairs are then integrated into similarities between data objects using a statistical method. Applying the method in combination with distance-based clustering to real data demonstrates its merit.
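The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the function names `pair_similarity` and `combine_similarities` are hypothetical, the order relation is approximated by a simple "gap" function (absolute difference for numeric values, equality for categorical ones), pairs are drawn uniformly from the observed values, and Fisher's method for combining probabilities stands in for the unspecified "statistical method".

```python
from math import exp, log, factorial


def numeric_gap(a, b):
    # Assumed order for numeric values: a larger absolute difference
    # means the pair is less similar.
    return abs(a - b)


def categorical_gap(a, b):
    # Assumed order for categorical values: unequal pairs are less
    # similar than equal pairs.
    return int(a != b)


def pair_similarity(x, y, values, gap):
    """Fraction of value pairs, drawn with replacement from `values`,
    that are less similar than or equally similar to the pair (x, y)."""
    target = gap(x, y)
    pairs = [(a, b) for a in values for b in values]
    hits = sum(1 for a, b in pairs if gap(a, b) >= target)
    return hits / len(pairs)


def combine_similarities(sims):
    """Combine per-attribute similarities (each in (0, 1]) with Fisher's
    method. The statistic -2 * sum(ln s_i) is compared against a
    chi-square distribution with 2m degrees of freedom, whose survival
    function has the closed form used below."""
    m = len(sims)
    t = -sum(log(s) for s in sims)  # half of Fisher's statistic
    return exp(-t) * sum(t ** k / factorial(k) for k in range(m))


# Two hypothetical objects described by one numeric and one categorical attribute.
ages = [1, 2, 3, 4]
colors = ["r", "r", "g", "b"]
s_age = pair_similarity(1, 2, ages, numeric_gap)
s_color = pair_similarity("r", "g", colors, categorical_gap)
object_similarity = combine_similarities([s_age, s_color])
```

In this sketch identical values always receive similarity 1, because every possible pair is at most as similar as an identical pair, and a single attribute is passed through the combination step unchanged.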

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Le, S., Ho, T. (2004). Measuring the Similarity for Heterogenous Data: An Ordered Probability-Based Approach. In: Suzuki, E., Arikawa, S. (eds) Discovery Science. DS 2004. Lecture Notes in Computer Science, vol 3245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30214-8_10

  • DOI: https://doi.org/10.1007/978-3-540-30214-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23357-2

  • Online ISBN: 978-3-540-30214-8

  • eBook Packages: Springer Book Archive
