Comparison of Two Distribution Valued Dissimilarities and Its Application for Symbolic Clustering

  • Conference paper
German-Japanese Interchange of Data Analysis Results

Abstract

There is a growing need to analyse very large and complex datasets produced by recent computing devices with very high cost performance and by their application software. Such datasets must first be aggregated and then analysed. Symbolic Data Analysis (SDA), proposed by E. Diday in the 1980s (Billard L, Diday E (2007) Symbolic data analysis. Wiley, Chichester), mainly targets large-scale complex datasets. Much SDA research deals with interval-valued and histogram-valued data. Recently, however, distribution-valued data has become more important (e.g. Diday E, Vrac M (2005) Mixture decomposition of distributions by copulas in the symbolic data analysis framework, vol 147. Elsevier Science Publishers B. V., Amsterdam, pp 27–41; Mizuta M, Minami H (2012) Analysis of distribution valued dissimilarity data. In: Gaul WA, Geyer-Schulz A, Schmidt-Thieme L, Kunze J (eds) Challenges at the interface of data analysis, computer science, and optimization. Studies in classification, data analysis, and knowledge organization. Springer, Berlin/Heidelberg, pp 23–28). In this paper, we focus on distribution-valued dissimilarity data and hierarchical cluster analysis. Cluster analysis plays a key role in data mining, knowledge discovery, and SDA. The conventional inputs to cluster analysis are real-valued data, but in some cases, e.g., after data aggregation, the inputs may be stochastic over ranges, i.e., distribution-valued dissimilarities. Hierarchical cluster analysis requires an order relation on dissimilarities, i.e., dissimilarities need to satisfy the properties of an ultrametric. However, distribution-valued dissimilarities have no natural order relation. We therefore develop a method for investigating the order relation of distribution-valued dissimilarities and apply this order relation to hierarchical symbolic clustering. Finally, we demonstrate the order relation by finding a hierarchical clustering of Japanese Internet sites from Internet traffic data.
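
The abstract does not state which order relation the authors adopt. As a minimal sketch of how two distribution-valued dissimilarities could be compared, the Python example below uses first-order stochastic dominance on empirical samples. This choice is an assumption, suggested only by the Levy (2006) entry in the reference list; the names ecdf and compare_dissimilarities are hypothetical and not taken from the paper.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a sample: x -> fraction of values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def compare_dissimilarities(sample_a, sample_b):
    """Compare two distribution-valued dissimilarities given as samples.

    Assumption (not from the paper): a is "smaller" than b when the
    empirical CDF of a lies on or above that of b everywhere, i.e. b
    first-order stochastically dominates a.
    """
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    # Both empirical CDFs are step functions that only change at observed
    # values, so checking the pooled sample points is sufficient.
    grid = sorted(set(sample_a) | set(sample_b))
    a_not_larger = all(cdf_a(x) >= cdf_b(x) for x in grid)
    b_not_larger = all(cdf_b(x) >= cdf_a(x) for x in grid)
    if a_not_larger and b_not_larger:
        return "equal"
    if a_not_larger:
        return "a smaller"
    if b_not_larger:
        return "b smaller"
    # The CDFs cross: neither distribution dominates the other.
    return "incomparable"


if __name__ == "__main__":
    print(compare_dissimilarities([1, 2, 3], [2, 3, 4]))
    print(compare_dissimilarities([1, 4], [2, 3]))
```

Under this assumed relation some pairs are incomparable because their CDFs cross, so a clustering procedure built on it would need an extra rule for such pairs. How the paper resolves this is not stated in the abstract.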

References

  • Billard L, Diday E (2007) Symbolic data analysis. Wiley, Chichester

  • Diday E, Vrac M (2005) Mixture decomposition of distributions by copulas in the symbolic data analysis framework, vol 147. Elsevier Science Publishers B. V., Amsterdam, pp 27–41

  • Gordon AD (1987) A review of hierarchical classification. J R Stat Soc Ser A 150:119–137

  • Levy H (2006) Stochastic dominance: investment decision making under uncertainty. Studies in risk and uncertainty. Springer, New York

  • McMorris FR, Neumann D (1983) Consensus functions defined on trees. Math Soc Sci 4:131–136

  • Mizuta M, Minami H (2012) Analysis of distribution valued dissimilarity data. In: Gaul WA, Geyer-Schulz A, Schmidt-Thieme L, Kunze J (eds) Challenges at the interface of data analysis, computer science, and optimization. Studies in classification, data analysis, and knowledge organization. Springer, Berlin/Heidelberg, pp 23–28

Author information

Corresponding author

Correspondence to Yusuke Matsui.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Matsui, Y., Komiya, Y., Minami, H., Mizuta, M. (2014). Comparison of Two Distribution Valued Dissimilarities and Its Application for Symbolic Clustering. In: Gaul, W., Geyer-Schulz, A., Baba, Y., Okada, A. (eds) German-Japanese Interchange of Data Analysis Results. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01264-3_3
