Abstract
There is a growing need to analyse very large and complex datasets produced by recent computing devices with very high cost performance and by their application software. Such datasets have to be aggregated before they can be analysed. Symbolic Data Analysis (SDA) was proposed by E. Diday in the 1980s (Billard L, Diday E (2007) Symbolic data analysis. Wiley, Chichester) and is aimed mainly at large-scale complex datasets. Much SDA research deals with interval-valued and histogram-valued data. Recently, however, distribution-valued data have become more important (e.g., Diday E, Vrac M (2005) Mixture decomposition of distributions by copulas in the symbolic data analysis framework, vol 147. Elsevier Science Publishers B. V., Amsterdam, pp 27–41; Mizuta M, Minami H (2012) Analysis of distribution valued dissimilarity data. In: Gaul WA, Geyer-Schulz A, Schmidt-Thieme L, Kunze J (eds) Challenges at the interface of data analysis, computer science, and optimization. Studies in classification, data analysis, and knowledge organization. Springer, Berlin/Heidelberg, pp 23–28). In this paper, we focus on distribution-valued dissimilarity data and hierarchical cluster analysis. Cluster analysis plays a key role in data mining, in knowledge discovery, and in SDA. The conventional inputs of cluster analysis are real-valued data, but in some cases, for example after data aggregation, the inputs may be stochastic over ranges, i.e., distribution-valued dissimilarities. Hierarchical cluster analysis requires an order relation on dissimilarities, i.e., the dissimilarities need to satisfy the properties of an ultrametric. Distribution-valued dissimilarities, however, have no natural order relation. We therefore develop a method for investigating an order relation on distribution-valued dissimilarities and apply this order relation to hierarchical symbolic clustering. Finally, we demonstrate the order relation by constructing a hierarchical clustering of Japanese Internet sites from Internet traffic data.
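The abstract does not state which order relation the chapter adopts. As an illustration only, the sketch below assumes first-order stochastic dominance (the subject of Levy 2006 in the reference list) applied to empirical samples of two dissimilarities. The function names `ecdf`, `dominates`, and `compare` are hypothetical and are not taken from the chapter.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a sample: x -> fraction of values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def dominates(a, b):
    """True if sample a is first-order stochastically larger than sample b.

    Assumed criterion (not confirmed by the source): F_a(x) <= F_b(x) for
    every x, with strict inequality for at least one x. Empirical CDFs only
    change at observed values, so checking the pooled sample points suffices.
    """
    f_a, f_b = ecdf(a), ecdf(b)
    grid = sorted(set(a) | set(b))
    diffs = [f_b(x) - f_a(x) for x in grid]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


def compare(a, b):
    """Compare two distribution-valued dissimilarities given as samples."""
    if dominates(a, b):
        return "larger"
    if dominates(b, a):
        return "smaller"
    f_a, f_b = ecdf(a), ecdf(b)
    if all(f_a(x) == f_b(x) for x in set(a) | set(b)):
        return "equal"
    # The two CDFs cross, so this order relation cannot rank the pair.
    return "incomparable"


if __name__ == "__main__":
    print(compare([2, 3, 4], [1, 2, 3]))
    print(compare([1, 4], [2, 3]))
```

Under this assumed criterion the relation is only a partial order: pairs whose empirical CDFs cross come out as "incomparable", which illustrates why distribution-valued dissimilarities lack a natural total order.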
References
Billard L, Diday E (2007) Symbolic data analysis. Wiley, Chichester
Diday E, Vrac M (2005) Mixture decomposition of distributions by copulas in the symbolic data analysis framework, vol 147. Elsevier Science Publishers B. V., Amsterdam, pp 27–41
Gordon AD (1987) A review of hierarchical classification. J R Stat Soc Ser A 150:119–137
Levy H (2006) Stochastic dominance: investment decision making under uncertainty. Studies in risk and uncertainty. Springer, New York
McMorris FR, Neumann D (1983) Consensus functions defined on trees. Math Soc Sci 4:131–136
Mizuta M, Minami H (2012) Analysis of distribution valued dissimilarity data. In: Gaul WA, Geyer-Schulz A, Schmidt-Thieme L, Kunze J (eds) Challenges at the interface of data analysis, computer science, and optimization. Studies in classification, data analysis, and knowledge organization. Springer, Berlin/Heidelberg, pp 23–28
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Matsui, Y., Komiya, Y., Minami, H., Mizuta, M. (2014). Comparison of Two Distribution Valued Dissimilarities and Its Application for Symbolic Clustering. In: Gaul, W., Geyer-Schulz, A., Baba, Y., Okada, A. (eds) German-Japanese Interchange of Data Analysis Results. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01264-3_3
DOI: https://doi.org/10.1007/978-3-319-01264-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01263-6
Online ISBN: 978-3-319-01264-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)