Comparison of Two Distribution Valued Dissimilarities and Its Application for Symbolic Clustering

Matsui, Yusuke; Komiya, Yuriko; Minami, Hiroyuki; Mizuta, Masahiro

doi:10.1007/978-3-319-01264-3_3

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Demand is growing for the analysis of very large and complex datasets produced by recent computing devices with very high cost performance and by their application software. Such datasets must be aggregated before they can be analyzed. Symbolic Data Analysis (SDA), proposed by E. Diday in the 1980s (Billard L, Diday E (2007) Symbolic data analysis. Wiley, Chichester), mainly targets large-scale complex datasets. Much SDA research deals with interval-valued and histogram-valued data. Recently, however, distribution-valued data has become more important (e.g. Diday E, Vrac M (2005) Mixture decomposition of distributions by copulas in the symbolic data analysis framework, vol 147. Elsevier Science Publishers B. V., Amsterdam, pp 27–41; Mizuta M, Minami H (2012) Analysis of distribution valued dissimilarity data. In: Gaul WA, Geyer-Schulz A, Schmidt-Thieme L, Kunze J (eds) Challenges at the interface of data analysis, computer science, and optimization. Studies in classification, data analysis, and knowledge organization. Springer, Berlin/Heidelberg, pp 23–28). In this paper, we focus on distribution-valued dissimilarity data and hierarchical cluster analysis. Cluster analysis plays a key role in data mining, knowledge discovery, and SDA. Conventional inputs to cluster analysis are real-valued data, but in some cases, for example after data aggregation, the inputs may be stochastic over ranges, i.e., distribution-valued dissimilarities. Hierarchical cluster analysis requires an order relation on dissimilarities, i.e., the dissimilarities need to satisfy the properties of an ultrametric. However, distribution-valued dissimilarities have no natural order relation. We therefore develop a method for investigating an order relation on distribution-valued dissimilarities and apply that order relation to hierarchical symbolic clustering. Finally, we demonstrate the use of our order relation by finding a hierarchical clustering of Japanese Internet sites from Internet traffic data.
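
The abstract does not say which order relation the chapter develops. Purely as an illustration of why distribution-valued dissimilarities lack a natural order, the following Python sketch compares two such dissimilarities, each given as a sample of observed values, using first-order stochastic dominance (the topic of Levy 2006 in the reference list). The choice of stochastic dominance, the sample-based representation, and the function names `ecdf`, `stochastically_smaller`, and `compare` are assumptions made for this sketch, not the authors' method.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function of t."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n


def stochastically_smaller(a, b):
    """True if sample `a` is smaller than `b` in the first-order sense.

    Assumed criterion: F_a(t) >= F_b(t) at every observed point,
    with strict inequality at one point or more.
    """
    fa, fb = ecdf(a), ecdf(b)
    # Empirical CDFs only change at observed values, so checking
    # the pooled observations is enough.
    grid = sorted(set(a) | set(b))
    diffs = [fa(t) - fb(t) for t in grid]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


def compare(a, b):
    """Classify the pair as 'smaller', 'larger', or 'incomparable'."""
    if stochastically_smaller(a, b):
        return "smaller"
    if stochastically_smaller(b, a):
        return "larger"
    return "incomparable"
```

Under this assumed criterion, two dissimilarities whose empirical distribution functions cross are reported as incomparable, which is the situation that prevents a direct use of ordinary hierarchical clustering.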

References

Billard L, Diday E (2007) Symbolic data analysis. Wiley, Chichester
Diday E, Vrac M (2005) Mixture decomposition of distributions by copulas in the symbolic data analysis framework, vol 147. Elsevier Science Publishers B. V., Amsterdam, pp 27–41
Gordon AD (1987) A review of hierarchical classification. J R Stat Soc Ser A 150:119–137
Levy H (2006) Stochastic dominance: investment decision making under uncertainty. Studies in risk and uncertainty. Springer, New York
McMorris FR, Neumann D (1983) Consensus functions defined on trees. Math Soc Sci 4:131–136
Mizuta M, Minami H (2012) Analysis of distribution valued dissimilarity data. In: Gaul WA, Geyer-Schulz A, Schmidt-Thieme L, Kunze J (eds) Challenges at the interface of data analysis, computer science, and optimization. Studies in classification, data analysis, and knowledge organization. Springer, Berlin/Heidelberg, pp 23–28

Author information

Authors and Affiliations

Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
Yusuke Matsui
Information Initiative Center, Hokkaido University, Sapporo, Japan
Yuriko Komiya, Hiroyuki Minami & Masahiro Mizuta

Corresponding author

Correspondence to Yusuke Matsui.

Editor information

Editors and Affiliations

Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Wolfgang Gaul
Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Andreas Geyer-Schulz
The Institute of Statistical Mathematics, Tokyo, Japan
Yasumasa Baba
Graduate School of Management and Information Systems, Tama University, Tokyo, Japan
Akinori Okada

About this paper

Cite this paper

Matsui, Y., Komiya, Y., Minami, H., Mizuta, M. (2014). Comparison of Two Distribution Valued Dissimilarities and Its Application for Symbolic Clustering. In: Gaul, W., Geyer-Schulz, A., Baba, Y., Okada, A. (eds) German-Japanese Interchange of Data Analysis Results. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01264-3_3

DOI: https://doi.org/10.1007/978-3-319-01264-3_3
Published: 10 October 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01263-6
Online ISBN: 978-3-319-01264-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
