Abstract
Classical data analysis considers data vectors with real-valued or categorical components. In contrast, Symbolic Data Analysis (SDA) deals with data vectors whose components are intervals, sets of categories, or even frequency distributions. SDA generalizes common methods of multivariate statistics to the case of symbolic data tables. This paper presents a brief survey of the basic problems and methods of this fast-developing branch of data analysis. As an alternative to the current, more or less heuristic approaches, we propose a new probabilistic approach in this context. Our presentation concentrates on visualization, dissimilarities, and partition-type clustering for symbolic data.
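To make the notion of interval-valued symbolic data concrete, the following minimal sketch represents each data vector as a list of intervals and compares two such vectors by summing the Hausdorff distances of their components. This is only an illustration of one common dissimilarity for interval data, not the probabilistic approach proposed in the paper; the names `Interval`, `hausdorff_interval`, and `dissimilarity`, as well as the sample values, are assumptions introduced here.

```python
from typing import List, Tuple

# An interval-valued component [lower, upper]; the type name is hypothetical.
Interval = Tuple[float, float]


def hausdorff_interval(a: Interval, b: Interval) -> float:
    """Hausdorff distance between two closed intervals on the real line."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def dissimilarity(x: List[Interval], y: List[Interval]) -> float:
    """Sum of component-wise Hausdorff distances between two interval vectors.

    Assumption: summing over components is one possible way of combining
    them; other combinations (e.g. a maximum or a squared sum) are equally
    conceivable.
    """
    if len(x) != len(y):
        raise ValueError("both vectors need the same number of components")
    return sum(hausdorff_interval(a, b) for a, b in zip(x, y))


# Two illustrative symbolic data vectors with two interval components each.
x = [(1.0, 3.0), (10.0, 20.0)]
y = [(2.0, 5.0), (12.0, 18.0)]
d = dissimilarity(x, y)
```

A dissimilarity of this kind could serve as the building block for a partition-type clustering of interval data, where each class is summarized by a prototype interval vector.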
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bock, HH. (2009). Analyzing Symbolic Data. In: Gaul, W., Bock, HH., Imaizumi, T., Okada, A. (eds) Cooperation in Classification and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00668-5_1
DOI: https://doi.org/10.1007/978-3-642-00668-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00667-8
Online ISBN: 978-3-642-00668-5
eBook Packages: Mathematics and Statistics (R0)