Clustering Methods for Spherical Data: An Overview and a New Generalization

  • Sungsu Kim
  • Ashis SenGupta
Conference paper
Part of the ICSA Book Series in Statistics book series (ICSABSS)

Abstract

Recent advances in data acquisition technologies have led to massive amounts of data being collected routinely in the information sciences and technology, as well as in the engineering sciences. In this big data era, cluster analysis is a fundamental and crucial step in exploring structures and patterns in massive data sets, where the objects to be clustered (data) are represented as vectors. Often such high-dimensional vectors are \(L_2\)-normalized so that they lie on the surface of the unit hypersphere, which transforms them into spherical data. Clustering such data is therefore equivalent to grouping spherical data, where either cosine similarity or correlation, rather than a Euclidean metric, is the desired measure for identifying similar observations. This chapter presents an overview of the clustering methods for spherical data found in the literature and introduces a model-based generalization for asymmetric spherical data.
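
The normalization step mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the chapter: the vectors `a`, `b`, and `c` and all function names are hypothetical, chosen only to show that \(L_2\) normalization discards vector length, so that two vectors pointing in the same direction become the same point on the unit hypersphere even though their raw Euclidean distance is large.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def l2_normalize(v):
    """Scale v to unit Euclidean length so it lies on the unit hypersphere."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (independent of their lengths)."""
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def euclidean_distance(u, v):
    """Ordinary Euclidean distance between u and v."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical feature vectors (illustrative values only).
a = [3.0, 0.0, 4.0]
b = [6.0, 0.0, 8.0]  # same direction as a, twice the length
c = [0.0, 5.0, 0.0]  # orthogonal to a

a_hat, b_hat, c_hat = (l2_normalize(v) for v in (a, b, c))

print(euclidean_distance(a, b))          # raw vectors differ in length
print(cosine_similarity(a, b))           # but point in the same direction
print(euclidean_distance(a_hat, b_hat))  # and coincide after normalization
print(dot(a_hat, c_hat))                 # on the sphere, cosine is a dot product
```

For unit vectors the cosine similarity reduces to a plain dot product, which is why the sketch uses `dot(a_hat, c_hat)` in the last line rather than calling `cosine_similarity` again.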

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of Mathematics, University of Louisiana, Lafayette, USA
  2. Applied Statistics Unit, Indian Statistical Institute, Kolkata, India