Multiscale Clustering for Functional Data

Abstract

In an era of massive and complex data, clustering is one of the most important procedures for understanding and analyzing unstructured multivariate data. Classical methods such as K-means and hierarchical clustering, however, are not efficient at grouping data that are high-dimensional and have inherent multiscale structures. This paper presents new clustering procedures that can adapt to the multiscale characteristics and high dimensionality of data. The proposed methods are based on a novel combination of multiresolution analysis and functional data analysis. At the core of the methodology, a clustering approach built on multiresolution analysis may reflect both the global trend and the local activity of the data, while functional data analysis handles their high dimensionality efficiently. Practical algorithms for implementing the proposed methods are also discussed. The empirical performance of the proposed methods is evaluated through numerical studies, including a simulation study and a real data analysis, which demonstrate the promising performance of the proposed clustering.
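
The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general idea under our own assumptions, not the authors' procedure. Each curve, sampled on a regular grid whose length is divisible by 2**levels, is decomposed with an orthonormal Haar wavelet multiresolution analysis (one possible choice; the keywords also mention empirical mode decomposition). The coarsest approximation coefficients stand in for the global trend and the detail coefficients for local activity, and K-means is then run on per-scale weighted coefficients. The function names, the `weights` parameter, the farthest-point initialization and the simulated bump data are all hypothetical choices made for illustration.

```python
import numpy as np


def haar_mra(x, levels):
    """Orthonormal Haar multiresolution analysis of a 1-D signal.

    Returns the coarsest approximation coefficients (global trend) and a list
    of detail coefficient arrays ordered from finest to coarsest (local activity).
    """
    a = np.asarray(x, dtype=float)
    if len(a) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # local differences
        a = (even + odd) / np.sqrt(2.0)              # local averages
    return a, details


def multiscale_features(curves, levels, weights=None):
    """Stack per-scale coefficients (coarse to fine), optionally reweighting each scale.

    `weights` is a hypothetical parameter with one entry per block:
    [approximation, coarsest detail, ..., finest detail].
    """
    if weights is not None and len(weights) != levels + 1:
        raise ValueError("need one weight per scale (levels + 1 entries)")
    rows = []
    for x in curves:
        approx, details = haar_mra(x, levels)
        blocks = [approx] + details[::-1]  # coarse-to-fine order
        if weights is not None:
            blocks = [w * b for w, b in zip(weights, blocks)]
        rows.append(np.concatenate(blocks))
    return np.array(rows)


def kmeans(X, k, n_iter=100):
    """Lloyd's K-means with a deterministic farthest-point initialization."""
    # Start from the first observation, then repeatedly add the point
    # farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)  # assign each curve to its nearest center
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def simulate_curves(seed=1, n=64, per_group=20):
    """Hypothetical toy data: a shared sine trend, a random vertical shift per
    curve (global nuisance), and a narrow bump at t = 0.5 in group 1 only."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / n
    bump = 2.0 * np.exp(-((t - 0.5) / 0.03) ** 2)
    curves, truth = [], []
    for g in (0, 1):
        for _ in range(per_group):
            shift = rng.normal(0.0, 1.0)
            curves.append(np.sin(2 * np.pi * t) + shift + g * bump + rng.normal(0.0, 0.1, n))
            truth.append(g)
    return np.array(curves), np.array(truth)


if __name__ == "__main__":
    curves, truth = simulate_curves()
    # Zero weight on the approximation ignores vertical shifts; the details keep the bump.
    X = multiscale_features(curves, levels=6, weights=[0.0] + [1.0] * 6)
    labels, _ = kmeans(X, k=2)
    print("agreement with simulated groups:", np.mean(labels == truth))
```

Because the Haar transform is orthonormal, unit weights reproduce plain K-means on the raw curves; any multiscale behavior in this sketch comes entirely from the weights. A constant shift changes only the approximation coefficients, so setting that weight to zero, as in the demo, makes the clustering ignore per-curve vertical offsets and group the curves by the presence of the local bump.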

Figs. 1–12 (images not reproduced)

Acknowledgments

We thank the Editor and referees for comments that led to a substantially improved manuscript. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Korea government (NRF-2016R1C1B1006572 and NRF-2018R1D1A1B07042933) and by NIH grants (R01HL111195 and R01MH109496).

Author information

Correspondence to Hee-Seok Oh.

About this article

Cite this article

Lim, Y., Oh, H. & Cheung, Y.K. Multiscale Clustering for Functional Data. J Classif 36, 368–391 (2019). https://doi.org/10.1007/s00357-019-09313-9

Keywords

  • Empirical mode decomposition
  • Functional data
  • High-dimensional data
  • Multiresolution analysis
  • Wavelet transform