Cluster Computing, Volume 22, Supplement 4, pp 8823–8833

Panel data clustering analysis based on composite PCC: a parametric approach

  • Juan Yang
  • Yuantao Xie
  • Yabo Guo


This paper proposes a panel data clustering model based on the Hierarchical Nested Archimedean Copula (HNAC) model and compound pair copula construction (PCC) models. The method provides a new approach to panel data clustering that overcomes the limitations of traditional data clustering and time series clustering. It makes full use of both the dependence structure between cross-sectional individuals and the degree of correlation between the time series. The similarity structure is constructed with HNAC and pair copulas to reflect changes in the clustering results. Because the choice of copulas is very flexible, the clustering results are more accurate, robust, and easy to interpret. The method is computationally efficient, and the estimation procedure and goodness-of-fit test for the compound PCC method are given. In the case study, the compound PCC models produce excellent clustering results, which shows that they are effective and useful.
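The abstract does not spell out the algorithm, so the following is only a minimal sketch, under our own assumptions, of how a hierarchical nested Archimedean structure can be grown from pairwise dependence between individuals' time series. Kendall's tau is estimated for every pair of individuals, the two groups with the highest average tau are merged repeatedly (average linkage), and each merge is assigned a Gumbel parameter through the standard inversion theta = 1 / (1 - tau). The Gumbel family, average linkage, and the helper names `kendall_tau`, `gumbel_theta`, `leaves`, and `build_hac` are hypothetical illustrative choices, not the authors' compound PCC procedure; no copula density is fitted and no goodness-of-fit test is performed.

```python
import itertools
import random


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length sequences (tied pairs count as neither)."""
    n = len(x)
    concordant = discordant = 0
    for i, j in itertools.combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def gumbel_theta(tau):
    """Gumbel parameter from Kendall's tau, theta = 1 / (1 - tau).

    Gumbel only covers tau in [0, 1), so tau is clipped to [0, 0.999].
    """
    tau = min(max(tau, 0.0), 0.999)
    return 1.0 / (1.0 - tau)


def leaves(node):
    """Names of the individuals under a node; a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def build_hac(series, n_clusters=1):
    """Greedily nest individuals into a hierarchical Gumbel structure (hypothetical helper).

    series: dict mapping individual name -> time series (list of floats).
    Each step merges the two current groups with the highest average pairwise
    Kendall's tau into a node (theta, left, right), until n_clusters groups remain.
    """
    names = list(series)
    tau = {}
    for a, b in itertools.combinations(names, 2):
        tau[a, b] = tau[b, a] = kendall_tau(series[a], series[b])

    groups = list(names)  # current top-level nodes
    while len(groups) > n_clusters:
        best = None
        for g, h in itertools.combinations(range(len(groups)), 2):
            la, lb = leaves(groups[g]), leaves(groups[h])
            avg = sum(tau[a, b] for a in la for b in lb) / (len(la) * len(lb))
            if best is None or avg > best[0]:
                best = (avg, g, h)
        avg, g, h = best
        theta = gumbel_theta(avg)
        # Sufficient nesting condition for nested Gumbel copulas: a parent's
        # theta must not exceed the thetas of its nested children.
        for child in (groups[g], groups[h]):
            if not isinstance(child, str):
                theta = min(theta, child[0])
        node = (theta, groups[g], groups[h])
        groups = [x for k, x in enumerate(groups) if k not in (g, h)] + [node]
    return groups


if __name__ == "__main__":
    # Synthetic panel: individuals A* share factor f1, B* share factor f2.
    random.seed(0)
    T = 200
    f1 = [random.gauss(0, 1) for _ in range(T)]
    f2 = [random.gauss(0, 1) for _ in range(T)]
    panel = {}
    for i in range(1, 4):
        panel[f"A{i}"] = [v + random.gauss(0, 0.3) for v in f1]
        panel[f"B{i}"] = [v + random.gauss(0, 0.3) for v in f2]
    for node in build_hac(panel, n_clusters=2):
        print(sorted(leaves(node)))
```

With `n_clusters=2` the demo should separate the A and B individuals; with the default `n_clusters=1` the function returns one nested tree. Average linkage produces no inversions, so the merge taus are already non-increasing and the `min` step only makes the nesting condition explicit. Treating each time point as an independent draw ignores serial dependence, which a full panel model would still have to handle.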


Keywords: Panel data clustering; Pair copula construction (PCC); Hierarchical Nested Archimedean Copula



This research was supported by the National Natural Science Foundation Project of China (No. 71303045), the Fundamental Research Funds for the Central Universities at UIBE (CXTD9-04), “Research on the growth direction and path of economic new kinetic energy driven by scientific and technological innovation” (ZLY201703), and “Research on the strategic direction and path of modern economic system driven by scientific and technological innovation” (ZLY201734).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Institute of Comprehensive Development, Chinese Academy of Science and Technology for Development, Beijing, China
  2. School of Insurance and Economics, University of International Business and Economics, Beijing, China
