Clustering dependent observations with copula functions

Di Lascio, F. Marta L.; Giannerini, Simone

doi:10.1007/s00362-016-0822-3

Regular Article
Statistical Papers, Volume 60, pages 35–51 (2019)
Published: 26 August 2016

Abstract

This paper deals with the problem of clustering dependent observations according to their underlying complex generating process. Di Lascio and Giannerini (Journal of Classification 29(1):50–75, 2012) introduced the CoClust, a clustering algorithm based on copula functions that achieves the task but has a high computational burden. Moreover, the CoClust automatically allocates all the observations to the clusters; thus, it cannot discard potentially irrelevant observations. In this paper we introduce an improved version of the CoClust that overcomes both issues and performs better in many respects. By means of a Monte Carlo study we investigate the features of the algorithm and show that it consistently improves on the original CoClust. The validity of our proposal is also supported by applications to real data sets of human breast tumor samples, for which the algorithm provides a meaningful biological interpretation. The new algorithm is implemented and made available through an updated version of the R package CoClust.
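As a rough illustration of what a copula-based measure of dependence between observations can look like, the following Python sketch maps two series to rank-based pseudo-observations and evaluates a bivariate Gaussian copula log-likelihood. It is not the CoClust algorithm and does not use the CoClust R package: the choice of a Gaussian copula, the simulated data, and the function names `pseudo_observations` and `gaussian_copula_loglik` are all assumptions made for this example only.

```python
import math
import random
from statistics import NormalDist


def pseudo_observations(values):
    """Map a sample to (0, 1) via scaled ranks r / (n + 1). Ties are not handled."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def gaussian_copula_loglik(u, v):
    """Fit rho from the normal scores of u and v; return (rho, log-likelihood)."""
    inv = NormalDist().inv_cdf
    x = [inv(p) for p in u]
    y = [inv(p) for p in v]
    # Rank-based normal scores are symmetric about zero, so no centering is done.
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    rho = sxy / math.sqrt(sxx * syy)
    rho = max(min(rho, 0.999), -0.999)  # keep 1 - rho^2 away from zero
    one = 1.0 - rho * rho
    # Log-density of the bivariate Gaussian copula, summed over observations.
    loglik = sum(
        -0.5 * math.log(one)
        - (rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * one)
        for a, b in zip(x, y)
    )
    return rho, loglik


# Hypothetical data: series a and b share a common factor, series c is unrelated.
random.seed(0)
n = 200
common = [random.gauss(0, 1) for _ in range(n)]
a = [z + 0.3 * random.gauss(0, 1) for z in common]
b = [z + 0.3 * random.gauss(0, 1) for z in common]
c = [random.gauss(0, 1) for _ in range(n)]

ua, ub, uc = (pseudo_observations(s) for s in (a, b, c))
rho_dep, ll_dep = gaussian_copula_loglik(ua, ub)
rho_ind, ll_ind = gaussian_copula_loglik(ua, uc)
print(f"dependent pair:   rho={rho_dep:.3f}, loglik={ll_dep:.2f}")
print(f"independent pair: rho={rho_ind:.3f}, loglik={ll_ind:.2f}")
```

In this toy setting the pair that shares a common factor attains a larger copula log-likelihood than the unrelated pair. That is the kind of signal a likelihood-based grouping criterion could exploit, and the rank transform makes the comparison independent of the marginal distributions.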

References

Brechmann E, Schepsmeier U (2013) Modeling dependence with c- and d-vine copulas: the R package CDVine. J Stat Softw 52(3):1–27
Cherubini U, Luciano E, Vecchiato W (2004) Copula methods in finance. Wiley, Chichester
Clarke K (2007) A simple distribution-free test for non-nested model selection. Polit Anal 15:347–363
Di Lascio FML, Giannerini S (2012) A copula-based algorithm for discovering patterns of dependent observations. J Classif 29(1):50–75
Di Lascio FML, Giannerini S (2015) CoClust. R package version 0.3-1
Di Lascio FML, Giannerini S, Reale A (2015) Exploring copulas for the imputation of complex dependent data. Stat Methods Appl 24(1):159–175
Dortet-Bernadet JL, Wicker N (2008) Model-based clustering on the unit sphere with an illustration using gene expression profiles. Biostatistics 9(1):66–80
Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95:14863–14868
Fraley C, Raftery A (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588
Hedenfalk I, Duggan D, Chen Y, Radmacher M, Bittner M, Simon R, Meltzer P, Gusterson B, Esteller M, Kallioniemi OP, Wilfond B, Borg A, Dougherty E, Kononen J, Bubendorf L, Fehrle W, Pittaluga S, Gruvberger S, Loman N, Johannsson O, Olsson H, Sauter G (2001) Gene-expression profiles in hereditary breast cancer. N Engl J Med 344(8):539–548
Joe H, Xu J (1996) The estimation method of inference functions for margins for multivariate models. Technical Report 166, Department of Statistics, University of British Columbia
Nelsen RB (2006) Introduction to copulas. Springer, New York
Roverato A, Di Lascio FML (2011) Wilks’ \(\lambda\) dissimilarity measures for gene clustering: an approach based on the identification of transcription modules. Biometrics 67(4):1236–1248
Sklar A (1959) Fonctions de répartition à n dimensions et leurs marges. Publ Inst Stat Univ Paris 8:229–231
Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander E, Golub T (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96(6):2907–2912
Trivedi PK, Zimmer DM (2005) Copula modeling: an introduction for practitioners. Found Trends Econom 1:1–111
Vuong Q (1989) Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57:307–333
Yeung K, Fraley C, Murua A, Raftery A, Ruzzo W (2001) Model-based clustering and data transformations for gene expression data. Bioinformatics 17(10):977–987
Zimmer DM, Trivedi PK (2006) Using trivariate copulas to model sample selection and treatment effects: application to family health care demand. J Bus Econ Stat 24:63–76

Acknowledgments

F. Marta L. Di Lascio acknowledges the support of Free University of Bozen-Bolzano, Faculty of Economics and Management, via the project “Multivariate analysis techniques based on copula function”.

Author information

Authors and Affiliations

Faculty of Economics and Management, University of Bozen-Bolzano, Piazza Università 1, 39100, Bolzano, Italy
F. Marta L. Di Lascio
Department of Statistical Sciences, University of Bologna, via Belle Arti 41, 40126, Bologna, Italy
Simone Giannerini

Corresponding author

Correspondence to F. Marta L. Di Lascio.

About this article

Cite this article

Di Lascio, F.M.L., Giannerini, S. Clustering dependent observations with copula functions. Stat Papers 60, 35–51 (2019). https://doi.org/10.1007/s00362-016-0822-3

Received: 22 April 2016
Revised: 11 August 2016
Published: 26 August 2016
Issue Date: 15 February 2019
DOI: https://doi.org/10.1007/s00362-016-0822-3
