Exploring copulas for the imputation of complex dependent data

Statistical Methods & Applications

Abstract

In this work we introduce a copula-based method for imputing missing data by using conditional density functions of the missing variables given the observed ones. In theory, such functions can be derived from the multivariate distribution of the variables of interest. In practice, it is very difficult to model joint distributions and derive conditional distributions, especially when the margins are different. We propose a natural solution to the problem by exploiting copulas so that we derive conditional density functions through the corresponding conditional copulas. The approach is appealing since copula functions enable us (1) to fit any combination of marginal distribution functions, (2) to take into account complex multivariate dependence relationships and (3) to model the marginal distributions and the dependence structure separately. We describe the method and perform a Monte Carlo study in order to compare it with two well-known imputation techniques: the nearest neighbour donor imputation and the regression imputation by EM algorithm. Our results indicate that the proposal compares favourably with classical methods in terms of preservation of microdata, margins and dependence structure.
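The idea of imputing from a conditional distribution obtained through a copula can be illustrated with a small sketch. By Sklar's theorem the joint density factorises into the copula density times the marginal densities, so the conditional density of a missing variable given the observed ones is a ratio of copula densities multiplied by the margin of the missing variable. The code below is not the authors' implementation. It assumes a bivariate Gaussian copula with known correlation `rho`, an exponential margin for the observed variable and a normal margin for the missing one; all of these choices, and the names `impute_x2_given_x1`, `exp_cdf` and `rate1`, are hypothetical and chosen only because the Gaussian conditional copula can be sampled in closed form.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used by the Gaussian copula
EPS = 1e-12                # keeps probabilities strictly inside (0, 1)


def exp_cdf(x, rate):
    """CDF of an exponential margin (assumed margin of the observed variable)."""
    return 1.0 - math.exp(-rate * x)


def impute_x2_given_x1(x1, rho, rate1, margin2, rng):
    """Draw one imputed value of x2 given an observed x1.

    Assumes a bivariate Gaussian copula with correlation rho, an
    exponential margin for x1 and an arbitrary NormalDist margin for x2.
    """
    # Step 1: map the observed value to the copula scale via its margin.
    u1 = min(max(exp_cdf(x1, rate1), EPS), 1.0 - EPS)
    z1 = STD_NORMAL.inv_cdf(u1)
    # Step 2: sample from the conditional Gaussian copula:
    # Z2 | Z1 = z1 is normal with mean rho*z1 and variance 1 - rho^2.
    z2 = rng.gauss(rho * z1, math.sqrt(1.0 - rho ** 2))
    u2 = min(max(STD_NORMAL.cdf(z2), EPS), 1.0 - EPS)
    # Step 3: map back to the data scale with the margin of x2.
    return margin2.inv_cdf(u2)


# Usage: five imputed values for a record with x1 = 3.0 observed.
rng = random.Random(0)
margin2 = NormalDist(mu=10.0, sigma=2.0)
draws = [impute_x2_given_x1(3.0, 0.8, 1.0, margin2, rng) for _ in range(5)]
print(draws)
```

In this sketch the margins and the dependence parameter are treated as known; in practice they would have to be estimated from the complete records, and a different copula family would generally require a numerical sampler for the conditional density.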

Acknowledgments

The authors wish to thank Paola Monari (University of Bologna, Italy) and Antonia Manzari (Italian Statistical Institute, ISTAT) for their support and useful discussions. The first author acknowledges the support of Free University of Bozen-Bolzano, School of Economics and Management via the project “Multivariate analysis techniques based on copula function”.

Author information

Corresponding author

Correspondence to F. Marta L. Di Lascio.

Cite this article

Di Lascio, F.M.L., Giannerini, S. & Reale, A. Exploring copulas for the imputation of complex dependent data. Stat Methods Appl 24, 159–175 (2015). https://doi.org/10.1007/s10260-014-0287-2

  • DOI: https://doi.org/10.1007/s10260-014-0287-2
