Journal of Immigrant and Minority Health, Volume 13, Issue 6, pp 1099–1109

The Advantage of Imputation of Missing Income Data to Evaluate the Association Between Income and Self-Reported Health Status (SRH) in a Mexican American Cohort Study

  • Anthony B. Ryder
  • Anna V. Wilkinson
  • Michelle K. McHugh
  • Katherine Saunders
  • Sumesh Kachroo
  • Anthony D’Amelio Jr.
  • Melissa Bondy
  • Carol J. Etzel
Original Paper

Abstract

Missing data often occur in cross-sectional surveys and in longitudinal and experimental studies. The purpose of this study was to compare the prediction of self-rated health (SRH), a robust predictor of morbidity and mortality among diverse populations, before and after imputation of the missing variable “yearly household income.” We reviewed data from 4,162 participants of Mexican origin who were recruited from July 1, 2002, through December 31, 2005, and enrolled in a population-based cohort study. Missing yearly income data were imputed using three single imputation methods and one multiple imputation method under a Bayesian approach. Of the 4,162 participants, 3,121 were randomly assigned to a training set (to derive the yearly income imputation methods and develop the health-outcome prediction models) and 1,041 to a testing set (to compare the areas under the curve (AUC) of the receiver-operating characteristic of the resulting health-outcome prediction models). The discriminatory powers of the SRH prediction models were good (range, 69–72%). Compared with the prediction model obtained without imputation of missing yearly income, all imputation methods improved the prediction of SRH (P < 0.05 for all comparisons), and the AUC for the model after multiple imputation was the highest (AUC = 0.731). Furthermore, when yearly income was imputed using multiple imputation, the odds of rating SRH as good or better increased by 11% for each $5,000 increment in yearly income. This study showed that although imputation of missing data for a key predictor variable can improve a health-outcome prediction model, further work is needed to illuminate the risk factors associated with SRH.
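
The three single imputation approaches named in the abbreviations (MS, RB, and RBE) can be illustrated with a minimal sketch. This is not the authors' procedure: the data are synthetic, and the covariates, coefficients, missingness rate, and function names are all assumptions made for illustration. Income is deleted completely at random here, which may not hold in real survey data, and the Bayesian multiple imputation is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, hypothetical data: income depends linearly on two covariates.
n = 500
X = rng.normal(size=(n, 2))  # stand-ins for predictors such as age or education
income = (30_000 + 8_000 * X[:, 0] + 5_000 * X[:, 1]
          + rng.normal(scale=6_000, size=n))

# Assumption: about 30% of income values are missing completely at random.
missing = rng.random(n) < 0.3
income_obs = np.where(missing, np.nan, income)


def mean_substitution(y):
    """MS: replace every missing value with the mean of the observed values."""
    out = y.copy()
    out[np.isnan(y)] = np.nanmean(y)
    return out


def regression_imputation(y, X, rng=None):
    """RB: fill missing values with least-squares predictions fitted on
    complete cases. If rng is given (RBE), add a normal error draw whose
    SD is the residual SD of that fit."""
    obs = ~np.isnan(y)
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A[obs], y[obs], rcond=None)
    out = y.copy()
    out[~obs] = A[~obs] @ beta
    if rng is not None:
        resid = y[obs] - A[obs] @ beta
        sigma = resid.std(ddof=A.shape[1])
        out[~obs] += rng.normal(scale=sigma, size=(~obs).sum())
    return out


ms = mean_substitution(income_obs)
rb = regression_imputation(income_obs, X)
rbe = regression_imputation(income_obs, X, rng=rng)

# Compare the spread of the completed income variable under each method.
for name, filled in [("MS", ms), ("RB", rb), ("RBE", rbe)]:
    print(f"{name:>3}: SD of completed income = {filled.std():,.0f}")
```

In this sketch, mean substitution places every imputed value at the observed mean, so the completed variable has the smallest spread of the three. The regression-based variants restore part of that spread from the covariates, and the error term in RBE adds residual variation back to the imputed values.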

Keywords

  • Self-rated health
  • Missing income data
  • Data imputation techniques
  • Mean substitution
  • Multiple imputation
  • Minority health

Abbreviations

AUC: Area under the curve
CI: Confidence interval
DF: Degree of freedom
HH: Household
LD: Listwise deletion
LR: P value from LR testing
MACS: Mexican American Cohort Study
MAR: Missing at random
MCAR: Missing completely at random
MI: Multiple imputation
MNAR: Missing not at random
MS: Mean substitution
OR: Odds ratio
RB: Regression-based single imputation
RBE: Regression-based single imputation with error term
ROC: Receiver operating characteristic
SD: Standard deviation
SE: Standard error
SES: Socioeconomic status
SRH: Self-rated health

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Anthony B. Ryder (1)
  • Anna V. Wilkinson (1)
  • Michelle K. McHugh (1)
  • Katherine Saunders (1)
  • Sumesh Kachroo (1)
  • Anthony D’Amelio Jr. (1)
  • Melissa Bondy (1)
  • Carol J. Etzel (1)
  1. Department of Epidemiology, UT MD Anderson Cancer Center, Houston, USA
