Organizing and Analyzing the Activity Data in NHANES

  • Andrew Leroux
  • Junrui Di
  • Ekaterina Smirnova
  • Elizabeth J Mcguffey
  • Quy Cao
  • Elham Bayatmokhtari
  • Lucia Tabacu
  • Vadim Zipunnikov
  • Jacek K Urbanek
  • Ciprian Crainiceanu


The NHANES study contains objectively measured physical activity data collected using hip-worn accelerometers from multiple cohorts. However, using the accelerometry data has proven daunting because (1) there are currently no agreed-upon standard protocols for data storage and analysis; (2) the data exhibit heterogeneous patterns of missingness due to varying degrees of adherence to wear-time protocols; (3) sampling weights need to be carefully adjusted and accounted for in individual analyses; (4) there is a lack of reproducible software that transforms the data from its published format into analytic form; and (5) the high-dimensional nature of accelerometry data complicates analyses. Here, we provide a framework for processing, storing, and analyzing the NHANES accelerometry data for the 2003–2004 and 2005–2006 surveys. We also provide an NHANES data package in R to help disseminate high-quality, processed activity data combined with mortality and demographic information. Thus, we provide the tools to transition from “available data online” to “easily accessible and usable data”, which substantially reduces the large upfront costs of initiating studies of association between physical activity and human health outcomes using NHANES. We apply these tools in an analysis showing that accelerometry features have the potential to predict 5-year all-cause mortality better than known risk factors such as age, cigarette smoking, and various comorbidities.


Keywords: Accelerometry, Physical activity, NHANES, Prediction



We would like to thank the CDC, specifically the National Center for Health Statistics, for collecting, organizing, and making public this unique data resource. We would also like to thank them for the permission to repost the publicly available NHANES and NDI data in analytic format. Finally, we would like to thank the thousands of anonymous participants in NHANES, whose data led to the exciting findings in this paper.


This research was supported by the National Heart, Lung, and Blood Institute (R01 HL123407), the National Institute of Neurological Disorders and Stroke (R01 NS060910), and a National Institute on Aging Training Grant (T32 AG000247).



Copyright information

© International Chinese Statistical Association 2019

Authors and Affiliations

  • Andrew Leroux (1, corresponding author)
  • Junrui Di (1)
  • Ekaterina Smirnova (2, 4)
  • Elizabeth J Mcguffey (3)
  • Quy Cao (4)
  • Elham Bayatmokhtari (4)
  • Lucia Tabacu (5)
  • Vadim Zipunnikov (1)
  • Jacek K Urbanek (6)
  • Ciprian Crainiceanu (1)

  1. Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, USA
  2. Department of Biostatistics, Virginia Commonwealth University, Richmond, USA
  3. Department of Mathematics, United States Naval Academy, Annapolis, USA
  4. Department of Mathematical Sciences, University of Montana, Missoula, USA
  5. Department of Mathematics and Statistics, Old Dominion University, Norfolk, USA
  6. Division of Geriatric Medicine and Gerontology, Department of Medicine, Center on Aging and Health, School of Medicine, Johns Hopkins University, Baltimore, USA
