Demography in the Big Data Revolution: Changing the Culture to Forge New Frontiers

  • Original Research
Population Research and Policy Review

Abstract

Despite the widespread and rapidly growing popularity of Big Data, researchers have yet to agree on what the concept entails, what tools are still needed to best interrogate these data, whether Big Data’s emergence represents a new academic field or simply a set of tools, and how much confidence we can place in results derived from Big Data. Despite these ambiguities, most would agree that Big Data and the methods for analyzing it hold remarkable potential for advancing social science knowledge. In my Presidential address to the Southern Demographic Association, I argue that demographers have long collected and analyzed Big Data in a small way, by parsing out the points of information that we can manipulate with familiar models and restricting analyses to what typical computing systems can handle or restricted-access data disseminators will allow. To better interrogate the data we already have, we need to change the culture of demography to treat demographic microdata as Big. This includes shaping the definition of Big Data, changing how we conceptualize models, and re-evaluating how we silo confidential data.

Notes

  1. Although I do not discount the tremendous value of theory, I (and others) would argue that a major limitation to taking advantage of so many of the Big Data techniques like machine learning is that they are not designed for theory-driven modeling. As long as program officers at funding agencies and reviewers of journal articles demand theory-driven research, we will never be able to totally engage with Big Data in the way that it has been advanced in other disciplines.

  2. According to Wilcox (2010), a minimum of 599 bootstraps is necessary; however, an exchange among statisticians on the online forum Cross Validated reveals that many consider 100,000 to 1,000,000 iterations to be necessary, and decisions are made based on the number a researcher “can afford to wait for.” See https://stats.stackexchange.com/questions/86040/rule-of-thumb-for-number-of-bootstrap-samples.

  3. This Presidential Address was given on the eve of the 2016 Presidential elections. Between the time of the address and this publication, it has become clear that the US Census Bureau is facing a budget crisis. Continuing Resolutions in Congress in 2016 and 2017 froze the overall federal budget at previous levels, which does not provide sufficient funding for the 2020 Census. The Trump administration has asked for additional funding, but the Census Project, a grassroots organization composed of demographers and other stakeholders, believes that the requested additional funds are insufficient for the task, even if given. Thus, it seems even less likely now than in 2016 that the Census would allocate the funds necessary to upgrade the FSRDC computing environment.
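
The trade-off raised in note 2, between the number of bootstrap replicates and the time a researcher can afford to wait, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than anything taken from the address: the data are synthetic, the statistic is a simple mean, the interval is a percentile bootstrap, and the function names (`percentile_bootstrap_ci`, `endpoint_spread`, `make_synthetic_sample`) are hypothetical. The sketch reruns the bootstrap with different random seeds and measures how much the lower interval endpoint moves from run to run for a given number of replicates.

```python
import random
import statistics


def make_synthetic_sample(n=100, seed=42):
    """Synthetic data for illustration only (not from any survey)."""
    rng = random.Random(seed)
    return [rng.gauss(50.0, 10.0) for _ in range(n)]


def percentile_bootstrap_ci(data, n_boot=599, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement n_boot times and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    # Order-statistic positions based on (n_boot + 1); this convention is an
    # assumption of the sketch, and other quantile rules are also in use.
    lo_idx = max(0, round((alpha / 2) * (n_boot + 1)) - 1)
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * (n_boot + 1)) - 1)
    return means[lo_idx], means[hi_idx]


def endpoint_spread(data, n_boot, n_runs=10):
    """Standard deviation of the lower CI endpoint across reruns (seeds)."""
    lowers = [
        percentile_bootstrap_ci(data, n_boot, seed=s)[0] for s in range(n_runs)
    ]
    return statistics.stdev(lowers)


if __name__ == "__main__":
    sample = make_synthetic_sample()
    for b in (100, 599, 2000):
        ci = percentile_bootstrap_ci(sample, b)
        print(b, ci, round(endpoint_spread(sample, b), 3))
```

In a sketch like this, the run-to-run spread of the endpoint generally shrinks as the number of replicates grows, while the running time grows roughly in proportion to it. With large microdata files and more complex estimators, that cost is what makes the choice of replicate count a practical constraint rather than a purely statistical one.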

References

  • Austin, P. C., & Stuart, E. A. (2015). Moving toward best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28), 3661–3679.

  • Baek Choi, J., & Thomas, M. (2009). Predictive factors of acculturation attitudes and social support among Asian immigrants in the USA. International Journal of Social Welfare, 18(1), 76–84.

  • Bell, B. A., Onwuegbuzie, A. J., Ferron, J. M., Jiao, Q. G., Hibbard, S. T., & Kromrey, J. D. (2012). Use of design effects and sample weights in complex health survey data: a review of published articles using data from 3 commonly used adolescent health surveys. American Journal of Public Health, 102(7), 1399–1405.

  • Bryant, A., & Raja, U. (2014). In the realm of Big Data. First Monday, 19(2). http://firstmonday.org/article/view/4991/3822. Accessed 17 Jan 2018.

  • Butler, D. (2008). Web data predict flu. Nature, 456, 287–288.

  • Chantala, K., & Tabor, J. (1999). Strategies to perform a design-based analysis using the Add Health data. Resource document. Carolina Population Center, University of North Carolina at Chapel Hill. http://www.cpc.unc.edu/projects/addhealth/documentation/guides/weight1.pdf. Accessed 17 Jan 2018.

  • Chen, C. L. P., & Zhang, C. (2014). Data-intensive applications, challenges, techniques and technologies: a survey on Big Data. Information Sciences, 275, 314–347.

  • Crowder, J. A., & Carbone, J. A. (2017). Abductive artificial intelligence learning models. In H. R. Arabnia, D. de la Fuente, E. B. Kozerenko, J. A. Olivas, & F. G. Tinetti (Eds.), Proceedings of the 2017 International Conference on Artificial Intelligence (pp. 90–96). Las Vegas: CSREA Press.

  • Cutter, S. L., Emrich, C. T., Mitchell, J. T., Boruff, B. J., Gall, M., Schmidtlein, M. C., et al. (2006). The long road home: race, class, and recovery from Hurricane Katrina. Environment: Science and Policy for Sustainable Development, 48(2), 8–20.

  • Davenport, T. H., & Patil, D. J. (2012). Data scientist: the sexiest job of the 21st century: meet the people who can coax treasure out of messy, unstructured data. Harvard Business Review, 90(10), 70–76.

  • Dinov, I. D. (2016). Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data. Gigascience, 5(1), 1–15.

  • Fiske, S. T., & Hauser, R. M. (2014). Protecting human research participants in the age of big data. Proceedings of the National Academy of Sciences, 111(38), 13675–13676.

  • Fossett, M. (2006). Ethnic preferences, social distance dynamics, and residential segregation: theoretical explorations using simulation analysis. The Journal of Mathematical Sociology, 30(3–4), 185–273.

  • Fuchs, C., & Sandoval, M. (2013). The diamond model of open access publishing: why policy makers, scholars, universities, libraries, labour unions and the publishing world need to take non-commercial, non-profit open access serious. TripleC: Communication, Capitalism & Critique, 11(2), 428–443.

  • Fussell, E., Curran, S. R., Dunbar, M. D., Babb, M. A., Thompson, L., & Meijer-Irons, J. (2017). Weather-related hazards and population change: a study of hurricanes and tropical storms in the United States, 1980–2012. The Annals of the American Academy of Political and Social Science, 669(1), 146–167.

  • Gomes, R., Levison, H. F., Tsiganis, K., & Morbidelli, A. (2005). Origin of the cataclysmic late heavy bombardment period of the terrestrial planets. Nature, 435(7041), 466–469.

  • Grace, K., & Nagle, N. (2015). Using high resolution remotely sensed data to examine the relationship between agriculture and fertility in a pre-transitional setting: a case study of Mali. The Professional Geographer, 67(4), 641–654.

  • Grace, K., Nagle, N. N., & Husak, G. (2016). Can small-scale agricultural production improve children’s health? Examining stunting vulnerability among very young children in Mali, West Africa. Annals of the Association of American Geographers, 106(3), 722–737.

  • Greenough, G., McGeehin, M., Bernard, S. M., Trtanj, J., Riad, J., & Engelberg, D. (2001). The potential impacts of climate variability and change on health impacts of extreme weather events in the United States. Environmental Health Perspectives, 109(Supp 2), 191–198.

  • Hayden, E. C. (2015). Genome researchers raise alarm over Big Data. Nature: International Weekly Journal of Science. http://www.nature.com/news/genome-researchers-raise-alarm-over-big-data-1.17912. Accessed 17 Jan 2018.

  • Hayward, M. D., Hummer, R. A., Chiu, C., Gonzalez-Gonzalez, C., & Wong, R. (2014). Does the Hispanic paradox in mortality extend to disability? Population Research and Policy Review, 33, 81–96.

  • Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLOS Biology. http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106. Accessed 17 Jan 2018.

  • HLG-PCCB (High-level group for partnership, coordination and capacity-building for statistics for the 2030 agenda for sustainable development). (2016). Global action plan for sustainable development data. Report. https://unstats.un.org/sdgs/files/global-consultation-hlg-1/GAP_HLG-20161021.pdf. Accessed 17 Jan 2018.

  • Horrigan, M. W. (2013). Big data and official statistics. Presentation for the International Year of Statistics. Bureau of Labor Statistics, Office of Prices and Living Conditions, Washington, DC.

  • Iceland, J., Weinberg, D. H., & Steinmetz, E. (2002). Racial and ethnic residential segregation in the United States: 1980–2000. Washington, DC: US Census Bureau, Series CENSR-3.

  • King, G. (2016). Preface: big data is not about the data. In R. M. Alvarez (Ed.), Computational social science: discovery and prediction. Cambridge: Cambridge University Press.

  • Kitchin, R. (2014a). Big data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 1–12.

  • Kitchin, R. (2014b). The data revolution: big data, open data, data infrastructures & their consequences. Los Angeles: Sage.

  • Kwan, M. (2012). The uncertain geographic context problem. Annals of the Association of American Geographers, 102(5), 958–968.

  • Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science, 343(6176), 1203–1205.

  • Letouzé, E. (2015). Demography, meet big data; big data, meet demography: reflections on the data-rich future of population science. In Paper presented at the United Nations EGM on strengthening the demographic evidence base for the post-2015 development agenda. New York, October 5.

  • Leung, M., & Takeuchi, D. T. (2011). Race, place, and health. In L. M. Burton, P. Kemp, M. Leung, S. A. Matthews, & D. T. Takeuchi (Eds.), Communities, neighborhoods, and health: expanding the boundaries of place (pp. 73–88). New York: Springer.

  • Lichter, D. T., & Johnson, K. M. (2009). Immigrant gateways and Hispanic migration to new destinations. International Migration Review, 43(3), 496–518.

  • Manovich, L. (2011). Trending: the promises and the challenges of big social data. In M. K. Gold (Ed.), Debates in the Digital Humanities 2 (pp. 460–475). Minneapolis: University of Minnesota.

  • Maples, J. N. (2012). Changes in US Ethnic Niches, 2005–2010. Doctoral Dissertation, University of Tennessee. http://trace.tennessee.edu/socioetds/. Accessed 17 Jan 2018.

  • Martin, D. (1996). Geographic information systems: socioeconomic applications. New York: Routledge.

  • Martin, J. A., Hamilton, B. E., Osterman, M. J. K., Driscoll, A. K., & Matthews, T. J. (2017). Births: final data for 2015. National Vital Statistics Reports, 66, 1–70.

  • Massey, D. S., & Denton, N. A. (1988). The dimensions of residential segregation. Social Forces, 67(2), 281–315.

  • McCoach, D. B., & Adelson, J. L. (2010). Dealing with dependence (Part I): understanding the effects of clustered data. Gifted Child Quarterly, 54(2), 152–155.

  • Metzler, K., Kim, D. A., Allum, N., & Denman, A. (2016). Who is doing computational social science? A white paper. Sage Publishing. https://us.sagepub.com/sites/default/files/compsocsci.pdf. Accessed 17 Jan 2018.

  • Minnesota Population Center. (2016). Terra populus: integrated data on population and environment: version 1. Minneapolis: University of Minnesota.

  • Moretti, S. (2002). Computer simulations in sociology: what contribution? Social Science Computer Review, 20(1), 43–57.

  • Murdoch, T. B., & Detsky, A. S. (2013). The inevitable application of Big Data to health care. JAMA, 309(13), 1351–1352.

  • Nuzzo, R. (2014). Statistical errors: p values, the “gold standard” of statistical validity, are not as reliable as many scientists assume. Nature, 506, 150–152.

  • Pattengale, N. D., Alipour, M., Bininda-Emonds, O. R. P., Moret, B. M. E., & Stamatakis, A. (2010). How many bootstrap replicates are necessary? Journal of Computational Biology, 17(3), 337–354.

  • Perreira, K. M., Harris, K. M., & Lee, D. (2006). Making it in America: high school completion by immigrant and native youth. Demography, 43(3), 511–536.

  • Pokhriyal, N., Dong, W., & Govindaraju, V. (2015). Big data for improved diagnosis of poverty: a case study of Senegal. Washington, DC: A report for the Brookings Institution Africa in Focus series.

  • Portes, A., & Rumbaut, R. G. (2006). Immigrant America: a portrait. Berkeley: University of California Press.

  • Ramakrishnan, S. K. (2005). Democracy in Immigrant America: changing demographics and political participation. Palo Alto: Stanford University Press.

  • Riosmena, F., & Massey, D. S. (2012). Pathways to El Norte: origins, destinations, and characteristics of Mexican migrants to the United States. International Migration Review, 46(1), 3–36.

  • Ruggles, S. (2014). Big microdata for population research. Demography, 51(1), 287–297.

  • Schwirian, K. P. (1983). Models of neighborhood change. Annual Review of Sociology, 9, 83–102.

  • Singer, A. (2004). The rise of new immigrant gateways. Washington, DC: Brookings Institution, Center on Urban and Metropolitan Policy.

  • Tripathi, R., Sharma, P., Chakraborty, P., & Varadwaj, P. K. (2016). Next-generation sequencing revolution through big data analytics. Frontiers in Life Science, 9(2), 119–149.

  • Tsiganis, K., Gomes, R., Morbidelli, A., & Levison, H. F. (2005). Origin of the orbital architecture of the giant planets of the Solar system. Nature, 435(7041), 459–461.

  • Udry, J. R. (2003). The national longitudinal study of adolescent health (Add Health), Wave 1, 1994. Chapel Hill: Carolina Population Center, University of North Carolina.

  • Vilhuber, L. (2016). Census research nodes: a progress report. In Presentation at the 2016 FSRDC Research Conference. September 15. College Station, Texas.

  • Vital Wave Consulting. (2012). Big data, big impact: new possibilities for international development. A report for the World Economic Forum. Geneva, Switzerland.

  • Waga, D., & Rabah, K. (2014). Environmental conditions’, big data management, and cloud computing analytics for sustainable agriculture. World Journal of Computer Application and Technology, 2(3), 73–81.

  • Wilcox, R. R. (2010). Fundamentals of modern statistical methods: substantially improving power and accuracy. New York: Springer.

Author information

Corresponding author

Correspondence to Stephanie A. Bohon.

Additional information

This paper is a version of the Presidential Address given to the Southern Demographic Association, Athens, Georgia, October 13, 2016.

About this article

Cite this article

Bohon, S.A. Demography in the Big Data Revolution: Changing the Culture to Forge New Frontiers. Popul Res Policy Rev 37, 323–341 (2018). https://doi.org/10.1007/s11113-018-9464-6

  • DOI: https://doi.org/10.1007/s11113-018-9464-6
