1 Background

1.1 Policy and Scholarly Debates

Of the many potential determinants of child health development throughout the life course, one of the most contentiously debated has been the role of marriage and nonmarital childbearing. The now-famous Moynihan Report (1965), titled The Negro Family, highlighted family structure, along with the legacy of slavery and sustained discrimination, and inadequate employment opportunities, as key determinants of intransigent poverty among African-Americans of the time (Massey and Sampson 2009). Moynihan’s primary policy recommendation was for increased federal efforts toward economic opportunity, and he was initially praised by black leaders for focusing such high-level attention on the wellbeing of black families. However, a more enduring legacy of his report, particularly in the context of the civil rights and feminist movements of the 1960s, was his reference to the “pathological” nature of mother-headed households (ibid.) and the controversy it engendered.

In the subsequent decades, scholars and policymakers debated whether nonmarital childbearing was the result of a “culture of poverty,” which encouraged choices that undermined African-American children’s life chances, or of structural factors such as racial discrimination, which trapped families in circumstances of intergenerational poverty and social disadvantage (Wilson 1988). The nature and causes of poverty, and the racialized discourse about social welfare programs, single-parent families, and their implications for child wellbeing, played a significant role in policy debates as well. This rhetoric was particularly notable in passage of the 1996 welfare reform law, the Personal Responsibility and Work Opportunity Reconciliation Act (Cherlin et al. 2009).

At the same time, the prevalence of nonmarital births has grown considerably, both among blacks and other racial and ethnic groups (McLanahan 2009). This growth, illustrated in Fig. 1, engendered further debate about the nature of nonmarital childbearing and whether its widespread increase should be cause for concern.

Fig. 1 Unmarried births as a percent of US births, 1950–2013 (National Center for Health Statistics; National Vital Statistics System). Note: Hispanics can be of any race. “Black” data from 1950 to 1969 refers to all non-whites

However, in the 1990s, empirical data on unmarried parents and their children were limited (McLanahan 2009). Scholars and analysts alternately posited that children born to unmarried parents were the result of encounters so casual that mothers might not know who the fathers were, or that nonmarital childbearing in the United States might resemble that in other Western industrialized countries, in which committed, cohabiting parents raised children in unions as stable as those of married couples, “lacking only the piece of paper.” Still others suggested that while unmarried parents might be committed to each other and their children, the American welfare state did not provide the level of support found in other Western industrialized countries, leaving the family units at risk (McLanahan et al. 2010).

At the point when these debates were most salient, the literature on single parenthood and father absence focused predominantly on widows and divorced families (McLanahan and Sandefur 1994), who researchers suspected differed significantly from parents who never married. Some basic demographic information was available about the women who gave birth while unmarried, but very little was known about the fathers of their children or about the social and cultural environments in which their children grew up (McLanahan 2009). Still less was known about the health development outcomes of children born outside of marriage or the role of the family and social environment in their development.

1.2 Development of the Fragile Families and Child Wellbeing Study

To build a sound evidence base on the causes and consequences of nonmarital childbearing, a team of researchers at Columbia and Princeton Universities developed and implemented a large national survey, the Fragile Families and Child Wellbeing Study (FFCWS). The “Fragile Families” moniker was carefully chosen and derived from the Ford Foundation’s “Strengthening Fragile Families” initiative of the 1990s (Sorensen et al. 2002). Although a variety of family transitions carried the potential for resulting instability – divorce and parental death, among others – the term “Fragile Families” referred specifically to families in which parents were not married at the time a child was born (ibid.). The word “fragile” referred to the risks that unmarried parents faced in terms of their economic and relationship stability, while the word “families” was used in recognition that unmarried partners with children represent a cohesive family unit (ibid.; see Footnote 1).

The FFCWS was designed to address four key questions raised by the Moynihan Report and subsequent public discourse (Reichman et al. 2001).

  1. What are the capabilities of unmarried parents (especially fathers) when their child is born? How many of the fathers hold steady jobs? How many want to be involved in raising their children?

  2. What is the nature of parental relationships in fragile families at birth? How many couples are involved in stable, long-term relationships? How many expect to marry? How many experience high levels of conflict or domestic violence? How do relationships change over time?

  3. What factors push new unwed parents together? What factors pull them apart? How do public policies affect parents’ behaviors and living arrangements or child wellbeing? What are the long-term consequences of new welfare regulations, stronger paternity establishment, and stricter child support enforcement? What roles do childcare and healthcare policies play? How do these policies play out in different labor market environments?

  4. How do parents and children fare in fragile families, and how does family structure and stability affect child wellbeing and development?

The study has since expanded to become a key source of information about family relationships and the broader social environment and their effects on child health development and wellbeing. At time of writing, the FFCWS is in the middle of its sixth wave of data collection, and the children born at the study baseline are turning 15 years old. The study is a joint effort by the Princeton University Center for Research on Child Wellbeing (CRCW) and Center for Health and Wellbeing, the Columbia Population Research Center (CPRC), and the National Center for Children and Families (NCCF) at Columbia University. The Principal Investigators of the Fragile Families Study are Sara McLanahan, Dan Notterman, Janet Currie, and Christina Paxson (see Footnote 2) at Princeton University and Irwin Garfinkel, Jeanne Brooks-Gunn, Ron Mincy, and Jane Waldfogel at Columbia University. The study’s funding has come from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, as well as a variety of other government and private funders, listed in Appendix A. More than 730 scholarly articles, theses, books, and abstracts have used the FFCWS data, which have also been used to inform testimony to policymakers at the national, state, and local levels (e.g., Sorensen et al. 2002; Geller 2011).

1.3 The FFCWS and Life Course Health Development Research

The FFCWS has several features that make it ideally suited for life course health development research on children. The Life Course Health Development (LCHD) model (see Halfon and Forrest 2017) views health as a “dynamic, emergent capacity that develops continuously over the lifespan, in a complex, nonlinear process” forming health development trajectories that are influenced by physical and social environmental as well as personal and genetic factors. Specifically, the model lays out seven principles of health development (ibid.):

  1. Health is an emergent set of developmental capacities.

  2. Health develops continuously over the life span.

  3. Health development is a complex, nonlinear process occurring in multiple dimensions and at multiple levels and phases.

  4. Health development is sensitive to the timing and social structuring of environmental exposures and experience.

  5. Health development is an adaptive process that has been engendered by evolution with strategies to promote resilience and plasticity in the face of changing and often constraining environmental contexts.

  6. Optimal health development promotes survival, enhances thriving, and protects against disease.

  7. Health development is sensitive to the timing and synchronization of molecular, physiological, behavioral, social, and cultural function.

Health is a broad set of developmental capacities that are in constant and dynamic transactions with individuals’ biological, physical, and social environments (National Research Council and Institute of Medicine 2004). The FFCWS measures children’s environments across all of these domains: the study contains biological indicators of children’s wellbeing from the time of their birth, including low birthweight (Reichman and Teitler 2006) and overweight and obesity (Kimbro et al. 2007), among others (e.g., Holt et al. 2013). In many cases the FFCWS is even able to study children’s prenatal environments through their mothers’ hospital records (Reichman and Nepomnyaschy 2008; Smulian et al. 2005). More recently, the study has also started to collect genetic data, to enable examination of gene-environment interactions (Mitchell et al. 2014).

The FFCWS also contains rich information on children’s physical environments, examining factors with significant influence on health development throughout the life course. The study collects data on the homes and neighborhoods in which children live, exposure to lead (Boutwell et al. 2014), access to food (Corman et al. 2014), and other neighborhood conditions.

One of the richest contributions of the FFCWS is its measurement of the social environment in which children are raised. According to the theoretical principles of life course health development, health development is a multilevel construct, and the FFCWS measures children’s social environments at many of the levels that influence their health development throughout the life course: the family (both biological parents and the social unit that comprises the household) (Jackson et al. 2012; Pilkauskas 2014), childcare settings (Rigby et al. 2007), schools (Razza et al. Forthcoming), neighborhoods (Suglia et al. 2013), and broader contexts such as states or local labor markets (Rigby et al. 2007; Pilkauskas et al. 2012).

In addition to the rich multilevel data contained in the FFCWS, another significant contribution the study makes to life course research is its unique sampling strategy (detailed below in Sect. 2.1). Given the study’s focus on unmarried parents and nonmarital childbearing, the study systematically oversamples unmarried couples, who face disproportionate socioeconomic challenges (McLanahan 2011). The FFCWS permits the observation of these challenges on a population level, and enables the identification of factors that might exacerbate risk or promote resilience among children. By nature of the FFCWS sampling plan, the data also contain large samples of racial and ethnic minority families, who are often underrepresented in population-based surveys (Reichman et al. 2001).

Another contribution of the FFCWS is the longitudinal nature of its data collection. The study contains rich information on children’s environments and health status throughout infancy, early childhood, school entry, and middle childhood, which permits the construction of health development trajectories. As important as these periods are for long-term health and wellbeing (Shonkoff and Phillips 2000; Heckman 2006), very few population-based American datasets track birth cohorts or begin in infancy or early childhood (though see Flanagan and West 2004 for information on the Early Childhood Longitudinal Study, a notable exception).

The longitudinal birth cohort structure of the FFCWS, coupled with the rich data it collects on multiple domains of health and health determinants, permits the joint analysis of children’s physical and emotional health development, both of which develop continuously over the life span. Health development and environmental constructs are tracked over time, with indicators either measured repeatedly or updated to be age-appropriate, as needed. These measures are therefore able to capture the time-sensitive nature of child development. Children (and adults) experience several key transitions throughout the life course (e.g., school entry and puberty), which require new modes of adaptation to biological, psychological, or social changes (Elder 1985; Graber and Brooks-Gunn 1996). Events they experience – both positive and negative – may also serve as turning points that alter their life course trajectories (Elder 1985; Sampson and Laub 1990). The effects of turning points and other key exposures on life course trajectories may further be linked to the timing of these exposures with respect to developmental transitions (Graber and Brooks-Gunn 1996). An important strength of the FFCWS is its observation of exposures to key influences, and measurement of children’s risk and resilience, throughout their health developmental trajectories.

1.4 Road Map

The rest of this chapter proceeds as follows. Section 2 lays out the sampling plan and study design of the FFCWS. Section 3 provides an inventory of the data available at time of writing. Section 4 provides greater detail on how the FFCWS data may be used for life course health development research. Section 5 lays out future plans for the FFCWS. Section 6 concludes with resources available for FFCWS data users.

2 Study Design

The FFCWS was designed to provide information on unmarried couples and their newborn children (McLanahan et al. 2010). Previous efforts to describe these “fragile families” were limited by the challenges in collecting data on unmarried fathers (Reichman et al. 2001). What little was known about men who fathered children outside of marriage suggested that they were younger and less educated than men who father children within marriage, with lower incomes and fewer attachments to the labor market (Reichman et al. 2001; Garfinkel et al. 1998). These fathers also reported more disability, more depression, and a greater prevalence of risky health behaviors such as drinking and substance use (ibid.). These factors create challenges in recruiting and retaining them in population-based surveys (Aday and Cornelius 2006). Unmarried fathers are also less likely to be their children’s legal guardians and tend to have fewer formal ties to their families, and may be harder to locate as a result. Accordingly, it is also likely that data obtained from prior household surveys understated the disadvantage faced by families with unmarried fathers.

The FFCWS focused its study design around the objective of obtaining information about previously understudied, unmarried, and often nonresident fathers. Early pilot studies suggested that unmarried fathers were often present at the birth of their child, and that both parents were willing to be interviewed at this time (Reichman et al. 2001). Baseline data collection was designed around this “magic moment,” and both mothers and fathers were initially contacted for interview while in the hospital, shortly after their child’s birth. Sampling from hospitals also provided a spatial clustering of new parents that both improved response rates and kept recruitment and interview costs manageable.

2.1 Sampling Strategy

The FFCWS was based on a multistage stratified random sample: cities were sampled, then hospitals within cities, and then births within hospitals. The sampling of cities was based on all cities in the United States with populations of 200,000 or more. Some of the cities in the sampling frame had a population of over 200,000 only when outlying suburban areas were included, while some cities were considered only in terms of their inner city – the definition of each city was drawn from its vital statistics (Reichman et al. 2001). These cities were then characterized by their policy environments, as defined by three variables: welfare generosity, the strength of the local labor market, and the strength of local child support enforcement. (See Reichman et al. 2001, at 310–311, for further details on the scoring of policy environments.)

Each city’s welfare, labor market, and child support regimes were scored on a three-point scale: low, medium, or high welfare generosity and strong, medium, or weak labor markets and child support enforcement policies. Cities were then sorted into two groups: those with only extreme values (low or high) for all three dimensions (leaving 8 different “extreme cells”) and cities with at least one middle value (19 remaining cells). One city was sampled from each of the 8 extreme cells, and 8 additional cities were randomly sampled from the 19 remaining cells, with selection probability based on city populations. In the 8 “extreme policy” cities, the plan was to collect data on 250 nonmarital births and 75 marital births, for a total of 325. In the other 8 cities, the plan was to sample a total of 100 births. The “large sample” cities provided an ability to understand the social and policy environment and social processes within cities, as well as differences between cities. The eight “small sample” cities, selected from the cells with at least one middle value, improved the ability to observe nonlinearities in the effects of welfare, child support, and labor markets.
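The cell counts and planned sample sizes in this design follow from simple arithmetic. The short Python sketch below reproduces them; the level labels and variable names are illustrative assumptions rather than FFCWS variable names, and the total is only the planned figure implied by the per-city targets, not the achieved sample.

```python
from itertools import product

# Three policy dimensions (welfare, labor market, child support), each
# scored on a three-point scale. Generic labels are an assumption here.
LEVELS = ("low", "medium", "high")

cells = list(product(LEVELS, repeat=3))
extreme_cells = [c for c in cells if "medium" not in c]  # only low/high values
middle_cells = [c for c in cells if "medium" in c]       # at least one middle value

# Planned baseline births per city, as described in the text.
LARGE_CITY_BIRTHS = 250 + 75  # nonmarital + marital
SMALL_CITY_BIRTHS = 100

planned_national_sample = 8 * LARGE_CITY_BIRTHS + 8 * SMALL_CITY_BIRTHS

print(len(cells), len(extreme_cells), len(middle_cells))  # 27 8 19
print(planned_national_sample)  # 3400
```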

The 16 cities randomly selected along the systematic stratified sampling plan comprise the “national sample” of the FFCWS (see Footnote 3). Four additional cities were added to the study, given their special substantive interest to funders. Table 1 lists the cities sampled in the FFCWS, whether they were selected as part of the national (i.e., random) sample, and their baseline sample sizes.

Table 1 Fragile Families and Child Wellbeing Study cities and sample sizes

The FFCWS is designed not only to be nationally representative of births in large cities but also to offer detailed case studies of the included cities themselves. Hospitals were systematically sampled in order to increase coverage of nonmarital births, which led to a full sample of 75 hospitals (see Reichman et al. 2001, for the full list of included hospitals). Within each hospital, mothers of new babies were sampled from maternity ward lists. Once sampled, mothers were asked to complete a screening instrument to determine marital status and eligibility for participation in the study. Quotas were set at each hospital for the number of unmarried and married births, based on sample cities’ 1996/1997 unmarried birth rates. Marital and nonmarital births were randomly sampled until preset quotas were reached for each hospital. If a mother was determined to be above the set quota for a given marital status, the case was coded “over quota,” and the mother was not interviewed. In the rare cases in which surveyed parents had twins or a higher-order multiple birth, one child was designated the “focal child,” the child to whom answers to the FFCWS refer.
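The quota logic described above can be sketched in a few lines of Python. This is a simplified illustration only: the function name, the status labels, and the quota values are hypothetical, and the sketch ignores the other eligibility screens applied in the field.

```python
from collections import Counter

def screen_mothers(statuses, quotas):
    """Code each sampled mother as 'interview' or 'over quota'.

    statuses: marital statuses ('married' or 'unmarried') in the order
              mothers were drawn from the maternity ward list.
    quotas:   maximum number of interviews per marital status
              (hypothetical values; real quotas were set per hospital).
    """
    filled = Counter()
    outcomes = []
    for status in statuses:
        if filled[status] < quotas[status]:
            filled[status] += 1
            outcomes.append((status, "interview"))
        else:
            # Quota for this marital status already reached: not interviewed.
            outcomes.append((status, "over quota"))
    return outcomes

drawn = ["unmarried", "married", "married", "unmarried", "unmarried"]
result = screen_mothers(drawn, {"unmarried": 2, "married": 1})
```

In this toy run, the second married mother and the third unmarried mother are coded "over quota" because their groups were already full when they were screened.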

The FFCWS oversample of nonmarital births, coupled with the significantly higher rates of nonmarital childbearing among blacks and Hispanics noted in Fig. 1, provided a study sample with a significantly higher representation of non-white families than most household surveys. Racial and ethnic classifications of the mothers and fathers in the survey are provided in Table 2.

Table 2 Racial and ethnic composition, FFCWS sample

Several subgroups of parents were not included in the study: those who did not speak English or Spanish well enough to complete the interview, those who planned to place the child for adoption, those for whom the father of the baby was deceased, those whose baby died before the interview could take place, those mothers who were too ill to complete the interview (or whose babies were too ill for the mother to complete the interview), and, in many hospitals, parents who were under 18 and therefore prohibited by the hospital from being interviewed (Reichman et al. 2001). Because the study was designed to examine the roles of mothers and fathers in families, the study is also limited to births to heterosexual couples. As the sample is hospital-based, home births and births in other venues are also not represented. With these and other slight exceptions, the FFCWS sample is representative of the nonmarital births taking place in each city. However, the marital sample is not necessarily representative of marital births in each city because in many cities, births were purposively sampled from hospitals with the highest rates of nonmarital births. It is also important to note that the sample is representative of nonmarital births taking place in each city rather than of births to residents of each city (Reichman et al. 2001), since parents giving birth within the city may live elsewhere.

2.2 Sampling Weights

Due to the complex sampling design of the FFCWS, the data are provided with sampling weights to make the data representative of the sample cities and of urban births nationally. These weights account for the stratified random sample of cities, the sample of hospitals, and the sample of births, as well as the clustering of births by hospital, the oversample of nonmarital births, the underrepresentation of teen births, and mothers’ marital status, education, race/ethnicity, and age. The survey includes six sets of weights: national-level and city-level weights, each provided separately for mothers, for fathers, and for couples. These weights are provided for the five waves of data currently available, to account not only for design but for baseline nonresponse and attrition over time (Carlson 2008; CRCW 2008a, b).
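As a rough illustration of how such weights enter an analysis, the Python sketch below enumerates the six weight sets and computes a weighted mean. The labels, the helper function, and the numbers are made up for illustration and are not actual FFCWS variable names or values; proper variance estimation for the clustered design is beyond this sketch.

```python
from itertools import product

# Two levels of representativeness crossed with three respondent units.
LEVELS = ("national", "city")
UNITS = ("mother", "father", "couple")
weight_sets = list(product(LEVELS, UNITS))  # six (level, unit) combinations

def weighted_mean(values, weights):
    """Weighted mean that skips cases lacking a positive weight.

    Cases with a weight of None or <= 0 (for example, respondents outside
    the relevant sample) contribute nothing to the estimate.
    """
    pairs = [(v, w) for v, w in zip(values, weights) if w is not None and w > 0]
    total = sum(w for _, w in pairs)
    if total == 0:
        raise ValueError("no cases with a positive weight")
    return sum(v * w for v, w in pairs) / total

# Hypothetical indicator (1 = outcome present) with hypothetical weights.
estimate = weighted_mean([1, 0, 1], [2.0, 6.0, None])
print(len(weight_sets))  # 6
print(estimate)          # 0.25
```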

2.3 Timing of Interviews

Baseline interviewing for the FFCWS took place over 3 years, between 1998 and 2000. Couples were subsequently contacted for re-interview in four additional waves, reflecting various developmental stages of the focal child’s life. Interviews took place when the focal children were 1, 3, 5, and 9 years of age, and a sixth wave of the study is currently in the field (interviewing the families while the focal children are 15 years old). Due to the short time that elapsed between the early interviews, some families were interviewed for their Year 1 (Y1) follow-up before baseline interviewing was completed for all families, for their Year 3 (Y3) follow-up before Y1 interviewing was completed, and for Year 5 (Y5) follow-up before Y3 interviewing was completed.

Because interviews were carried out over approximately 3 years in each wave, many of the early interviews provided valuable information to improve the survey while it was ongoing. Interviews took place in the first two cities (Oakland, CA and Austin, TX) approximately a full year before interviewing began in the remaining 18 cities, and the early interviews served as a form of pilot test. If questions were not well understood by the respondents, or responses did not have sufficient variation to permit meaningful analysis, these questions were revised before interviewing in the remaining 18 cities.

It also bears noting that considerable time may elapse between father and mother interviews within the same family. Of the families in which both mothers and fathers were interviewed at baseline, more than half (55%) of couples were interviewed on the same day. However, in more than 10% of families, parents were interviewed more than 1 month apart. By Year 9 (Y9), only 15% of parents who both participated were interviewed on the same day. More than 30% were interviewed more than 1 month apart from their partners, with a small number interviewed over a year apart. It is quite possible that discrepancies in mothers’ and fathers’ reports of family circumstance reflect differences in timing as well as differences in perspective and other traditional sources of measurement error.
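Analysts comparing mothers' and fathers' reports may want to flag how far apart the two interviews took place. The following Python sketch classifies the gap using the same categories as the text; the function name is hypothetical, and treating "1 month" as 30 days is an assumption made for illustration.

```python
from datetime import date

def interview_gap(mother_date, father_date, month_days=30):
    """Classify the time between the two parents' interviews.

    month_days=30 is an assumed operationalization of "1 month".
    """
    days = abs((mother_date - father_date).days)
    if days == 0:
        return "same day"
    if days <= month_days:
        return "within 1 month"
    return "more than 1 month"

# Hypothetical interview dates.
gap_a = interview_gap(date(1999, 3, 1), date(1999, 3, 1))
gap_b = interview_gap(date(1999, 3, 1), date(1999, 3, 15))
gap_c = interview_gap(date(1999, 3, 1), date(1999, 6, 1))
```

A flag of this kind could be used to check whether discrepancies between parents' reports are concentrated among couples interviewed far apart.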

2.4 Attrition and Retention

The FFCWS has been unusually successful in collecting data on its generally hard-to-reach target population. Core response rates are provided in Table 3 for mothers and fathers, by their baseline marital status. Mothers’ baseline response rates are computed as the percent of mothers approached for interview who agreed to participate. Fathers’ results at baseline are computed as the percent of mothers interviewed at baseline whose partners agreed to participate. Response rates in subsequent waves are computed as a percent of the couples interviewed at baseline who participated in subsequent waves. Even among the hardest-to-reach population, unmarried fathers, the majority of the sample was retained in the survey at the 9-year follow-up wave. Nearly all fathers (96% of fathers married at baseline and 87% of unmarried fathers) were interviewed at least once.
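The three response-rate definitions above use different denominators, which is easy to overlook when reading the table. The Python sketch below spells them out; all counts are invented for illustration and are not FFCWS figures, and the follow-up denominator is simplified to baseline-interviewed cases.

```python
def response_rates(approached, mothers_baseline, fathers_baseline, followup):
    """Return response rates (in percent) under the definitions in the text.

    approached:       mothers approached for a baseline interview
    mothers_baseline: mothers who agreed to participate at baseline
    fathers_baseline: partners of interviewed mothers who participated
    followup:         baseline-interviewed cases participating in a later wave
    """
    return {
        "mother_baseline": round(100 * mothers_baseline / approached, 1),
        "father_baseline": round(100 * fathers_baseline / mothers_baseline, 1),
        "followup": round(100 * followup / mothers_baseline, 1),
    }

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
rates = response_rates(approached=1000, mothers_baseline=850,
                       fathers_baseline=680, followup=595)
print(rates)
```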

Table 3 Response rates, FFCWS core biological parent survey

2.5 Couple Data

Another way in which the FFCWS provides unique insight into its hard-to-reach target population is through the collection of couple data. The “magic moment” of the focal child’s birth provides an opportunity to identify fathers who might not otherwise be located, and information on these fathers is obtained not only through personal interviews but also through the data collected from their partners. Mothers report on their own background and behavior and their personal perception of the couple relationship, and they also report detailed information on their partners’ background and behavior. This information includes the fathers’ racial and ethnic background, place of residence, and educational attainment, as well as behavioral information such as labor market activity, incarceration history, parenting behavior, and other aspects of father involvement (e.g., child support payment and visitation). Likewise, fathers are asked to report on similar domains of the mothers’ background and behavior.

Couple data provides value in two important ways. When both parents are interviewed, concordance or disagreement between the two parents’ responses can be used to assess many domains of their relationship, subject to differences in the phrasing of some questions and the timing of each parent’s interview. When only one parent is interviewed, responses about their partner provide information not directly obtainable. This is particularly important given the factors likely to influence attrition and retention. If the fathers least involved with their partners and children are the most likely to leave the survey, analyses based only on those retained are likely to be affected by selection bias. Collecting mothers’ reports of father characteristics helps to reduce this bias.
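Both uses of couple data can be expressed in one small routine: checking agreement when two reports exist, and falling back on the partner's report when only one does. The Python sketch below is a hypothetical illustration; the function name, the returned fields, and the choice to prefer the self-report when both are available are assumptions, not rules prescribed by the study.

```python
def combine_reports(self_report, partner_report):
    """Combine a parent's self-report with the partner's report about them.

    Returns the chosen value, its source, and whether the two reports agree
    (None when only one report, or neither, is available).
    """
    if self_report is not None and partner_report is not None:
        # Both parents interviewed: keep the self-report (an assumption)
        # and record concordance between the two reports.
        return {"value": self_report, "source": "self",
                "agree": self_report == partner_report}
    if self_report is not None:
        return {"value": self_report, "source": "self", "agree": None}
    if partner_report is not None:
        # Parent not interviewed: fall back on the partner's report.
        return {"value": partner_report, "source": "partner", "agree": None}
    return {"value": None, "source": None, "agree": None}

# Hypothetical example: whether the father is currently employed.
both = combine_reports(True, False)
mother_only = combine_reports(None, True)
```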

3 Current Data Availability

3.1 Data Access

As noted, five waves of FFCWS data collection have been completed as of time of writing. A sixth wave of data collection is ongoing. Baseline, Y1, Y3, Y5, and Y9 core follow-up data are available to the public through the data archive at the Office of Population Research (OPR) at Princeton University. These data, excluding Y9, are also available through the Inter-university Consortium for Political and Social Research (ICPSR). In-home data from the Y3, Y5, and Y9 waves are also available for a subset of core respondents. Additional data, described below, are available to the public via a restricted use contract.

The FFCWS restricted-use dataset (also known as the “contract data”) contains more sensitive information such as geographic identifiers (i.e., the city where the focal child was born and state of respondent residence), census tract characteristics, local macroeconomic measures, genetic data, medical records data, and school characteristics. More information on these measures is provided in Appendix B. While these indicators are invaluable for researchers who are interested in children’s local or school environments, gene-environment interactions, or detailed information about maternal health, it bears noting that the vast majority of FFCWS research is done using the public-use dataset.

Access to the FFCWS contract data is limited to researchers who agree to the terms and conditions contained in the Contract Data Use License. Access is limited to faculty and research personnel at institutions which have an institutional review board (IRB) or human subjects review committee registered with the US Office for Human Research Protections (OHRP) or the National Institutes of Health (NIH). Researchers must obtain IRB approval of their research and data protection plans. Students may use the FFCWS contract data for dissertation research; however, a faculty advisor must serve as the investigator and complete the application process, with the student signing a Supplemental Agreement with Research Staff form. The faculty advisor and the institution bear full responsibility for ensuring that all conditions of the license are met by the student. Further information about the data access process for both the public- and restricted-use files is available in Appendix B.

3.2 Data Modules

In every wave, participating mothers and fathers each completed a “core” survey, administered in-person for most baseline interviews and by phone in most subsequent interviews. For both mothers and fathers, each core interview included questions on household and family characteristics (including a household roster and parents’ marital and coresidence status, as well as contact between nonresident parents and the focal child); sociodemographic and background characteristics; information on parenting behavior; romantic relationships (with the focal biological child’s other biological parent or, where relevant, with new partners); details about each parent’s health, education, employment, and income; as well as indicators of the focal child’s health development and wellbeing. Many of the indicators in the core are collected using the same survey questions at each wave, enabling longitudinal analyses of family trajectories. Details on the data collected in the FFCWS core are provided in Sect. 4. In addition to the data available in the core, both the FFCWS public- and restricted-use files contain several other modules with additional information. These include in-home observations, supplemental surveys, official records, and genetic data.

3.2.1 In-Home Module (Public and Restricted Use)

In addition to the data collected by phone from both mothers and fathers in the FFCWS core data, both the public- and restricted-use files include an in-home module in the Y3, Y5, and Y9 waves. The in-home module includes several assessments of the child and his or her home environment and a survey of the child’s primary caregiver (PCG), defined as the person the focal child lives with at least half the time. The PCG, who can be either the biological mother or father, or a nonparental figure such as a grandparent, is interviewed in the Y3, Y5, and Y9 survey waves. Interviews include a mix of in-person and self-administered questionnaires conducted in the home and questions asked over the phone.

The PCG survey covers information about the child’s home and neighborhood environment, including questions of neighborhood safety, the availability of household resources, parenting behaviors, and childcare arrangements. The PCG also provides details of the focal child’s health development, which are best reported by the person the focal child spends the most time with. It is notable, however, that which questions are answered in the PCG survey and which are answered by the biological parents in the core survey vary by wave.

In addition, the in-home module in all three waves includes several systematic assessments by the interviewer, including the focal child’s general health and cognitive health development, the PCG’s health and cognitive abilities, and assessments of the home and neighborhood environment. These include measures of the height and weight of both the PCG and child, cognitive tests of both the PCG and child, and recorded observations of interactions between the PCG and child.

In the most recent wave (Y9), the in-home assessment also includes an interview of the focal child. In the “child survey,” the focal child provides his or her perspective on the home environment, including relationships with each of their biological parents, as well as any new partners that the parents are married to or living with. The child also reports on the extent of discipline and monitoring provided by their primary caregiver and discipline provided by each of their biological parents and any new partners. The focal child also reports on relationships with his or her siblings and how and with whom he or she spends time. He or she provides self-reports of personal information that might not be accurately assessed by the parents or PCG, including health, personal safety, his or her school environment, school connectedness, bullying and bully victimization, and several aspects of behavior, including early delinquent behavior.

3.2.2 Childcare Provider and Teacher Surveys (Public and Restricted Use)

In the Y3 survey, families using nonparental childcare for at least 7 hours per week, and one consistent childcare arrangement for at least 5 hours per week, were asked if a representative of the childcare program could be interviewed. A subset of these parents and childcare providers consented to be interviewed and observed. The childcare provider survey included information on the focal child’s behavior, social skills, and learning, as well as characteristics of the program. In many cases the interviewer also observed interactions between the provider and the child.

In Y5 and Y9, the focal child’s teachers were also asked to participate in the survey. Teachers who agreed completed a self-administered questionnaire and provided characteristics of their classroom, the school climate, and their assessment of the child’s behavior, social skills, and learning, as well as the parents’ involvement. The teacher surveys did not include administrative details of the school itself; further information on the child’s school is provided in the restricted-use file and described in Sect. 3.2.4.

3.2.3 Medical Records (Restricted Use Only)

Approximately 75% of the mothers interviewed at the FFCWS baseline gave permission for their medical records to be abstracted for analysis. These records contain detailed quantitative and qualitative data on the mother’s health and healthcare history, including details of her reproductive history, such as how many previous pregnancies she had had before the focal child was born, and how much weight she gained during the pregnancy. The FFCWS medical records data also contain other anthropometric measures (such as the mother’s weight and the child’s birth weight), indicators of chemical substances in the body at the time of the birth, and diagnoses, including both physical and mental health. The medical records data provide a rich narrative of the mother’s health at the time of the birth and serve to identify inconsistencies in maternal self-reports.

3.2.4 School Characteristics File (Restricted Use Only)

At the Y9 follow-up survey, data on the focal child’s school were collected by the National Center for Education Statistics (NCES). Although the names and locations of the schools are not released in the restricted-use file, a unique identifier is provided for each school in the file, so that children attending the same school can be identified. The NCES data include information on whether the school is a public or private school, the school’s grade span (i.e., the grades taught in the school), the racial composition of the student population, Title I funding and eligibility (United States Department of Education 2014), an indicator of socioeconomic disadvantage among the students, and the percent of the student body eligible for free or reduced-price lunches.
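The use of the coded school identifier to find children who attend the same school can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the records are invented, and the name school_id is a hypothetical stand-in for whatever the identifier is called in the restricted-use file.

```python
from collections import defaultdict

# Invented child records; "school_id" is a hypothetical name for the coded
# school identifier, and the values are placeholders.
children = [
    {"idnum": "0001", "school_id": "S17"},
    {"idnum": "0002", "school_id": "S03"},
    {"idnum": "0003", "school_id": "S17"},
]

# Group family identifiers by coded school.
schoolmates = defaultdict(list)
for child in children:
    schoolmates[child["school_id"]].append(child["idnum"])

# Keep only schools attended by more than one sampled child.
shared_schools = {sid: ids for sid, ids in schoolmates.items() if len(ids) > 1}
```

Grouping of this kind is what would allow, for example, school-level clustering in analyses, without the school itself ever being named.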

3.2.5 “Neighborhood” Contextual Data File (Restricted Use Only)

Neighborhood data for both mothers and fathers are measured at each wave. “Neighborhoods” in the FFCWS are defined as each parent’s census tract of residence, though to protect respondent privacy, the tract identifiers are coded, and respondents cannot be matched to a specific tract in the restricted-use files. In addition, some random noise has been introduced into the data to ensure that respondents’ census tracts cannot be identified from the data. This noise should have no impact on analyses. The contextual data file is based on the US Census and contains information on the tracts’ racial composition, local employment and income levels, housing characteristics and rent, and receipt of public assistance.

3.2.6 Macroeconomic Data File (Restricted Use Only)

In addition to tract-level contextual data, the FFCWS has recently released broader contextual data for restricted use. The macroeconomic data file contains information on the national and local economic climate facing both mothers and fathers in the month of their interview. This file draws on data from the US Bureau of Labor Statistics at the level of the metropolitan statistical area (MSA), covering employment, unemployment, and labor force participation in the MSA containing their sample city, as well as monthly population data in both the MSA and sample state. The file also contains national-level data from the Survey of Consumers on the Consumer Sentiment Index in the month of interview. These data, coupled with the fact that Y9 data collection in the FFCWS took place between August 2007 and March 2010, permit unique analyses of the Great Recession and the effects of economic downturns on families and children (Pilkauskas et al. 2012).

3.2.7 Genetic Data (Restricted Use Only)

In the Y9 wave of data collection, in addition to the survey data collected from parents, children, and teachers, and the administrative data collected to supplement the surveys, the in-home module included the collection of genetic data. More than half of families in the study (and more than 75% of families participating in the Y9 in-home survey) provided noninvasive saliva samples for genotyping. The goal of the genetic data collection was to allow researchers to directly incorporate genetic information into their models of family relationships and child development and to test hypotheses about the relationships between genes, environment, and child health development. Saliva samples were sealed into containers with a liquid preservative and mailed to the survey subcontractor, which in turn shipped them to the Princeton University Molecular Biology laboratory, where genetic information was extracted and quantified into genetic markers such as single nucleotide polymorphisms (SNPs) and telomere length (CRCW 2013).

The genetic data file includes results from the genotyping of several candidate polymorphisms hypothesized to influence child health development through their interactions with children’s social environments. These candidate polymorphisms include serotonin transporters, dopamine transporter, dopamine D2 receptor, dopamine D4 receptor, catechol-O-methyltransferase, melanocortin 4 receptor, transmembrane protein, and tryptophan hydroxylase. Further information on the specific polymorphisms included in the data file, the genotyping process and quality control mechanisms for each, and valuable references for the study of gene-environment interactions, are available through the Center for Research on Child Wellbeing (2015). At the time of writing, additional data are being developed in conjunction with active research programs, for future release. Data users are encouraged to check the FFCWS home page for up-to-date information.

3.3 Module Participation Rates

Although not all of the data collection modules were completed for each family at each wave, most modules were completed for a substantial majority of families at the waves in which they were offered. Table 4 provides the participation rates for each of the data collection modules by wave.

Table 4 Participation rates, FFCWS modules by wave

4 Using the FFCWS

As noted in Sect. 3.1, FFCWS data can be obtained through two different processes: public access and the contract process for accessing restricted-use data. The access details described in this section apply only to the public-use file. The remainder of this section lays out details of the FFCWS data, which apply to both the public- and restricted-use data.

4.1 Data Access

The FFCWS public-use files can be accessed through a two-step process: users must first register with the data archive at the Princeton University Office of Population Research and then sign up for access to the FFCWS. Registration applications are usually reviewed within one business day. Public-use files, not including Y9, are also available through the Inter-university Consortium for Political and Social Research (ICPSR); however, the Princeton archive has the most current data files and is preferable for scientific research. Access protocols for the restricted-use file are available from the Center for Research on Child Wellbeing at Princeton University (CRCW 2009a).

4.2 Data Structure

In both the public- and restricted-use datasets, the FFCWS data are structured with one record per child and identified with a unique family identifier, a string variable called idnum. At each wave, mother and father data are stored in separate files and can be merged using the idnum variable to match members of the same family. The other modules described in Sect. 3 can also be merged to the core data using the idnum variable. Each core dataset has records for all 4898 families, regardless of whether each parent was interviewed. Flag variables indicate whether or not a mother or father was interviewed at a given wave, and cases that did not participate in a given interview or wave are coded as “not in wave” on all other variables (Footnote 4).
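The merge on idnum described above can be sketched as follows. This is a minimal illustration in plain Python rather than the study’s own tooling. The two record lists stand in for one wave’s mother and father core files; every name other than idnum (m_interviewed, f_interviewed, m_age, f_age) is hypothetical, and the value -9 used here for “not in wave” is an assumption to be checked against Table 5.

```python
# Hypothetical stand-ins for one wave's mother and father core files.
# Only idnum is a documented name; all other names and values are illustrative.
NOT_IN_WAVE = -9  # assumed code for "not in wave"; check Table 5

mother_file = [
    {"idnum": "0001", "m_interviewed": 1, "m_age": 24},
    {"idnum": "0002", "m_interviewed": 1, "m_age": 31},
]
father_file = [
    {"idnum": "0001", "f_interviewed": 1, "f_age": 27},
    {"idnum": "0002", "f_interviewed": 0, "f_age": NOT_IN_WAVE},
]


def merge_on_idnum(left, right):
    """One-to-one merge of two lists of records keyed on the string idnum."""
    right_by_id = {row["idnum"]: row for row in right}
    merged = []
    for row in left:
        match = right_by_id.get(row["idnum"])
        if match is None:
            raise KeyError(f"idnum {row['idnum']} missing from right-hand file")
        merged.append({**row, **match})
    return merged


families = merge_on_idnum(mother_file, father_file)

# Keep only families in which both parents were interviewed at this wave.
both_interviewed = [
    fam for fam in families
    if fam["m_interviewed"] == 1 and fam["f_interviewed"] == 1
]
```

Because every core file carries a record for every family, a one-to-one merge should find a match for each idnum; the sketch raises an error rather than silently dropping a family if it does not. Keeping idnum as text, as it is stored, avoids accidentally altering identifiers during import.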

4.3 Variable Documentation

The complete merged FFCWS dataset contains more than 10,000 variables, covering a wide variety of domains used in life course health development research. These include parents’ demographic information, parental relationships, relationships the parents may form with new partners, relationships between each parent and the focal child, relationships between parents’ new partners and the focal child, child wellbeing, health, behavior, and other aspects of health development. The data also contain information on the focal child’s physical and social environments, including parental employment, household income and economic wellbeing, parental health and behavior, parental incarceration, social support and other family relationships, housing and neighborhood quality, and access to government programs. Not all domains are covered in each wave of the FFCWS, and where each domain appears in the survey varies across waves. Each topic can be located within the five available waves using the FFCWS Core Question Map (CRCW 2009b), which lists the waves in which a given topic appears; whether the topic is covered by mothers, fathers, or both; and the section of the survey in which it appears. Additionally, there is an In-Home Questionnaire Map for Y3 and Y5. Complete questionnaires are also available on the documentation page of the FFCWS website.

4.3.1 Variable Structure and Construction

Other than the variable idnum, which identifies families across data modules, core FFCWS variables are named according to a consistent convention. Core variable names begin with either an m or an f, depending on whether the variable was reported by the mother or father, respectively, followed by an indicator of the wave in which it was asked (1 = baseline, 2 = 1 year, 3 = 3 years, 4 = 5 years, and 5 = 9 years), followed by a letter and number to indicate the section and question where it appears in the survey instrument. For example, the variable m1a12 refers to the mother’s baseline survey and the 12th question of section a. Variables from the in-home survey are stored in separate modules and are numbered only with a letter and number to indicate section and question (Footnote 5). The Y9 dataset includes several additional variable prefixes to accommodate the additional modules described in Sect. 3.2 (CRCW 2011).
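The naming convention for core variables lends itself to a small parser. The following is a minimal sketch in Python, assuming that a core name consists of exactly the parts listed above; any characters after the question number (for instance, a sub-item suffix) are captured but not interpreted, and the existence of such suffixes is an assumption of this sketch. The function name parse_core_name is illustrative.

```python
import re

# Wave digit -> wave label, as given in the text.
WAVES = {"1": "baseline", "2": "1 year", "3": "3 years", "4": "5 years", "5": "9 years"}
RESPONDENTS = {"m": "mother", "f": "father"}

# respondent (m/f) + wave digit (1-5) + section letter + question number,
# followed by an optional, uninterpreted suffix.
CORE_NAME = re.compile(r"^([mf])([1-5])([a-z])(\d+)(.*)$")


def parse_core_name(name):
    """Split a core variable name into its parts, or return None if it does not fit."""
    match = CORE_NAME.match(name)
    if match is None:
        return None
    respondent, wave, section, question, suffix = match.groups()
    return {
        "respondent": RESPONDENTS[respondent],
        "wave": WAVES[wave],
        "section": section,
        "question": int(question),
        "suffix": suffix,
    }


print(parse_core_name("m1a12"))  # mother, baseline, section a, question 12
print(parse_core_name("idnum"))  # does not follow the convention
```

Names that do not fit the pattern, such as idnum, return None, which makes it easy to separate convention-following core variables from everything else when scanning a merged file.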

Variables in the FFCWS data are also characterized by a consistent schema for responses and missing data. All substantive responses appear in the data as positive numbers. Nonnumeric variables are also coded as positive integers, and their substantive meaning is indicated in the questionnaires and embedded variable labels. (For example, “Yes” answers are usually coded as 1, and “No” answers are usually coded as 2.) Missing responses are indicated with negative numbers, with different negative values indicating different reasons that the data are missing. Details on the missing data codes are provided in Table 5.

Table 5 Missing data codes in the FFCWS
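In practice, the response schema means that negative values must be set to missing before any variable is summarized. The sketch below relies only on the rules stated in the text (negative values are missing-data codes; “Yes” is usually 1 and “No” is usually 2). The raw values and function names are illustrative, and the coding of any particular item should be verified against the questionnaire.

```python
def clean_response(value):
    """Return None for any missing-data code (negative), else the value unchanged."""
    return None if value < 0 else value


def yes_no_indicator(value):
    """Map the usual 1 = Yes / 2 = No coding to 1/0, keeping missing as None.

    Assumes the item really uses 1/2 for Yes/No; verify against the questionnaire.
    """
    value = clean_response(value)
    if value is None:
        return None
    if value == 1:
        return 1
    if value == 2:
        return 0
    raise ValueError(f"unexpected code for a yes/no item: {value}")


# Illustrative raw responses: 1 = Yes, 2 = No, negatives = missing for some reason.
raw = [1, 2, -9, 1, -2, 2]
indicators = [yes_no_indicator(v) for v in raw]

# Share answering "Yes" among respondents with a substantive answer.
observed = [v for v in indicators if v is not None]
share_yes = sum(observed) / len(observed)
```

Leaving the negative codes in place would pull means and counts downward, so recoding first is the safer default. Analyses that distinguish among reasons for missingness would instead map each code in Table 5 to its own category.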

The FFCWS dataset also includes numerous variables constructed by CRCW staff to ensure that commonly used concepts are coded consistently by all users. At baseline, examples of constructed variables include the baby’s sex, whether or not the birth was of multiple children (i.e., twins), whether the baby had low birth weight (i.e., less than 2500 grams), the ethnicity and race of both parents, and each parent’s educational attainment at baseline. At Y1, additional variables were constructed to note the mother’s age when she first had a baby and the total number of biological children she had. At Y3, variables were constructed to indicate each parent’s cognitive ability (Wechsler 1981). Several other variables were constructed to indicate household composition (along with a second variable to indicate whether the child’s grandparents lived in the household), fathers’ incarceration histories, whether each parent reported depression and anxiety, and several details of the interview itself – whether it was conducted in Spanish, whether each parent was interviewed, and whether the primary caregiver participated in the in-home study.

4.3.2 Variable Content

The FFCWS data, collected through multiple methods, contain a rich set of child health and behavioral indicators, measures of parental resources and behavior, as well as indicators of parents’ physical and mental health, health behaviors, incarceration histories, and relationship stability before and since the focal birth. A summary of information collected to date is provided in Table 6, along with the modes of collection used for each class of data.

Table 6 Summary of data collected, FFCWS baseline through year 9 waves

4.3.3 Tracking Constructs Over Time

One of the strengths of the FFCWS for life course health development research is the consistent measurement of multiple substantive constructs across waves of the study. Many of the questions on child health were asked in each wave: parents or primary caregivers were asked to report the focal child’s general health status, whether he or she has any physical disabilities, and whether he or she has had any experience with asthma (a diagnosis or an attack), with particular attention to whether an attack required a visit to the emergency room. Parents and caregivers were also asked to report children’s medical visits at each wave, including doctor’s visits for preventive care, illnesses, and accidents, as well as any visits to the hospital or emergency room. Many indicators of the family’s social environment were also asked in similar formats across waves, such as parents’ reports of relationship quality and co-parenting.

Other family characteristics differ slightly from one wave to the next; for example, parents are asked to report on between 8 and 14 indicators of material hardship in each of the Y1, Y3, and Y5 waves. These measures may be reconciled by analyzing only the eight indicators that appear in all three waves or can be used in their differing format. Still other characteristics, particularly those related to child health development, would be inappropriate to measure in the same way in all waves (e.g., questions on child behavior, cognitive development, and puberty onset). The FFCWS measures are selected and designed to be age-appropriate at the time they are asked. Researchers maintain considerable flexibility in ways to model child cognitive and behavioral health trajectories over time.
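The reconciliation of material hardship measures across waves can be sketched as follows: find the indicators common to all waves and score only those. This is a minimal illustration in Python under stated assumptions. The item labels, the numbers of items per wave, and the responses are toy placeholders (the actual items are listed in the questionnaires), and treating a family’s score as missing when any common item is missing is one possible rule, not the study’s prescribed one.

```python
# Toy item labels per wave; the real indicators are listed in the questionnaires.
items_by_wave = {
    "Y1": {"item_a", "item_b", "item_c", "item_d"},
    "Y3": {"item_a", "item_b", "item_c", "item_e", "item_f"},
    "Y5": {"item_a", "item_b", "item_c", "item_d", "item_f"},
}

# Indicators asked in every wave.
common_items = sorted(set.intersection(*items_by_wave.values()))


def hardship_count(responses, items):
    """Count hardships reported (coded 1) among `items`; None if any item is missing."""
    values = [responses.get(item) for item in items]
    if any(v is None or v < 0 for v in values):
        return None
    return sum(1 for v in values if v == 1)


# One hypothetical family's Y3 responses (1 = Yes, 2 = No, negative = missing).
y3_responses = {"item_a": 1, "item_b": 2, "item_c": 1, "item_e": 1, "item_f": 2}
y3_common_score = hardship_count(y3_responses, common_items)
```

Scoring only the common items yields a measure that is comparable across waves, at the cost of discarding the wave-specific indicators; analyses of a single wave may prefer the full set.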

5 Looking Ahead

In addition to the rich data that has been collected in the FFCWS to date, the study is currently in the field for a new round of data collection. This data collection is timed around the focal child’s 15th birthday (hereafter “Year 15” or “Y15”) and designed to improve understanding of how children’s experiences in early and middle childhood influence adolescent behaviors. Adolescence is a critical period in human development when children engage in both positive and negative behaviors with lasting consequences for future health and wellbeing. The new wave of the FFCWS will provide new information about how children’s adolescent outcomes are influenced by their experiences in infancy and early childhood.

Data in the Y15 wave are being collected through a series of new measures, as well as measures from existing waves that have been expanded to capture children’s experiences directly from their self-report. Key areas of expansion include sexual activity (and particularly risky sexual activity), school performance and engagement, delinquency, civic and extracurricular participation, pro-social behaviors, pubertal development, substance use and abuse, sleep, physical activity, eating, multimedia exposure, and a variety of other measures of adolescent health and wellbeing. The Y15 survey is also collecting data on the focal child’s relationships with his or her biological parents, any new partners, and siblings, in order to understand how family complexity and instability influence family interactions and adolescent wellbeing. Finally, the Y15 wave will include new collection of genetic information through additional saliva samples.

The new data will facilitate research on topics related to adolescent risk; the role of the family in shaping adolescent outcomes; racial and ethnic, gender, and income disparities in adolescent health development; and the role of gene-environment interactions in adolescent development. Data collection is scheduled to be completed by 2017, and the data will be cleaned and released for limited and public use in the coming years.

6 Key Resources for Users

Beyond the data themselves, Princeton’s Center for Research on Child Wellbeing (CRCW) and the Columbia Population Research Center (CPRC), as the home institutions of the study, devote considerable material and intellectual resources to the community of data users. Alongside the data and extensive documentation available online, CRCW staff maintain an online database of working papers and publications using the FFCWS. Data users are encouraged to publish their working papers and submit their publications for inclusion in the database. CRCW also publishes a series of research briefs based on FFCWS publications, which are both available from the FFCWS website and distributed to a broad audience of researchers, policymakers, and advocates. Finally, CRCW staff are available to answer questions about the study, and FFCWS researchers host an annual 3-day workshop at CPRC to train new users of the data on the capabilities and use of the dataset. A complete list of FFCWS resources is available in Appendix C.