Key Points for Decision Makers

Health interventions that benefit patients can positively affect family members.

Accurate measurement of these spillover effects is necessary to appropriately value health programs and technologies.

Both the Short Form-6 Dimension (SF-6D) and the three-level EuroQoL-5 Dimension (EQ-5D-3L) can be used to estimate potential spillover effects associated with interventions for children with ASD, but the former performed slightly better in this population.

1 Introduction

A common metric used to quantify the effectiveness of healthcare interventions in cost-effectiveness evaluations is the quality-adjusted life-year (QALY), which incorporates both the quantity and quality of life gained [1, 2]. The QALY has a number of useful properties that led to it being the standard for conducting cost-effectiveness analysis [2]. One area of concern, however, is the fact that, in practice, QALY measurement for cost-effectiveness analysis typically focuses solely on the health effects accruing to patients, as if these were isolated individuals [3]. By now, it has been shown that health effects in patients are typically associated with substantial spillover effects on the health and well-being of caregivers and family members [4,5,6]. Failure to include such spillover effects in economic evaluations can lead to a misrepresentation of the burden of disease and the benefits of health interventions [7]. This, in turn, may lead to suboptimal decisions, both from a healthcare and a societal perspective [8].

Regulatory agencies now recognize the need to incorporate spillover effects in economic evaluations. Both the National Institute for Health and Care Excellence (NICE) and the Second US Panel on Cost-Effectiveness in Health and Medicine recognized the potential for spillover effects to influence estimated cost-effectiveness ratios and recommend including them in a reference case analysis [9,10,11,12]. The Second US Panel also emphasized the importance of increasing research efforts on clarifying how to incorporate family and caregiver spillover effects in economic evaluations [10].

Despite recognition of the need to include health spillover effects when valuing health interventions, little guidance exists for including spillover effects in cost-effectiveness analysis [8]. For example, there is no guidance for incorporating spillover effects into a cost-effectiveness analysis in the context of clinical trials that could inform regulatory agencies. One option would be to capture these effects by measuring health states across trial arms for patients, caregivers, and family members. This results in a focus on health (rather than well-being), which has the advantage of being the most relevant outcome in most studies and decision-making contexts, making effects comparable across groups and amenable to aggregation. In the design of such clinical trials, decisions need to be made about which instruments are able to capture spillover effects in QALY terms. Early research on spillovers in Alzheimer’s disease attempted to estimate effects by comparing caregiver outcomes across clinical characteristics such as stage of disease and setting [13], but likely failed because the instrument was not sensitive or did not discriminate well [7, 14]. Subsequent analysis showed that traditional measures of burden and health changed in the expected direction, but the Health Utilities Index Mark 2 (HUI-2) did not capture these changes [14]. While a large literature has emerged that allows us to understand whether a given instrument is valid for measuring QALYs for different conditions affecting patient populations [15,16,17], research identifying instruments that are valid and responsive in measuring spillover effects in caregivers and family members remains scarce. Indeed, we are aware of only two studies that have compared different generic preference-weighted instruments to measure spillover effects. Payakachat et al. [18] compared three preference-weighted health instruments to measure spillover effects among caregivers of children with craniofacial malformations. Bhadhuri et al. [19] compared two preference-weighted instruments to measure spillover effects among family members of meningitis survivors.

Family and caregiver spillover effects, in terms of health and well-being, may be particularly pronounced in child health interventions [20,21,22] and for mental health conditions where social support systems may be lacking [23, 24]. Interventions for children with autism spectrum disorder (ASD) have the potential for substantial spillover effects in caregivers and family members due to an increased prevalence of psychiatric and medical co-morbidities such as anxiety, behavioral problems, sleep disturbance, and cognitive issues [25,26,27]. Preventing symptoms of ASD in the child is thus likely to improve family and caregiver health and reduce burden [22].

Given the potential for interventions such as medications or applied behavioral therapy to improve the health of children with ASD, a comparison of generic preference-weighted instruments to determine whether they capture spillover effects in QALY terms of health associated with treatment for children with ASD appears warranted. Therefore, the purpose of this study was to assess the ability of two commonly used generic preference-weighted instruments, the three-level EuroQol-5 Dimension instrument (EQ-5D-3L) and the Short Form-6 Dimension (SF-6D) [28, 29], derived from the 12-item Short Form survey version 2.0 (SF-12 v2.0), to value spillover effects in caregivers of children with ASD in order to provide guidance about their use, especially in the context of clinical trials and other approaches where indirect elicitation techniques are warranted.

2 Methods and Participants

2.1 Data Collection

This study is a secondary analysis of data we previously collected from two Autism Treatment Network (ATN) sites (a developmental center in Little Rock, AR, USA and an outpatient psychiatric clinic at Columbia University Medical Center in New York, NY, USA). The dataset consists of clinical registry data and data collected via a postal survey of caregivers of children aged 4–17 years with an ASD diagnosis, which was clinically determined by Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV) criteria [30]. The ATN sites collected clinical registry data of children with ASD that included diagnostic, cognitive, behavioral, and physical assessments. Caregivers of children with ASD registered at the ATN sites who had agreed to be contacted for future research were mailed a postal survey and asked to report on instruments describing their child with ASD and themselves. Data were collected from 2010 to 2012. Around 10% and 5% of families in Little Rock and Columbia, respectively, opted out of being contacted for future research. The study protocol was approved by all of the institutions involved in the study. A more detailed description of the data collection procedures is provided in Hoefman et al. [31] and Tilford et al. [32].

2.2 Instruments

2.2.1 Information on Children with Autism Spectrum Disorder

A number of clinical and health-related quality of life (HR-QOL) measures for children with ASD were selected from the ATN assessments, including autism severity scores (Autism Diagnostic Observation Schedule [ADOS] [33]), adaptive behavior scores (Vineland Adaptive Behavior Scales Second Edition [Vineland-II] [34]), cognitive ability (IQ; the Stanford-Binet Intelligence Scales [35], the Mullen Scales of Early Learning [36], or the Bayley Scales of Infant and Toddler Development [37], depending on the child’s age), emotional and behavioral problems (Child Behavior Checklist [CBCL] [38]), sleep behavior (Children’s Sleep Habits Questionnaire [CSHQ] [39]), and pediatric quality of life measures (the Pediatric Quality of Life Inventory™ 4.0 [PedsQL™] [40] and the Health Utilities Index Mark 3 (HUI-3) [41]). Higher Vineland-II and IQ scores indicate better child adaptive behavior and cognitive ability, respectively. Higher ADOS, CBCL, or CSHQ scores indicate increased autism severity, maladaptive behavior, or worse sleep behaviors, respectively. Higher scores on the PedsQL™ and HUI-3 indicate better child HR-QOL. Responses for the PedsQL™, HUI-3, Vineland-II, CBCL, and CSHQ were reported by the caregiver; other child data came from clinical assessment. Further details on the measures can be found in the Electronic Supplementary Material.

We additionally obtained information on the child’s age and gender. The child’s age was included in the analysis given the evidence that maladaptive behaviors among individuals with ASD improve with age, suggesting that younger children with ASD may require increased caregiving relative to older children with ASD [42, 43]. Child behavior and health conditions can be expected to influence caregiver burden and ultimately caregiver HR-QOL following the logic in previous studies [13, 18, 31, 44, 45].

2.2.2 Information on Caregivers

Caregivers reported information on their demographic characteristics, depressive symptoms (Center for Epidemiologic Studies Depression Scale [CES-D] [46]), care-related quality of life (Care-related Quality of Life instrument [CarerQol-7D] [47, 48]), family-related quality of life (Family Quality of Life Scale [FQLS] [49]), and the EQ-5D-3L and SF-12 v2.0 instruments measuring HR-QOL. All of the caregiver information was provided through the postal survey.

The EQ-5D-3L (range −0.109 to 1) measures health utility using five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) with three response options each [28, 50]. The SF-12 v2.0 was used to derive the SF-6D (range 0.3–1), which contains six dimensions of health (physical functioning, role limitations, social functioning, pain, energy, and mental health) [29]. Both the EQ-5D-3L and SF-6D measures provide a health utility score where 0 equals the score related to the health state of dead, 1 represents the state of perfect health, and scores less than 0 represent states worse than death.

Higher scores on the CarerQol-7D, FQLS, EQ-5D-3L, and SF-6D indicate higher levels of quality of life. Higher scores on the CES-D indicate worse depressive symptoms. Details on each of the measures can be found in the Electronic Supplementary Material.

3 Statistical Analysis

3.1 Descriptive Statistics

Domain and health utility scores were calculated using weights from the US general population for the EQ-5D-3L and SF-6D [51, 52]. Descriptive statistics were calculated for child and caregiver demographics, health, and HR-QOL.

3.2 Validation of the Three-Level EuroQoL-5 Dimension (EQ-5D-3L) and the Short Form-6 Dimension (SF-6D) to Assess Spillover Effects

To study the ability of the two generic preference-weighted instruments to assess spillover effects in caregivers of children with ASD, we first compared the convergent validity and clinical validity of the SF-6D and EQ-5D-3L in measuring caregiver HR-QOL. Failure to meet the criteria for convergent validity and clinical validity for measuring caregiver HR-QOL would discredit the ability of the SF-6D and EQ-5D-3L for measuring spillover effects. We then assessed the discriminative power and clinical validity of the SF-6D and EQ-5D-3L in specifically measuring spillover effects related to caring for a child with ASD. In the discriminative power and clinical validity analyses, comparisons were made using both caregiver and child measures to assess the two preference-weighted instruments in their ability to measure spillover effects. Statistical significance was assumed at p < 0.05, using Bonferroni correction when multiple comparisons were performed simultaneously.
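
The Bonferroni rule mentioned above can be sketched in a few lines. This is an illustrative sketch only: the function names and the example p-values are hypothetical and are not taken from the study, and the text does not state how many comparisons made up each family of tests.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value as significant at the Bonferroni-adjusted threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical p-values from four simultaneous comparisons (not study data).
example_p = [0.001, 0.012, 0.030, 0.200]
flags = significant_after_bonferroni(example_p)
print(bonferroni_alpha(0.05, len(example_p)))  # 0.0125
print(flags)  # [True, True, False, False]
```

With four simultaneous comparisons, a p-value of 0.030 would be significant at the unadjusted 0.05 level but not at the adjusted threshold of 0.0125.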

3.2.1 Convergent Validity of EQ-5D-3L and SF-6D

Convergent validity is the agreement between two instruments that are measuring the same theoretical construct, namely caregiver health effects [53]. In this study, we analyzed the agreement between the EQ-5D-3L and the SF-6D, and the agreement of these HR-QOL measures with validated measures of outcomes in caregivers (i.e., CES-D and CarerQol-7D), using Spearman’s ρ correlation. Both the CES-D and CarerQol-7D measure quality-of-life outcomes in caregivers; thus, a strong correlation of the EQ-5D-3L and SF-6D with each other and with other caregiver measures would indicate the ability of the EQ-5D-3L and SF-6D to capture HR-QOL effects. Following convention, Spearman’s ρ values of 0.10–0.29, 0.30–0.49, and ≥ 0.50 were classified as weak, moderate, and strong effect sizes, respectively [54, 55].
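
As a rough illustration of this step, the sketch below computes Spearman’s ρ as the Pearson correlation of average ranks and labels its magnitude with the thresholds quoted above. The function names and example scores are hypothetical. Treating a value of exactly 0.50 as strong, and labeling values below 0.10 as "negligible", are assumptions of this sketch rather than statements from the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def classify_strength(rho):
    """Label |rho| with the thresholds quoted in the text (sign is ignored)."""
    size = abs(rho)
    if size >= 0.50:
        return "strong"
    if size >= 0.30:
        return "moderate"
    if size >= 0.10:
        return "weak"
    return "negligible"  # label assumed here; the text names no category below 0.10


# Hypothetical utility scores for five caregivers (not study data).
eq5d = [0.85, 0.70, 1.00, 0.60, 0.78]
sf6d = [0.80, 0.72, 0.95, 0.55, 0.66]
rho = spearman_rho(eq5d, sf6d)
print(round(rho, 3), classify_strength(rho))
```

Because the classification uses the absolute value, a negative correlation such as the one hypothesized between the utility scores and the CES-D is graded on the same scale as a positive one.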

Further, to evaluate the agreement between the EQ-5D-3L and SF-6D, a Bland–Altman plot was created, which uses a scatter plot to display the difference between the two health utility scores for a given caregiver and the average of the two scores for that caregiver [56]. This allows a comparison of whether the scores are similar across the range of HR-QOL.
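
The quantities behind such a plot can be sketched as follows. The per-caregiver points (average of the two scores, difference between them) follow the description above. The mean difference and the 95% limits of agreement (mean ± 1.96 standard deviations) are the conventional Bland–Altman summaries and are included here as an assumption, since the text does not say whether they were reported. All names and scores are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Return per-person (average, difference) points, the bias, and 95% limits."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both instruments must be scored for the same people")
    points = [((a + b) / 2, a - b) for a, b in zip(scores_a, scores_b)]
    diffs = [d for _, d in points]
    bias = mean(diffs)                # average difference between instruments
    spread = 1.96 * stdev(diffs)      # half-width of the limits of agreement
    return points, bias, (bias - spread, bias + spread)


# Hypothetical utility scores for four caregivers (not study data).
eq5d = [1.0, 0.8, 0.9, 0.7]
sf6d = [0.9, 0.6, 0.8, 0.5]
points, bias, limits = bland_altman(eq5d, sf6d)
for average, difference in points:
    print(f"average={average:.2f}  difference={difference:.2f}")
print(f"bias={bias:.3f}  limits=({limits[0]:.3f}, {limits[1]:.3f})")
```

Plotting the differences against the averages then shows whether disagreement between the instruments changes across the range of HR-QOL.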

Based on findings in other patient populations, we hypothesized that positive agreement would exist between the health utility scores of the two preference-weighted measures of caregiver health [15, 57]. Similarly, we hypothesized that both of the health utility measures would be positively correlated with the CarerQol-7D and negatively correlated with the CES-D [18, 58].

3.2.2 Clinical Validity of EQ-5D-3L and SF-6D

Clinical validity describes how differences in clinical- or behavioral-related characteristics are reflected in an individual’s instrument score [59]. Clinical validity was assessed using a one-way analysis of variance (ANOVA) to test differences in the average health utility scores of caregivers (EQ-5D-3L and SF-6D) with different characteristics. First, we compared mean scores on the EQ-5D-3L and SF-6D among subgroups of caregivers based on the caregiver’s number of hours of sleep per night, depressive symptoms (CES-D), care-related quality of life (CarerQol-7D), and family-related quality of life (FQLS) to assess the clinical validity of the EQ-5D-3L and SF-6D. In this analysis, the criteria for clinical validity were met if higher health utility scores measured by the EQ-5D-3L and SF-6D were associated with increased hours of sleep, caregiver quality of life (CarerQol-7D), and family-related quality of life (FQLS), and decreased depressive symptoms (CES-D).

Next, we compared the average health utility scores for caregivers in relation to the child’s IQ level, autism severity (ADOS score), HR-QOL (PedsQL™ and HUI-3), behavioral characteristics (Vineland-II and CBCL), and sleep behavior (CSHQ) to evaluate the ability of the EQ-5D-3L and SF-6D to capture differences in caregiver HR-QOL in relation to measures of child health and behavioral problems. In this analysis, the criteria for clinical validity were met if higher health utility scores measured by the EQ-5D-3L and SF-6D were associated with lower levels of autism severity (ADOS scores) and maladaptive behavior (Vineland-II and CBCL), better sleep habits (CSHQ), and higher levels of child IQ and HR-QOL (PedsQL™ and HUI-3).
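
A minimal sketch of this comparison is given below. It splits caregivers by whether their child’s score falls below the sample mean and computes the one-way ANOVA F statistic for caregiver utility across the resulting groups. Splitting at the sample mean is assumed here for illustration, and the function names and data are hypothetical. The sketch stops at the F statistic and its degrees of freedom; a p-value would come from the corresponding F distribution.

```python
from statistics import mean


def split_at_mean(child_scores, caregiver_utilities):
    """Group caregiver utilities by whether the child's score is below the mean."""
    cutoff = mean(child_scores)
    pairs = list(zip(child_scores, caregiver_utilities))
    below = [u for c, u in pairs if c < cutoff]
    at_or_above = [u for c, u in pairs if c >= cutoff]
    return below, at_or_above


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more scores than groups")
    grand_mean = mean([score for g in groups for score in g])
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((s - m) ** 2 for g, m in zip(groups, group_means) for s in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical child HUI-3 scores and caregiver SF-6D scores (not study data).
child_hui3 = [0.40, 0.50, 0.45, 0.80, 0.90, 0.85]
caregiver_sf6d = [0.70, 0.72, 0.68, 0.78, 0.76, 0.80]
below, at_or_above = split_at_mean(child_hui3, caregiver_sf6d)
f_stat, df_b, df_w = one_way_anova_f([below, at_or_above])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With only two groups, the one-way ANOVA F statistic equals the square of the pooled-variance t statistic, so the two tests lead to the same conclusion.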

It was expected that caregivers who were caring for children with higher ASD severity or worse behavioral problems would have lower scores on both preference-weighted instruments, suggesting spillover effects of caring for a child with ASD [44]. It was also expected that caregivers of children who were younger in age would have lower HR-QOL, given the increased burden of caring for younger children with ASD [42, 43].

3.2.3 Discriminative Power of EQ-5D-3L and SF-6D

Discriminative power quantifies whether an instrument is sensitive to differences in comparator outcomes across different response levels of the preference-weighted instruments [60, 61]. Discriminative power was tested by comparing the mean values of outcomes in children for each level of the EQ-5D-3L and SF-6D dimensions in caregivers using two-way t tests. To quantify differences in discriminative power between the SF-6D and the EQ-5D-3L, we compared the percentage of statistically significant t tests of the total number of t tests calculated for each of the two preference-weighted instruments. This differs from our analysis of clinical validity, which compared average caregiver health utility scores among different classifications of health- or behavior-related characteristics of the child or caregiver. Responses to each domain were categorized as having “no problems” related to the given domain or as having “at least some” of the indicated problem, with the exception of the vitality domain of the SF-6D, which was categorized as “no or some problems” and “moderate or severe problems” given the limited number of responses of “no problems” for this domain.
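
The two ingredients of this analysis, a t test between two response groups and the share of significant tests per instrument, can be sketched as follows. The text does not specify the variant of t test used; Welch’s unequal-variance form is assumed here purely for illustration. The function names, scores, and p-values are hypothetical.

```python
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    va = variance(group_a) / len(group_a)  # squared standard error, group A
    vb = variance(group_b) / len(group_b)  # squared standard error, group B
    t_stat = (mean(group_a) - mean(group_b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t_stat, df


def percent_significant(p_values, alpha=0.05):
    """Percentage of tests whose p-value falls below alpha."""
    return 100 * sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical child PedsQL scores grouped by the caregiver's answer
# on one domain (not study data).
no_problems = [70, 72, 68, 74]
some_problems = [60, 62, 58, 64]
t_stat, df = welch_t(no_problems, some_problems)
print(f"t = {t_stat:.3f}, df = {df:.1f}")

# Hypothetical p-values for one instrument's domain-by-outcome tests.
print(percent_significant([0.01, 0.20, 0.03, 0.60]))  # 50.0
```

Comparing `percent_significant` across the two instruments gives the kind of summary reported for discriminative power, with the caveat that instruments with more domains contribute more tests to the denominator.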

It was anticipated that caregivers who reported “no problems” on a given domain would be associated with better child outcomes, including higher health utility scores (PedsQL™ and HUI-3), better adaptive behavior scores (Vineland-II), and fewer behavioral and emotional problems (CBCL).

4 Results

The study sample contained 224 caregivers of children with ASD. The response rate for the completion of the postal survey components was 115 of 220 (52%) at one ATN site and 109 of 179 (61%) at the second ATN site. In our sample, 90% of caregivers were female, around 60% had at least a college education, and 76% were married (Table 1). The average caregiver age was 39.4 years, and the average child age was 8.4 years. Of the children included in the study sample, 87% were male (Table 1). A complete description of the demographic characteristics of the study sample can be found in Hoefman et al. [31].

Table 1 Characteristics of children with autism spectrum disorder and their caregivers, n = 224

4.1 SF-6D and EQ-5D-3L Scores

The EQ-5D-3L had a higher average health utility index score than the SF-6D (mean 0.847 vs. 0.741) and greater variation (standard deviation 0.139 vs. 0.119). EQ-5D-3L scores ranged from 0.308 to 1.000, and SF-6D scores ranged from 0.378 to 1.000.

4.2 Validity of the EQ-5D-3L and SF-6D in Measuring Health-Related Quality of Life of Caregivers

4.2.1 Convergent Validity

Health utility index scores of the EQ-5D-3L and the SF-6D were strongly correlated (ρ = 0.617, p < 0.001) (Table 2). The Bland–Altman plot (Fig. 1) illustrates that the difference in a caregiver’s EQ-5D-3L and SF-6D scores decreased as the average of the two scores increased, suggesting a higher level of agreement (i.e., more similar scores) for caregivers with better health than in caregivers with poorer health.

Table 2 Convergent validity: Spearman’s ρ correlation between preference-weighted instruments and other caregiver measures for caregivers of children with autism spectrum disorder, n = 224

Fig. 1 Bland–Altman plot showing the difference between the two health utility scores against the average of the two scores for each parent (created in STATA® and edited in Adobe Photoshop). EQ-5D EuroQoL-5 Dimension, SF-6D Short Form-6 Dimension

4.2.2 Clinical Validity

Both instruments demonstrated clinical validity with respect to caregiver HR-QOL. As expected, health utility scores differed significantly among caregivers with fewer hours of sleep per night, more depressive symptoms, lower caregiver-related quality of life, and lower family-related quality of life, indicating that both the SF-6D and EQ-5D-3L were sensitive to differences among caregivers (Table 3).

Table 3 Clinical validity: one-way analysis of variance comparing mean EQ-5D-3L and SF-6D health utility index scores of caregivers with different demographic characteristics, n = 224

4.3 Ability of the SF-6D and EQ-5D-3L to Measure Spillover Effects in Caregivers

4.3.1 Discriminative Power

Caregivers who responded as having “no problems” on all six of the SF-6D domains and on two of the EQ-5D-3L domains (usual activities and anxiety/depression) had children with higher quality of life measured by the PedsQL™ (Table 4). For example, caregivers who reported “at least some” mental health problems on the SF-6D or “at least some” anxiety/depression problems on the EQ-5D-3L had children with lower average HR-QOL (PedsQL™) than parents with “no problems” (61.7 vs. 69.8, p = 0.01, and 59.5 vs. 66.3, p < 0.001, respectively). Overall, there was a greater percentage of significant t tests for the SF-6D (63%) than for the EQ-5D-3L (25%). There was only one domain (SF-6D role limitations) with significant differences in child adaptive behavior scores (Vineland-II) (Table 4).

Table 4 Discriminative power: t tests comparing the mean scores of child outcomes (PedsQL™, HUI-3, Vineland-II, CBCL) for domain responses on caregiver preference-weighted instruments, n = 224

4.3.2 Clinical Validity

Results related to clinical validity favored the SF-6D. For caregivers of children with increased behavior problems (indicated by higher CBCL or lower Vineland-II scores) or with lower quality of life (indicated by lower PedsQL™ or HUI-3 scores), the SF-6D captured significantly different caregiver HR-QOL scores for two of the child measures (PedsQL™ and CBCL). In addition, the EQ-5D-3L and SF-6D both captured significantly different caregiver quality of life for one of the child measures (HUI-3) (Table 5). For example, caregivers with a child whose HUI-3 score was below the sample average of 0.659 (indicating worse child HR-QOL) had a significantly lower SF-6D score (0.712 vs. 0.762, p = 0.003) and a significantly lower EQ-5D-3L score (0.819 vs. 0.867, p = 0.013) than caregivers with a child whose HUI-3 score was above the sample average. Although the average SF-6D and EQ-5D-3L scores differed, the differences in health utility scores between caregivers of children with above- and below-average HR-QOL (HUI-3) were of similar magnitude for the two instruments. With either caregiver preference-weighted instrument, there was no significant difference in HR-QOL among caregivers of children with above- or below-average age, IQ, or autism severity (ADOS) score, or among caregivers of children with different autism diagnoses (Table 5).

Table 5 Clinical validity: one-way analyses of variance comparing mean EQ-5D-3L and SF-6D health utility index scores of caregivers with children who have different demographic characteristics, n = 224

5 Discussion

The Second US Panel on Cost-Effectiveness in Health and Medicine and other government agencies around the world have emphasized that research on valuing spillover effects in family members and caregivers is warranted [9,10,11,12]. While spillover effects can also be studied in a broader context, increasing knowledge of the health effects among caregivers and family members in QALY terms is highly relevant and consistent with a societal perspective [4] and a healthcare perspective [8]. A number of approaches have been adopted to value spillover effects in QALY terms, including direct and indirect elicitation [62,63,64,65]. In the context of clinical trials and other intervention studies, indirect elicitation techniques are likely to be favored, as is the case with measuring patient QALYs, and require guidance about appropriate instruments consistent with the large literature devoted to identifying the most appropriate instrument for patient conditions. Despite the obvious appeal of indirect elicitation techniques for capturing spillover QALYs, research comparing different instruments is lacking.

The need for guidance on different approaches for measuring spillover effects in QALY terms is especially important given the recent change in recommendations by the Second US Panel. Traditionally, spillover effects were included in cost-effectiveness analyses using monetary costs, often measured by the additional time devoted by a caregiver to caring for the patient. Inclusion of non-monetary values, or QALYs of caregivers and other family members, along with monetary costs raised concerns about double-counting [65]. Indeed, a recent review of methods for valuing informal care offered guidance for including spillover effects in either monetary or non-monetary terms because of issues with double counting and other concerns [63]. The Second US Panel now recommends the inclusion of both monetary and non-monetary spillover effects in economic evaluations [10].

This study compared the EQ-5D-3L and SF-6D with respect to their ability to capture spillover effects in caregivers of a child with ASD. To compare the instruments, we first assessed whether they would provide similar results for similar caregivers. In particular, if the two instruments were correlated with each other and with other measures of caregiving quality of life or health, it would suggest the measures were valid instruments. Both measures demonstrated convergent validity as they were strongly correlated with each other, the CarerQol, and the CES-D. While the SF-6D exhibited a stronger correlation with the CES-D, the EQ-5D-3L exceeded criteria for a strong correlation.

Second, we assessed whether the instruments would provide similar results in relation to the characteristics of the child with ASD. In particular, scores on measures of child health were compared in relation to the top and bottom of the distributions for the two instruments as well as the difference in instrument scores in response to changes in the child health measures. Significant differences in child health scores in relation to the distributions of the two instruments demonstrate discriminative power, while differences in instrument scores in relation to differences in child health measures demonstrate clinical validity. Both instruments demonstrated discriminative ability; however, the SF-6D had a greater percentage of significant findings than the EQ-5D-3L. With respect to clinical validity, the SF-6D similarly performed slightly better on the measures of child health, with significant differences in average health utility scores relative to scores on four of the five child measures (PedsQL™, HUI-3, CBCL, and CSHQ) compared with significant differences for two child measures (HUI-3 and CSHQ) for the EQ-5D-3L. Neither measure was associated with the child’s age, IQ, autism severity, or diagnosis. Significant differences in average scores across the child health measures, but not the child’s age, IQ, or autism diagnosis, indicate that differences in child health and behavior drive spillover effects among caregivers.

Based on the comparison of the two instruments in this study, some guidance can be offered for those interested in developing clinical studies to measure caregiver spillover effects associated with caring for a child with autism. Either the SF-6D or the EQ-5D-3L is likely to capture health effects among caregivers in QALY terms for interventions or changes in the clinical characteristics of children with autism that are associated with measurable health effects for the child. Interventions such as new molecules for the treatment of behavior problems that produce meaningful changes in the CBCL are likely to have spillover effects for the caregiver that can be captured by standard preference-weighted instruments such as the SF-6D or the EQ-5D-3L, and we recommend their inclusion in clinical trials and other research designs that can identify causal effects.

Researchers such as Hoefman et al. [48] suggest that the effects of caregiving on caregivers can be measured with the same preference-weighted instruments used to measure HR-QOL in patients. Surprisingly, few studies have assessed preference-weighted instruments to determine whether they are sensitive or responsive for measuring caregiver or family spillover effects. Given that regulatory agencies recommend indirect elicitation with preference-weighted instruments to measure patient QALYs [9,10,11,12], more research appears warranted to compare preference-weighted instruments for measuring spillover effects in other contexts, such as adult children caring for their parents, and other conditions in children, including somatic and mental health conditions.

Several limitations to the study should be noted. First, we limited the caregiver quality-of-life and health measures to two previously validated instruments: the CarerQol-7D and CES-D. It can be argued that the CarerQol-7D captures a different construct (burden of caregiving) than the health utility instruments (HR-QOL) and that the CES-D is limited to mental health problems. Still, both of these measures should be correlated with health utility measures as greater caregiver burden translates into worse HR-QOL. This was the case in this study and, more importantly, both the EQ-5D-3L and the SF-6D demonstrated strong correlations with our measures of caregiver burden. The information produced with caregiver-specific instruments such as the CarerQol-7D may be more appropriate in evaluations of interventions targeted at caregivers specifically given that the CarerQol-7D measures the impact of caregiving beyond health effects [31, 48].

Second, we relied on caregiver self-reports regarding health states for themselves and their children. This approach may lead to problems of endogeneity, especially in study designs where treatment effects cannot be identified. Alternative designs, where the child is rated by a family member other than the primary caregiver, may provide an indication of the extent to which caregivers project their own health states onto the rating of their children [63]. Clinical studies based on exogenous instruments, such as randomization or disease states, are likely to limit problems with endogeneity and can identify spillover effects using indirect elicitation techniques. In addition, there have been considerable methodological advances in using direct elicitation techniques to measure spillover effects [64]. Direct elicitation techniques may be particularly sensitive to a given population and expanding research on direct elicitation techniques to include caregivers of children with ASD could supplement the findings from our study.

Third, the comparisons were based on the EQ-5D-3L, which has only three response levels per construct, rather than the EQ-5D-5L which has five response levels. The EQ-5D-5L may have increased validity and discriminative power [66] and has been suggested to have greater responsiveness than SF-6D in other caregiver contexts [19]. Our finding that both the SF-6D and EQ-5D-3L can be used to capture spillover effects for interventions involving children with autism remains and likely translates to the use of the EQ-5D-5L. Finally, we limited this investigation to health-related spillover effects in primary caregivers. Broader investigations, including observing effects in other family members and effects beyond health in an extra-welfarist context, remain important as well [8].

6 Conclusion

Capturing spillover effects in cost-effectiveness analyses is necessary to ensure accurate valuations of healthcare interventions and programs. Our comparison of the SF-6D and EQ-5D-3L health utility instruments indicated that both can capture health-related spillover effects among caregivers of children with ASD, although the SF-6D had stronger discriminative power and clinical validity in this context. The findings provide useful information for researchers and practitioners interested in developing protocols for measuring caregiver spillover effects in QALY terms. It is feasible to use indirect assessment in clinical studies to measure caregiver spillover effects associated with interventions to improve the health of children with autism. Regulatory agencies recommend indirect assessment to measure patient QALYs, and this study provides evidence for recommending a similar approach to capture caregiver spillover effects associated with child health conditions such as ASD.