Background

The Recovering Quality of Life (ReQoL) is a self-report instrument to measure health outcomes for people with mental health difficulties [1]. Two versions of the measures, ReQoL-10 and ReQoL-20, have been constructed for use in routine practice as well as in research including clinical trials. They are self-report measures suitable for use by individuals aged 16 and over experiencing a wide spectrum of mental health conditions and levels of severity. Traditionally, mental health outcomes have tended to be symptom-based rather than reflecting the service users’ recovery in their quality of life. While there are measures focusing on the process of recovery [2], a recent review identified the need for a patient-reported outcome measure (PROM) that measures the outcomes of recovery in terms of those aspects of quality of life that matter to mental health service users [3]. Hence the ReQoL measures were developed. The initial psychometric analysis of the measures demonstrated acceptable scaling properties, reliability, validity and responsiveness [1].

The two ReQoL measures tap seven domains that are relevant to the recovery of people with mental health issues: activity (meaningful and/or structured); belonging and relationships; choice, control and autonomy; hope; self-perception; well-being and physical health. These domains were identified in a systematic review of qualitative research on Quality of Life (QoL) in mental health and from interviews with service users [4,5,6]. Service users reported both positive and negative aspects of the themes (e.g. hope/hopelessness), which either enhanced or diminished their quality of life. Connell et al. [6] presented the justification that it was important for both positive and negative aspects to be reflected in any QoL measure in mental health. Similar themes had been found elsewhere in the literature. Leamy et al. [7] identified the following themes which map onto the ReQoL themes identified in parentheses: connectedness (belonging and relationships), hope (hope), identity (self-perception), meaning (meaningful activity) and empowerment (choice, control and autonomy). Therefore, the six mental health themes and the one physical theme formed the conceptual basis for the ReQoL measures (Fig. 1).

Fig. 1
figure 1

Development process of the ReQoL

The items used in the ReQoL measures were generated in two stages. In stage I, items were adapted from existing measures, from qualitative interview manuscripts and new items were also generated. In all cases, the criteria suggested by Streiner and Norman were used to shortlist and refine items to enhance comprehension and avoid misinterpretation [8]. In stage II, a total of 88 items were presented to 76 service users who were asked to assess their suitability. This produced 61 items that were used in subsequent psychometric analyses [9].

In this paper, we present the results of factor analyses carried out to establish the latent structure of the ReQoL item sets that would be used to construct the final ReQoL measures. This was an important step for the following reasons. First, establishing a reproducible understanding of the latent structure of the ReQoL items addresses internal construct validity—the degree to which the measure assesses the themes of interest—as well as facilitates a higher fidelity and more accurate perspective when it comes to external scale validation [9]. Second, assessment of dimensionality is important for informing the scoring procedures. Third, an assumption of sufficient unidimensionality typically needs to be satisfied before undertaking item response theory (IRT) model estimation for the purpose of item reduction. Furthermore, unidimensionality is useful for interpreting statistics such as internal consistency. Accordingly, the main aim of this paper is to describe results regarding the factorial structure of the ReQoL item sets.

Mental health questionnaires often report a factor solution involving two correlated factors: positive mental health and negative mental health [10]. It is unclear to what extent such two-factor structure reflects a clear separation of positive and negative mental health or rather a methodological artefact of intermingling positive and negative items. With such intermingling, careless answering would result in a two-factor structure, even in the case of a unidimensional mental health domain. Thus, a secondary aim of the paper is to investigate the impact of the ordering of positively worded and the negatively worded items on the factor structure.

Methods

Study participants

Two sequential studies were conducted. In study 1, 1762 participants were recruited from four different sources: (i) 13 provider organisations, (ii) 3 GP surgeries (n = 3), (iii) 2 charities and (iv) from a trial cohort. An additional further 520 participants were recruited from an online panel, 200 of whom that had no mental health problem(s). As shown in Table 1, the total number of participants recruited in study 1 was 2262. In study 2, 4266 participants were recruited in the following broad diagnostic groups: depression, anxiety, psychotic disorders, bipolar disorders, personality disorders, eating disorders and others (severity ranged from mild to severe). Participants in study 2 were recruited from similar sources as in study 1; both samples are discussed in greater detail elsewhere [1]. The mean age of participants was 48 and 47 years, and the percentage rates of females were 58% and 55% in studies 1 and 2, respectively. Data were collected at one time-point only in study 1 but study 2 participants were followed-up between 6 and 12 weeks. In this paper, only baseline data are analysed from study 2. The raw scores and endorsement frequencies are presented in Supplementary Tables S1 and S2 showing the heterogeneity of the samples.

Table 1 Description of the samples

ReQoL item sets

Participants in study 1 were presented with 61 items (57 that address mental health and four addressing physical health). The aim was to reduce the number of items to 40 for presentation in a shorter questionnaire in subsequent study 2. The item reduction process combined insights from qualitative research and quantitative analysis and is discussed elsewhere [18]. The 40-item ReQoL item set presented in study 2 comprised 39 mental health questions and one physical health question including a mix of both positively worded (e.g. I felt happy) and negatively worded items (e.g. I felt unable to cope). The themes of the ReQoL and the number of items contributing to each theme in both items sets are described in Table 2. Participants responded on a frequency-based 5-point scale with the following verbal anchors for each ReQoL item: none of the time, only occasionally, sometimes, often and most or all of the time.

Table 2 Themes with positive and negative sub-themes with number of items

Presentation of the ReQoL items

Participants in study 2 were presented with one of the two formats of the item set: (i) a format where the positively worded and negatively worded questions were intermingled, that is not grouped in any way (henceforth referred to as format I) and (ii) a format where all the negatively worded questions were presented as a block followed by the positively worded questions (henceforth referred to as format B). If a two-factor structure is to some extent due to intermingling of positive and negative items, we would expect a clearer two-factor structure in the intermingled format than in the block format. Each recruiting organisation used only one of the two formats. A total of 2447 (from 16 recruiting organisations) and 1819 participants (from 12 organisations) completed format I and B, respectively.

Statistical analyses

The ReQoL was constructed using a conceptual framework comprising six mental health domains and a physical health domain [5, 6]. Therefore, confirmatory factor analysis (CFA) was undertaken first to test an a priori model consistent with the number of themes underpinning ReQoL item content. In Fig. 2a, this is shown diagrammatically, with items in each theme loading only on their respective (intended) latent factor and with all factors allowed to correlate.

Fig. 2
figure 2

a Six-factor correlated traits model. b Two-factor correlated traits model. c Bi-factor model

Second, exploratory factor analysis (EFA) was also carried out using Geomin rotation to identify other potential factors present in the data. Finally, informed by the initial CFA and EFA, single factor, two-factor (Fig. 2b) and bi-factor (Fig. 2c) CFA models were estimated. A key advantage of CFA in this context is that it allows the comparison of competing models. Exploratory, confirmatory factor and bi-factor analyses were performed treating the items as ordinal categorical, using the robust-weighted least squares means and variance-adjusted (WLSMV) estimator [11] in Mplus 7.4 [12]. Model fit was assessed by the root mean square error of approximation (RMSEA) [13] and the comparative fit index (CFI) [14] where a value of ≤ 0.08 and > 0.95 was assumed to provide a good fit, respectively. In the bi-factor models that have all items loading on a common factor, we calculated the explained common variance for the global factor. This gauges the importance of the global factor relative to others [15]. The sequence of factor analysis models described above was then repeated in the study 2 dataset for the smaller item set of 40 items. To obtain finer factor solutions, residual correlations and modification indices (MI) were inspected to identify potentially redundant items [16, 17]. In the final models, local dependence between items (where the latent variables are not sufficient to explain the association between items) was introduced guided by:

  1. (i)

    the highest MI (> 100) [17] or

  2. (ii)

    whether the pairs of items had been identified as conceptually similar at the time of item generation and from qualitative evidence [18], by constraining the pair of items as free parameter estimates in model revisions, one at a time.

Impact of ordering of items on the factorial structure

The design of study 2 enabled us to compare latent factor structure for datasets collected using different item ordering formats of the item set. We used multigroup CFA approach, where the persons answering the block item order constituted one group and the persons answering the intermingled format, the other group. We estimated a two-factor CFA model with the correlation between the two factors constrained to being equal in across groups and compared this to a model where this correlation was allowed to vary between groups. A significant result would indicate that the correlations between the positive and negative factors in the two formats were different which, in turn, would indicate that the ordering of the question impacted on the results. In these analyses, factor loading and item thresholds were constrained to be equal across groups. We evaluated global fit of the model, but since our research question was operationalized as a test of the equality of factor correlations, we did not test whether each item loading or item threshold differed between the groups. Finally, we sought to establish to what extent factor solutions suggested invariance across groups within studies and across samples [19].

Results

Initial confirmatory factor analyses results (studies 1 and 2)

Results and model fit statistics obtained for CFA factor solutions for both studies (study 1; study 2) are shown in Table 3. The hypothesised six-factor model did not achieve adequate fit (study 1: RMSEA = 0.087 CFI = 0.900; study 2: RMSEA = 0.095 CFI = 0.937). In study 1, the standardised factor loadings ranged from 0.493 (I had choices about what I did) to 0.903 (My life seemed pointless). The correlations between factors were very high ranging from 0.840 (belonging and well-being) to 0.954 (choice and activity). In study 2, the standardised factor loadings ranged from 0.645 (I had problems with my sleep) to 0.909 (I felt like a failure). The correlations between factors were very high ranging from 0.878 (activity and belonging) to 0.974 (choice and activity).

Table 3 Fit statistics from confirmatory factor analytic models (without model revisions)

Exploratory factor analyses results (studies 1 and 2)

Eigenvalue analysis in (study 1, study 2) identified a strong first factor and a weak second factor as shown in Table 4 with eigenvalues for the first two factors at (31.4 and 3.4; 23.9 and 2.3). In the Geomin-rotated two-factor EFA solution, the first factor comprised the negatively worded items and the second factor the positively worded items. The inter-correlation between factors was .789 in study 1 and .80 in study 2.

Table 4 Eigenvalues from exploratory factor analysis

Estimating two-factor and bi-factor CFA models (studies 1 and 2)

The two-factor CFA model with the factors for positively worded items and negatively worded items returned the following model fit results: (study 1: RMSEA = 0.068 CFI = 0.940; study 2: RMSEA = 0.079 CFI = 0.955). The bi-factor model also had acceptable fit (RMSEA = 0.070 CFI = 0.940; RMSEA = 0.077 CFI = 0.960). ReQoL item loadings on the general/global factor were substantially higher than the two specific/group factors. Explained common variance (ECV) values were 78.9% in study 1 and 84.5% in study 2. (Tables S3, S4 in the supplementary materials show the factor loadings.)

CFA models and modelling of local correlations (studies 1 and 2)

It was possible to identify and therefore extend our models to account for areas of strain within factor solutions, through local correlations. The model results are presented in Table 5. The two-factor models in both studies had very similar and acceptable fit (RMSEA = 0.066 CFI = 0.945; RMSEA = 0.064 CFI = 0.971). In study 1, the standardised factor loadings ranged from 0.495 (There were people I could turn to for help) to 0.896 (My life seemed pointless). The correlation between the two factors was .833. This fit was obtained after allowing three pairs of items to correlate with each other (see Supplementary Online Materials—Table S3). In study 2, the standardised factor loadings ranged from 0.656 (I had problems with my sleep) to 0.901 (I felt hopeless; Everything in my life felt bad). The correlation between the two factors was .835 (Table 5 for study 2 factor loadings).

Table 5 Fit statistics from confirmatory factor analytic models (with local correlations)

The bi-factor models in both studies achieved acceptable fit (RMSEA = 0.061 CFI = 0.954; RMSEA = 0.066 CFI = 0.971) at very similar levels to the two-factor models. ECVs were 79.2% and 80.9% for general factors for study 1 and 2, respectively. In study 1, the standardised factor loadings for the 61-item set general factor were substantially higher than the two group/specific factors. The factor loadings of the general factor ranged from 0.424 (There were people I could turn to for help) to 0.892 (I felt confident in myself) while the factor loadings for the “negatively worded items-specific factor” ranged between 0.285 (I had problems with my sleep) and 0.518 (I felt panic), and 18 out of the 23 “positively worded items-specific factor” had loadings less than 0.3 (see Table S3). As shown in Table 6, in study 2, the factor loadings for the ReQoL general factor ranged from 0.571 (People around me cause me distress) to 0.873 (I felt confident in myself). The loadings for the “negatively worded items-specific factor” were between 0.300 (I found it difficult to get started with everyday task) and 0.483 (I felt terrified). 11 out of the 15 loadings on the “positively worded items-specific factor” were lower than 0.3.

Table 6 Parameter estimates for the two-factor and bi-factor models for study 2

Potential redundant items

In study 1, three pairs of items (total 5 items) and 12 pairs of items (total 21 items) were identified as potential local dependence residual correlations in the two-factor and bi-factor solutions, respectively. In study 2, 10 pairs of items (total 17 items) and 12 pairs of items (total 21 items) were identified as potential local dependence residual correlations in the two-factor and bi-factor models, respectively (see Table 6). These results, qualitative evidence and other quantitative evidence informed reduction of the item sets even further. These decisions are discussed more fully elsewhere [20].

Comparing the format I and B of the 40 item set in terms of association between positive and negative item factors

The model fit for the intermingled and block formats was acceptable (RMSEA = 0.054 and CFI = 0.976). The common correlation of the positive and negative factors across formats was estimated at .829. A χ2 test for difference in correlations between factors across forms was not significant (p = .378). Thus the two formats did not differ.

Discussion

We used a combination of confirmatory and exploratory factor analyses to assess the dimensionality of the ReQoL item sets at the stage of instrument development. The results from study 1 were replicated in the study 2 adding to the robustness of results. We found that a bi-factor model with the positively worded and negatively worded questions as separate factors provided an acceptable fit. The factor loadings were substantially higher on the general factor than on the group factors and the ECV was 78.9% in study 1, and 84.5% in study 2. We have not found commonly agreed threshold for interpreting the ECV, but previous studies have concluded that scales were sufficiently unidimensional if they obtained ECV values in this range [21]. Thus, these results support the assumption of sufficient unidimensionality for IRT analyses [22,23,24]. Furthermore, the results did not suggest that deviation for perfect unidimensionality is caused by intermingling of positive and negative items which meant that we could consider an intermingled format for the final versions of the ReQoL.

The initial CFA raised questions around the hypothesis of six separate mental health themes identified in earlier work about what mattered to service users with mental health difficulties. This result and subsequent CFA results from a bi-factor model were driven by the high correlation among the themes. From the qualitative work, service users considered the themes separately but the factor analyses suggested that they all amount to one concept of quality of life. To retain the face validity of the ReQoL measure, a decision was taken by the Psychometrics advisory team to select at least one item from each of the themes despite most of the items being highly correlated.

In order to capture the breadth of patient’s journey, the construction of outcome measures requires the inclusion of both positively and negatively worded items. Understandably, there has been a recent focus on framing patients’ experiences positively as exemplified, for example, in the Warwick–Edinburgh mental well-being scale (WEMWBS) [25]. However, patients’ experiences of mental health difficulties are negative. Simply turning a negative item into a positive one does not ensure capturing the same rating. Nor can it be guaranteed that scoring very low on an item phrased as ‘I am feeling happy’ is equivalent to a patient feeling depressed. Such items cannot be viewed as mirror opposites. In addition, while a measure comprising all negative items (for example, Patient Health Questionnaire-9 [26]) might be viewed as being too negative, framing all items as positive could also be viewed by patients as unrealistic and out of touch with the current experiences. Given the bandwidth of patients’ lived experiences, the inclusion of both positive and negative items would appear to be more appropriate for capturing patients’ experiences, an argument used in the development of other measures (for example, Clinical Outcomes in Routine Evaluation [27, 28]).

At the time of development, we searched the literature for guidance on the ordering of positive and negative items. While there was an important literature on positive and negative items in mental health (for example see [26,27,28]), we could not find any research on how best to order the items. There was no evidence in our study that blocking or intermingling the ordering of the items had an impact on the responses provided by participants. Given this result, on a relatively large item set of 40, we followed the general custom and practice of ordering the items in an intermingled fashion although we used pragmatic rules of not starting and finishing on an extreme negative item and also not having more than three negative items running consecutively. Hence, we were able to incorporate greater flexibility in utilising the intermingled order while also not presenting participants with what might be construed as a short set of negative items followed by a short set of positive items. Our view was that this was a false dichotomy and may appear an artificial separation.

The main limitation of this paper is the fact that participants in study 2 were not randomly assigned to the intermingled and block formats of the ReQoL items. The random assignment of both formats within trust would have made gathering follow-up data needed to assess responsiveness of items reported elsewhere [1] logistically very complicated and prohibitive in terms of resources. However, the results obtained provide robust evidence that the ordering of the items in the questionnaire has not impacted on the questionnaire. This informed the format of the final measures and the main attraction of intermingling the positive and negative items rests in the fact that acquiescence bias can be prevented.

In summary, the factorial structure of the items sets used to construct the ReQoL-10 and ReQoL-20 measures were found to be sufficiently unidimensional as confirmed by the good fit of the bi-factor models and therefore rendering the use of IRT methods to inform further item reduction possible.