Background

Depression and anxiety are common and disabling disorders [1], and most people diagnosed with depression and anxiety are treated in primary care [2]. Research suggests that collaborative care can be a useful organizational model for treating depression and anxiety disorders in this setting [3,4,5]. In collaborative care interventions, a primary care provider and one or more professionals are involved in providing care and proactive follow-up based on structured and evidence-based care plans [3]. At the same time, mechanisms to enhance communication between providers are introduced [3]. A meta-analysis from 2012 found that collaborative care was associated with larger short-, medium- and long-term improvements in symptoms compared with usual care for people with depression and anxiety [3]. However, most trials were conducted in the United States, and few included participants with anxiety disorders. The authors emphasized a need for more research in collaborative care for anxiety disorders, and that the findings should be interpreted more cautiously in settings different from that of the United States [3]. A subsequent systematic review and meta-analysis, including depression trials in European countries, showed that collaborative care also seems to be more effective than usual care in improving depression scores outside the United States [4]. In 2016, a meta-analysis focusing solely on collaborative care for anxiety disorders also found that collaborative care showed greater effects than usual care [5]. Until recently, no collaborative care trials have been conducted in Scandinavia. However, in 2018, a Swedish cluster-randomized collaborative care trial for depression showed a reduction in depression scores at 3- and 6-months’ follow-up, which was significantly greater in the intervention group vs. the control group when measured by MADRS-S but not by BDI-II [6].

To evaluate the effects of collaborative care in a Danish setting, the Collabri-model for collaborative care was developed in 2014 and subsequently tested. In this paper, we present results from 6- and 15-months’ follow-up of four cluster-randomized trials targeting people with depression, panic disorder, generalized anxiety disorder, and social anxiety disorder in general practice. The hypothesis was that collaborative care would be superior to treatment-as-usual in reducing symptoms of depression in the depression trial and reducing anxiety symptoms in the anxiety trials. Ultimately, the trials failed because too few participants were included and because of potential selection bias, despite randomization. Thus, we also provide insights into the lessons learned while conducting these trials.

Methods

Design

The Collabri trials were designed as four cluster-randomized, researcher-blinded, superiority trials evaluating the effects of collaborative care according to the Collabri-model compared to treatment-as-usual for patients with depression, generalized anxiety disorder, panic disorder, and social anxiety disorder. The design is described in more detail in two study design publications [7, 8]. The study adheres to CONSORT guidelines, and the Regional Ethics Committees in the Capital Region of Denmark approved the trial protocol.

Recruitment of general practitioners and randomization

The random cluster allocation sequence was externally computer-generated by The Research Centre for Prevention and Health in the Capital Region of Denmark. One cluster consisted of a provider number in general practice, corresponding to one or more general practitioners (GPs). Patients were allocated after cluster-randomization to the same group as their GP/GPs. Cluster-randomization was chosen to avoid the risk of contamination bias. GPs in the Capital Region of Denmark (except the island of Bornholm) were invited to join the study through letters.

A total of 53 clusters were randomized over three rounds using simple randomization, with an allocation ratio of 1:1 in the first two rounds and 3:1 (control:collaborative care) in the third round, which comprised four clusters. The randomization was stratified by two geographical areas in the first round and three in the second. Randomization details are updated from previous descriptions [8]. A sub-study (nested study) investigated two methods of depression detection within the depression trial. Hence, GPs were additionally randomized to one of these detection methods. Findings from this study will be presented elsewhere.

Recruitment of patients

GPs recruited participants and referred them to the study. GPs were encouraged to identify participants with depression according to their detection allocation and to use assessment tools in line with guidelines [9] when identifying participants with anxiety. GPs provided written and verbal information to patients and obtained oral and written consent. The GPs’ referral diagnosis was validated by a research assistant in a telephone interview with the patient using the MINI International Neuropsychiatric Interview (MINI) for DSM-IV [10] and ICD-10 specific questions. Inclusion and exclusion criteria were assessed by the GP and/or research assistant, and those included were sent a baseline questionnaire. If written consent had not been received before the telephone interview, it was obtained afterwards. In case of a discrepancy between the referral diagnosis and the research assistant’s assessment, the GP and project psychiatrist reached an agreement through discussion.

Population

Patients were included in one of the four trial populations if they were registered at a participating GP, met the International Classification of Diseases 10th edition (ICD-10) diagnostic criteria for depression (F32–33), generalized anxiety disorder (F41.1), panic disorder (F41.0) or social anxiety disorder (F40.1), were at least 18 years old, spoke Danish and provided written consent. Patients were excluded if they had a dementia diagnosis or an unstable medical condition. Further exclusion criteria were pregnancy, medical/psychological treatment for anxiety or depression within the past 6 months, a pending disability pension application, referral to secondary mental health care, bipolar disorder, current psychotic condition, obsessive-compulsive disorder, high suicide risk, post-traumatic stress disorder, or substance abuse that would hinder participation. Additionally, patients of GPs allocated to the collaborative care intervention were excluded if they preferred treatment through the publicly subsidized psychologist program rather than collaborative care.

Blinding

While conducting eligibility interviews and during the data collection phase, researchers were blinded to the participants’ and GPs’ allocation. Researchers were also supposed to be blinded during the analysis and conclusion phases. However, due to a heavily skewed distribution between allocation groups, it was not possible to maintain this blinding. Furthermore, intervention staff in the collaborative care group, patients, and GPs could not be blinded to the intervention, which is a general challenge when investigating psychosocial interventions.

Interventions

The Collabri-model of collaborative care

While building on recommendations from a systematic literature review [11], the Collabri intervention further met four criteria often used to define collaborative care [3], but originally proposed to describe complex system-level interventions [12]: a multi-professional approach to care; enhanced inter-professional communication; scheduled follow-ups; and a structured management plan. GPs collaborated with a team of mental health specialists, including two psychiatrists and eight care managers employed by Mental Health Services in the Capital Region of Denmark. The care managers had a bachelor-level health care education and included nurses and an occupational therapist. They all had experience from working in mental health services and had completed one year of training in cognitive behavioral therapy (CBT) or the equivalent. Care managers, psychiatrists, and GPs in the collaborative care group were trained in the model principles. Psychiatrists provided planned and ad hoc supervision of care managers and GPs. At the care managers’ request, CBT supervision of care managers was introduced twice a month after trial commencement.

In around half of the GP practices, care managers had access to a consultation room in the practice. Otherwise, care managers and patients met at facilities in the municipality or at a mental health center. Care managers’ caseload was predicted to be around 25; however, this was rarely reached because of too few referrals. Each care manager collaborated with 3–5 GPs to provide appropriate treatment and close follow-up to assess progress. Treatment modalities (psychoeducation, CBT, and medication) were suggested according to disease-specific stepped-care algorithms, where care managers provided psychoeducation and CBT. The GP had the overall treatment responsibility and prescribed medication if this was indicated. For different reasons, group-based psychoeducation was only available early in the trial period, whereas one-on-one psychoeducation and psychoeducation as part of CBT were offered throughout the trial period. A fidelity scale was developed to ensure the internal validity of the Collabri-model, and evaluations were carried out twice during the intervention period.

Treatment-as-usual

GPs in the treatment-as-usual group managed the participants’ care as they usually did. Clinical guidelines from the Danish Health Authority and the Danish College of General Practitioners were available for guidance, including recommendations on detection, diagnosis, treatment, and referral to specialized care [9, 13,14,15]. Treatment could vary between GPs as the guidelines only provide recommendations. For example, GPs could manage care by providing psychoeducation and support, talking therapy, medication, or a combination of these. GPs could refer patients to a psychiatrist or to mental health services, both free of charge for the patient, or to a psychologist, whose fee is partly publicly subsidized.

Outcome measurements and other data

All self-reported outcomes were assessed at baseline and after 6 and 15 months. Interviewer-rated measures were obtained at the eligibility interview and after 6 and 15 months. Participants were assessed at baseline using the Standardised Assessment of Personality: Abbreviated Scale (SAPAS) [16], while other baseline demographic data were obtained from Statistics Denmark [17]. The primary outcome was depression symptoms (Beck Depression Inventory (BDI-II)) [18] at 6-months’ follow-up in the depression trial. The primary outcome in the anxiety trials was self-reported anxiety symptoms (Beck Anxiety Inventory (BAI)) [19] at 6-months’ follow-up. See Table 1 for an overview of secondary-, explorative-, and safety measures.

Table 1 Overview of data

Some specifications and corrections to the two study design papers [7, 8] are needed. For explorative subgroup analyses, we intended to obtain information about somatic comorbidities from GPs. However, because of inadequate data and a lack of statistical power, these analyses were not performed. Additionally, planned subgroup analyses for personality disorder were not performed because of a lack of statistical power. Sick leave benefits were mistakenly listed as a safety measure; sick leave is reported only as an explorative outcome. No social services other than sick leave benefits are included as explorative outcomes. Besides being reported as a safety measure, the number of psychiatric outpatient services is also reported as an explorative outcome. As the risk of suicide was examined for all participants only at baseline, it could not be used as a safety measure as otherwise described. Instead, we distinguish between deaths from suicide and deaths from other causes, using deaths from suicide as a safety measure. Medication use for anxiety and depression was used to describe treatment during the intervention period and was not reported at 15 months.

Sample size calculations

Sample size calculations for primary outcomes using the program PS: Power and Sample Size Calculation showed that 364 participants should be included in each of the three anxiety trials, and 328 should be included in the depression trial. The sample size was adjusted to 480 in the depression trial because an additional study described elsewhere [7] was nested in the trial. Calculations were based on: a clinically relevant difference between groups of 4 points on BDI-II and BAI [37,38,39]; a standard deviation of 11 for BDI-II [37, 38, 40,41,42,43] and 12 for BAI [42,43,44]; a probability of type I error of 0.05 and a power of 0.8. The formula: 1 + (cluster size – 1) x ICC was used to estimate the design effect. The cluster size was 8 in the anxiety trials and 10 in the depression trial. The ICC was set at 0.04 [45]. Cluster-corrected sample sizes were found by multiplying the design effect with the sample size found via PS.
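The calculation above can be retraced with a short sketch. It assumes the standard normal-approximation formula for comparing two means, which is not necessarily the exact method used by PS, so the results may differ slightly from the figures the program produced. The function names are illustrative and not part of the trial protocol.

```python
import math
from statistics import NormalDist


def design_effect(cluster_size: int, icc: float) -> float:
    """Design effect for cluster randomization: 1 + (cluster size - 1) x ICC."""
    return 1 + (cluster_size - 1) * icc


def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Participants per group for a two-sided comparison of two means.

    Uses the normal approximation (an assumption of this sketch, not taken from PS).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def cluster_corrected_total(delta: float, sd: float, cluster_size: int, icc: float) -> int:
    """Total sample size (both groups) multiplied by the design effect."""
    total = 2 * n_per_group(delta, sd)
    return math.ceil(total * design_effect(cluster_size, icc))


# Inputs as stated in the text: 4-point difference, SD 12 (BAI) or 11 (BDI-II), ICC 0.04
anxiety_total = cluster_corrected_total(delta=4, sd=12, cluster_size=8, icc=0.04)
depression_total = cluster_corrected_total(delta=4, sd=11, cluster_size=10, icc=0.04)
print(anxiety_total, depression_total)
```

Under these assumptions, the sketch gives 364 participants for each anxiety trial, in line with the figure above, and a slightly smaller number than 328 for the depression trial. The small gap may reflect differences between this approximation and the method implemented in PS, or rounding.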

Statistical analyses

Outcomes were assessed as differences between groups at follow-up and were based on intention-to-treat analysis [46]. All analyses accounted for cluster-randomization. As planned, we used linear mixed models to compare questionnaire-based effects: cluster and participant were treated as random effects, while time was set as a fixed effect. Contrary to what was stated elsewhere [8], generalized linear models were not used. The stratification variable of the geographical area was included in all analyses. Methods for analyzing register-based data were not prespecified. However, continuous data were analyzed using Poisson regression, and logistic regression was used for dichotomous measures. As planned, 15-months’ follow-up data were analyzed using repeated measures with an unstructured covariance matrix. Because of missing data at all time points, questionnaire data were imputed (m = 100) using multivariate normal regression imputation (MCMC) under the assumption that data were “missing at random”.

Post hoc analyses were also performed. Due to smaller than expected sample sizes, we pooled results from the three anxiety populations to increase statistical power. We estimated statistical power and effect sizes (Cohen’s d) based on the primary outcomes at 6-months’ follow-up. In the depression trial, we used sample size simulation to estimate what the mean BDI-II score of the missing participants in the control group would have had to be for the clinically and statistically significant differences to disappear. We assessed change from baseline to 6-months’ follow-up for all outcomes in the collaborative care groups. Finally, we estimated the proportion of participants in symptom remission, defined by a score of 13 or less on BDI-II in the depression trial and nine or less on BAI in the anxiety trials [18, 47].

Health economic evaluation

The collaborative care intervention’s cost-effectiveness was assessed for a pooled group of participants with anxiety and depression, consisting of participants who had filled out the EQ-5D-3L questionnaire: 627 in the collaborative care group and 80 in the control group. We calculated costs from a public expense perspective with a time horizon of 6 months.

In both groups costs concerning healthcare usage and social benefits were calculated using the following information: hospital contacts and mental health outpatient services obtained from the National Patient Registry [33]; contacts with privately practicing health professionals in primary care obtained from the Danish National Health Service Register [48]; use of prescription drugs derived from the Danish National Prescription Registry [36]; and use of social benefits obtained from the DREAM database [32]. Collaborative care-related costs were estimated using data from the trials.

The cost development in the collaborative care group was calculated as the costs from baseline to 6-months’ follow-up minus the costs 6 months prior to inclusion. A similar measure was computed for the control group. The difference between the two differences was considered as the additional cost of the collaborative care intervention.

The health-related effects in the two groups were measured in Quality Adjusted Life Years (QALYs). QALYs were based on the EQ-5D-3L questionnaire [28] completed by participants at baseline and 6-months’ follow-up. Means were calculated using the Danish preference weighting [49]. QALYs were estimated using complete case analysis adjusted for baseline differences. A robust t-test was used to assess differences in QALYs between groups. Finally, the Incremental Cost-Effectiveness Ratio (ICER) was calculated as the additional costs in the collaborative care group divided by the difference in QALYs between groups.

Results

Characteristics of participating general practitioners

A total of 53 clusters (GPs) were recruited from May 2014 to July 2015. Most clusters were located in Copenhagen or surrounding areas and had one GP participating in the study. During the trial period, 7 clusters dropped out without having referred any participants. A total of 17 clusters did not refer patients included in the depression trial, and 22 clusters did not refer patients included in the anxiety trials (Fig. 1).

Fig. 1 Flow chart

Characteristics of participating patients

Recruitment of patients started in November 2014 and ended in January 2017. In the depression trial, 325 participants were included: 272 in the collaborative care group and 53 in the control group. In the pooled anxiety trial, 406 participants were included: 369 in the collaborative care group and 37 in the control group (Fig. 1). Only around half of the expected total sample size was achieved, and participants were lacking in the control group in particular. Baseline assessment of the primary outcome was completed for 90% (n = 291) in the depression trial and 95% (n = 384) in the pooled anxiety trial. Most of the participants were women, and the mean age was 39 years for participants with depression and 36 years for participants with anxiety disorders. Baseline characteristics are shown in Table 2.

Table 2 Baseline characteristics

Treatment in the collaborative care group

Participants in the collaborative care group met with their care manager, on average, 8.7 times during an average period of 4.4 months. Most received psychoeducation alone or as part of CBT (Table 3). CBT was the initial treatment for 68% of participants in the depression trial and 86% in the pooled anxiety trial. Around a third of participants in the depression trial and around a fourth in the pooled anxiety trial had their treatment intensified (stepped up). In the depression and pooled anxiety trials, 21% and 16% of participants, respectively, were referred to specialist care. According to fidelity reports, the Collabri-model showed good implementation capability (Additional file 1).

Table 3 Treatment provided in the collaborative care group

The depression trial

In the depression trial, mean BDI-II scores decreased from 28.4 (95% CI 27.2–29.6) at baseline to 13.3 (95% CI 12.0–14.6) at 6-months’ follow-up in the collaborative care group and from 27.8 (95% CI 25.0–30.6) at baseline to 19.2 (95% CI 15.6–22.7) in the control group (Tables 4 and 5). The 6-months difference of −5.9 points was statistically significant (p = 0.002), corresponding to an effect size of 0.52. The power was 66%. At 15-months’ follow-up, BDI-II scores decreased further to 11.8 (95% CI 10.5–13.1) in the collaborative care group and 14.7 (95% CI 11.0–18.4) in the control group. The 15-months difference of −2.9 points was not statistically significant (p = 0.138).

Table 4 Questionnaire-based outcome means at baseline
Table 5 Questionnaire-based outcomes in the depression trial

There were statistically significant differences between groups at 6-months’ follow-up on several self-reported outcomes favoring collaborative care (Table 5). Results from register-based explorative analyses are shown in Tables 6 and 7. From baseline to 6-months’ follow-up, the collaborative care group showed statistically significant improvements on all self-reported outcomes (Additional file 2). Post hoc analyses showed that the 111 participants needed to reach the planned size of the control group would have had to have a mean BDI-II score below 16.4 at 6-months’ follow-up to nullify the clinically relevant difference of 4 points. The statistically significant difference would disappear if these missing participants had a mean BDI-II score below 14.9.
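The 16.4 threshold can be approximated with simple weighted-mean arithmetic. The sketch below is a simplification and not the sample size simulation used in the analysis: it assumes that the reported control mean of 19.2 applies to all 53 included control participants, that the planned control group numbered 53 + 111 = 164, and that the collaborative care mean stays fixed at 13.3. The function name is illustrative.

```python
def required_mean_of_missing(observed_n, observed_mean, missing_n, target_mean):
    """Mean the missing participants would need for the full group to reach target_mean."""
    total_n = observed_n + missing_n
    return (total_n * target_mean - observed_n * observed_mean) / missing_n


intervention_mean = 13.3   # collaborative care group, 6-months' follow-up
control_mean = 19.2        # control group, 6-months' follow-up
relevant_difference = 4.0  # clinically relevant difference on BDI-II

# Control-group mean at which the between-group difference shrinks to exactly 4 points
target_control_mean = intervention_mean + relevant_difference

threshold = required_mean_of_missing(
    observed_n=53,
    observed_mean=control_mean,
    missing_n=111,
    target_mean=target_control_mean,
)
print(round(threshold, 1))  # 16.4
```

The threshold for losing statistical significance (14.9) also depends on the variance of the scores, so it cannot be retraced from the means alone.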

Table 6 Explorative outcomes and measures of harms in the depression and pooled anxiety trial
Table 7 Deaths, use of medication for anxiety/depression and sick leave benefits

The pooled anxiety trial

In the pooled anxiety trial, mean BAI scores decreased from 21.8 (95% CI 20.8–22.8) at baseline to 11.5 (95% CI 10.6–12.4) at 6-months’ follow-up in the collaborative care group and from 21.2 (95% CI 17.9–24.6) to 14.6 (95% CI 9.9–19.3) in the control group. The difference of −3.1 points at 6-months’ follow-up was not statistically significant (p = 0.206) (Tables 4 and 8). This difference was equivalent to an effect size of 0.33. The statistical power was 72%. At 15-months’ follow-up, the mean BAI score was 11.0 (95% CI 10.2–11.9) in the collaborative care group and 12.7 (95% CI 10.2–15.3) in the control group. This difference was not statistically significant (p = 0.209).

Table 8 Questionnaire-based outcomes in the pooled anxiety trial

In Tables 6 and 7, results from register-based explorative analyses are displayed. From baseline to 6-months’ follow-up, the collaborative care group showed statistically significant improvements in all outcomes (Additional file 2). Results for each of the three anxiety trials are included in Additional files 3 and 4.

Harms

There were no statistically significant differences indicating that the collaborative care group had more deaths, psychiatric bed-days, psychiatric admissions, or somatic outpatient visits than the treatment-as-usual group (Tables 6 and 7). Analyses of life-threatening conditions, somatic admissions, and somatic bed-days could not be conducted because there were too few cases.

Health economic evaluation

The additional costs of the collaborative care intervention were estimated to be 1457 Euro (Additional file 5). This difference in costs was not statistically significant. Analyses showed a statistically significant difference between groups of 0.025 QALY (p = 0.006) in favor of the collaborative care group (Additional file 6). The ICER was estimated to be 58,280 Euro per QALY, suggesting that collaborative care is not cost-effective within a 6-month time frame.
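The ICER follows directly from the two estimates above. As a minimal sketch of the arithmetic (the function name is illustrative, and the inputs are the rounded figures reported in the text):

```python
def icer(additional_cost: float, qaly_difference: float) -> float:
    """Incremental cost-effectiveness ratio: additional cost per QALY gained."""
    if qaly_difference == 0:
        raise ValueError("QALY difference must be non-zero")
    return additional_cost / qaly_difference


additional_cost_eur = 1457  # additional cost of collaborative care (Euro)
qaly_gain = 0.025           # difference in QALYs between groups

print(round(icer(additional_cost_eur, qaly_gain)))  # 58280
```

Because the QALY difference is small, the ratio is sensitive to it: modest changes in the denominator shift the cost per QALY substantially.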

Discussion

In the present depression study, the estimated effect size of 0.52 based on BDI-II is comparable to or even higher than standardized mean differences (SMDs) found in meta-analyses ranging from 0.19 (0–3-months’ follow-up) [4] to 0.34 (0–6-months’ follow-up) [3]. The effect size of 0.33 based on BAI in the pooled anxiety trial is also comparable to SMDs found in meta-analyses of 0.30 (0–6-months’ follow-up) [3] and 0.35 (0–12-months’ follow-up) [5]. However, because of the high risk of selection bias in our study, effect sizes could be inflated.

Lessons learned

While monitoring recruitment data, we found different referral patterns across GPs. Some GPs, regardless of allocation, referred numbers close to what was expected, some referred fewer, and in the control group, several GPs never started referring. Generally, GPs referred fewer patients with anxiety than predicted. Consequently, even though baseline data did not reveal major differences between groups, we assume that there is a risk of selection bias. Baseline validation of participants’ diagnoses helped ensure eligibility regardless of allocation; however, we did not have any procedure to ensure that all eligible individuals were asked to participate. Different aspects of the design and its underlying assumptions could have influenced the referral pattern: a recruitment strategy relying solely on GPs to invite patients to participate, GPs’ different perceptions of obstacles to referral, and lower than expected disease prevalence or disease detection in GPs’ practices.

We chose the strategy of GPs referring patients to the study as this was acceptable to GPs. However, other recruitment strategies were considered in the design phase, such as waiting room screening and implementation of pop-up windows in GPs’ medical records to remind them of potentially eligible participants. GPs objected to waiting room screening, and even though we worked on making pop-up windows available, this strategy became impossible at the last minute for reasons unrelated to the project. While some cluster-randomized collaborative care studies have used a recruitment strategy similar to ours [50,51,52], other trials have recruited participants through annual health screenings or searches in medical records [53,54,55]. In our study, medical record screening would have required access to the GPs’ electronic record systems, which unfortunately was not an option. Muntingh et al. used a combination of GP identification and medical record identification [56]. They found that participants in the collaborative care group were more often selected for the trial by their GP than those in the control group, where a larger proportion was recruited from medical records. As in our study, this suggests difficulties in recruiting participants from GPs allocated to a control group.

During the trial period, we attempted to improve intake rates by continuously encouraging GPs to refer to the project and by prolonging the recruitment period. GPs received newsletters sharing updates and successes, posters were hung in GPs’ waiting rooms, and project information was shared in newspapers. In early 2016, we conducted an informal telephone survey with a sample of GPs from both allocation groups to assess perceived obstacles to referring patients. Difficulties remembering to refer, concerns that it would be stressful for patients to participate (e.g., in the eligibility interview), issues related to the referral process, and the presence of specific exclusion criteria were some of the obstacles mentioned. Similarly, other literature has found time constraints and clinicians’ concern for their patients to be barriers to recruitment [57]. Also, narrow inclusion criteria have been reported to be associated with poor recruitment [58]. While we sought to respond to GPs’ needs and attempted to resolve any uncertainties affecting their referral pattern, requests to remove exclusion criteria were not accommodated, as the associated methodological disadvantages were considered larger than the anticipated benefits.

Strengths and limitations

While many trials build their intervention on collaborative care principles, models can differ in additional content. Strengths of this study are that we provide an elaborate description of the collaborative care model components [7, 8] to ensure transparency, and we report on harms. Further, we developed a collaborative care model including elements shown to be associated with improved outcomes in previous studies [59, 60]. These are elements such as specialist supervision of care managers, recruiting care managers with experience from working in mental health services, and integrating the provision of a psychological intervention into the model [59, 60]. Twice during the project period, we monitored fidelity to the model to ensure that care managers, psychiatrists, and GPs delivered the intervention as intended. Other strengths were the externally conducted computer-based cluster-randomization, which ensured random and concealed allocation of GPs. The use of blinded assessment of the secondary outcome Global Assessment of Functioning and application of intention-to-treat analyses also decreased the risk of biased effect estimates.

Besides lacking statistical power and possible selection bias, there are other limitations to this study. It is a limitation that we have no information on the treatment given by providers such as privately practicing psychologists or psychiatrists in the control group. With this information, we could have assessed whether the type and amount of treatment differed between the collaborative care and treatment-as-usual groups. We were not able to blind participants, care managers, psychiatrists, or GPs to the allocation, and due to the skewed distribution between groups, we could not blind researchers when analyzing data or when writing the conclusion. Another limitation was that primary outcomes were self-reported and, therefore, not blinded, which could lead to overestimation of treatment effects. However, self-report measures mirror participants’ own perceptions of symptoms, which is also valuable from a recovery perspective. Although participants were recruited by GPs throughout the Capital Region of Denmark, there may be reduced external validity as GPs signed up voluntarily to participate. This could indicate an interest in common mental disorders or inter-sectoral collaboration, which may not be representative of the general GP population.

Implications for research and practice

Feasibility or pilot testing of the trial prior to commencement might have helped us identify recruitment problems at an earlier stage. A feasibility study aims to provide information about different trial processes [61]. A pilot study is frequently described as a small-scale version of the study one wishes to conduct and seeks to test how the various processes work together [61]. Conducting pilot studies is no guarantee that recruitment will proceed successfully [62]. However, if we had completed pilot or feasibility studies, some of the theory-based assumptions made while writing the protocol, such as prevalence estimations, could have been empirically validated.

There are examples of successfully completed cluster-randomized controlled trials [53, 63]; however, it is recognized that many cluster-randomized trials, and RCTs in general, have problems recruiting the predefined study sample [62, 64,65,66]. This is a concern, as research questions consequently remain unanswered, or there can be delays in demonstrating important effects [67]. If data from unsuccessful trials are never published, this can further lead to publication bias within the specific research field. Because of the limitations of this study, there is still a need to examine the effects of collaborative care in a Danish setting. Therefore, two new trials, referred to as the Collabri Flex trials, have been initiated [68], and recruitment goals have been reached. The Collabri Flex trials are based upon the knowledge gained from the Collabri trials, and to achieve an equal distribution between groups, we randomized at the individual level. The effect results and a health economic evaluation of these trials will be reported elsewhere.

Conclusion

Regrettably, due to limitations of the cluster-randomized design, we failed to carry out the effect trials as planned. For people with depression, we found a statistically significant difference between collaborative care and treatment-as-usual at 6-months’ follow-up in favor of collaborative care. For people with anxiety disorders, a non-significant difference between groups was found. Nevertheless, these results are limited by a lack of statistical power and possible selection bias. However, we succeeded in implementing the Collabri collaborative care model to provide patients with evidence-based treatment in line with guidelines in Danish general practices. Based on the results, we cannot rule out that collaborative care may be an effective way of organizing treatment in the Danish setting, but this hypothesis remains to be verified. Therefore, the Collabri trials act as the background for the Collabri Flex trials, and hence for improvement of future treatment of depression and anxiety disorders in primary care in Denmark.