Of the 9930 patients, 6840 (69%) had 12-month follow-up data. Compared with responders, those lost to follow-up included more smokers, more recipients of sickness benefits, and more patients who had been operated on previously (Table 2). They also had a lower level of education, and fewer had been operated on for paresis. Apart from back pain, baseline PROMs did not differ significantly between the groups; patients who did not respond to the follow-up scored slightly higher for back pain than those who responded.
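The section does not state which test was used to compare responders with patients lost to follow-up. For a binary characteristic such as smoking status, one common choice is Pearson's chi-square test on a 2×2 table. The stdlib-only sketch below assumes that test (without continuity correction); the function name `chi_square_2x2` and the counts in the example are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Rows are groups (e.g. lost to follow-up / followed up), columns are
    the characteristic (e.g. smoker / non-smoker):
        [[a, b],
         [c, d]]
    Returns (chi2, p) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("all row and column totals must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, chi2 = z**2, so p = P(|Z| > z) = erfc(z / sqrt(2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts, not taken from Table 2.
chi2, p = chi_square_2x2(10, 20, 20, 10)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

The closed-form p-value works only because the table has one degree of freedom; larger tables would need the general chi-square distribution.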
An operating microscope or loupes were used during surgery in 5936 of 6840 (87%) cases. In total, 885 (13%) patients had a reoperation at the same level, 466 (7%) at a different level, and 66 (1%) at both the same and a different level between L1 and S1. Perioperative complications occurred in 169 (3%) cases: 115 (2%) dural tears, 21 (0.3%) nerve root injuries, 24 (0.4%) hematomas requiring transfusion or reoperation, and 9 (0.1%) cardiorespiratory complications.
Few baseline PROM data were missing: ODI 13 (0.2%), EQ-5D 252 (3.7%), NRS back pain 170 (2.5%), and NRS leg pain 159 (2.3%). At 12-month follow-up, data were missing for GPE in 40 (0.6%) patients, ODI in 11 (0.2%), EQ-5D in 520 (7.6%), NRS back pain in 47 (0.7%), and NRS leg pain in 66 (1%). GPE scores for the entire population are shown in Table 3. For the total sample, the mean improvement (95% CI) from baseline to 12-month follow-up was 28.7 (28.2–29.2) for the ODI, 0.45 (0.44–0.46) for the EQ-5D, 3.2 (3.1–3.3) for NRS back pain, and 4.4 (4.3–4.5) for NRS leg pain (all p < 0.001).
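A minimal sketch of how a mean improvement with a 95% CI can be computed from paired baseline and 12-month scores. It assumes a normal-approximation interval (mean ± 1.96 × SE), which is practically indistinguishable from a t-based interval at n ≈ 6800, and it drops patients with a missing value at either time point; the section does not state how its intervals or missing data were handled. Improvement is coded as baseline minus follow-up for the ODI and NRS (lower is better) and the reverse for the EQ-5D (higher is better). The function name `mean_change_ci` is hypothetical.

```python
import math
import statistics


def mean_change_ci(baseline, followup, higher_is_better=False, z=1.96):
    """Mean within-patient improvement with a normal-approximation CI.

    baseline, followup: equal-length sequences; None marks a missing value.
    higher_is_better: False for ODI/NRS (improvement = baseline - follow-up),
                      True for EQ-5D (improvement = follow-up - baseline).
    Returns (mean, lower, upper).
    """
    diffs = [
        (f - b) if higher_is_better else (b - f)
        for b, f in zip(baseline, followup)
        if b is not None and f is not None  # complete pairs only
    ]
    if len(diffs) < 2:
        raise ValueError("need at least two complete pairs")
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, mean - z * se, mean + z * se


# Toy ODI values (hypothetical); the last patient is missing at follow-up.
print(mean_change_ci([50, 40, 60, 30, 44], [20, 20, 30, 10, None]))
```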
Spearman rank correlations between the GPE and the change scores were high for the % change scores (ODI 0.8; NRS back pain and NRS leg pain 0.7 each) and moderate for the absolute change scores (ODI 0.6, NRS back pain 0.5, NRS leg pain 0.6, and EQ-5D 0.5). Pearson correlation coefficients between the GPE and the final raw scores were high for all PROMs: ODI 0.8, NRS leg pain 0.7, NRS back pain 0.8, and EQ-5D 0.7. All correlation coefficients were statistically significant (p < 0.001).
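Spearman's rho is Pearson's correlation applied to ranks, with tied values (very common on the 7-point GPE and on 0–10 NRS scales) given the average of the ranks they span. The stdlib-only sketch below shows this; the function names and toy data are hypothetical. The sign of rho depends on how each scale is coded: if GPE 1 is the best category and improvement is coded as a positive change, the computed coefficient is negative.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): GPE category vs ODI % change.
gpe = [1, 2, 2, 3, 4, 6]
odi_pct = [90.0, 70.0, 75.0, 40.0, 5.0, -20.0]
print(spearman(gpe, odi_pct))  # negative: higher GPE = worse outcome
```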
ANOVA with post hoc analysis (Tukey, α = 0.05) showed that the mean changes in all PROMs differed significantly between GPE categories 1–3 and category 4. The mean final raw scores of all PROMs, the mean changes in ODI, EQ-5D, and NRS leg pain, and the mean ODI % change score at 12 months differed significantly between “no change” (4) and “much worse” (6). In contrast, the mean change in NRS back pain and the mean % changes in NRS back pain and NRS leg pain did not differ significantly between patients who were “unchanged” (4) and those who reported being “much worse” (6).
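The one-way ANOVA behind this comparison reduces to an F ratio of between-category to within-category variance of a PROM score across the GPE categories. The sketch computes only that statistic and its degrees of freedom with the standard library; the p-value and the Tukey post hoc comparisons need the F and studentized range distributions, which SciPy provides (for example `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd`). The function name `one_way_anova_f` is hypothetical.

```python
import statistics


def one_way_anova_f(groups):
    """F statistic and (df_between, df_within) for a one-way ANOVA.

    groups: list of lists, e.g. PROM change scores for each GPE category.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean([v for g in groups for v in g])
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy change scores (hypothetical) for two GPE categories.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```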
After evaluating the differences in mean PROM scores across the GPE categories, the study group concluded that defining GPE scores 4–7 as “failure” and 6–7 as “worsening” was appropriate (Table 3). Figures illustrating these differences are shown in the appendix (Figs. 1x–4x).
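The agreed definitions translate directly into two binary outcomes on the 7-point GPE. As we read the ROC analyses that follow, “failure” is compared against GPE 1–3 in the whole cohort, whereas “worsening” (6–7) is compared only against unchanged (4) and slightly worse (5) outcomes, so patients with GPE 1–3 fall outside that comparison. The helper names below are hypothetical.

```python
def failure_label(gpe):
    """'Failure' = GPE 4-7; compared with GPE 1-3 in the whole cohort."""
    if gpe not in range(1, 8):
        raise ValueError("GPE must be an integer from 1 to 7")
    return gpe >= 4


def worsening_label(gpe):
    """'Worsening' = GPE 6-7, compared with GPE 4-5 only.

    Returns None for GPE 1-3, which are not part of this comparison.
    """
    if gpe not in range(1, 8):
        raise ValueError("GPE must be an integer from 1 to 7")
    return None if gpe < 4 else gpe >= 6


print([failure_label(g) for g in range(1, 8)])
print([worsening_label(g) for g in range(1, 8)])
```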
Table 3 shows the baseline-adjusted mean 12-month PROM scores (ANCOVA) for each GPE outcome group.
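Baseline-adjusted means from a one-way ANCOVA with a single covariate can be computed from the pooled within-group regression slope b_w: each group's follow-up mean is shifted by b_w times the difference between that group's baseline mean and the grand baseline mean. The sketch below assumes that classic model (one covariate, a common slope across GPE groups); the model behind Table 3 may have differed, and the function name and toy data are hypothetical.

```python
import statistics


def ancova_adjusted_means(groups):
    """Baseline-adjusted follow-up means (one-way ANCOVA, one covariate).

    groups: dict mapping group name -> list of (baseline, follow_up) pairs.
    Adjusted mean_g = mean_y_g - b_w * (mean_x_g - grand_mean_x), where b_w
    is the pooled within-group slope of follow-up on baseline.
    """
    grand_x = statistics.fmean([x for pairs in groups.values() for x, _ in pairs])
    sxy = sxx = 0.0
    group_means = {}
    for name, pairs in groups.items():
        mx = statistics.fmean([x for x, _ in pairs])
        my = statistics.fmean([y for _, y in pairs])
        group_means[name] = (mx, my)
        # Accumulate within-group cross-products and sums of squares.
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    b_within = sxy / sxx
    return {name: my - b_within * (mx - grand_x) for name, (mx, my) in group_means.items()}


# Toy (baseline ODI, 12-month ODI) pairs for two hypothetical groups.
print(ancova_adjusted_means({
    "group A": [(10, 5), (20, 15)],
    "group B": [(30, 10), (40, 20)],
}))
```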
For discriminating “failure” from no failure in the whole cohort, all PROMs had acceptable accuracy (AUC > 0.70; Table 4). The most accurate PROM was the ODI % change score, with an AUC of 0.93 and a correct classification rate of 86% (Fig. 1).
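The AUC equals the probability that a randomly chosen “failure” patient has a more failure-like score than a randomly chosen non-failure patient, counting ties as one half; this is the Mann–Whitney U statistic divided by n_pos × n_neg. The sketch computes it from average ranks in O(n log n). Because less improvement indicates failure, improvement-type scores such as the ODI % change should be negated before they are passed in. The function name `roc_auc` and the toy data are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney) identity; ties count 1/2.

    scores: higher = more indicative of the positive class.
    labels: True for positives (e.g. GPE-defined failure), False otherwise.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to runs of tied scores
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n_pos = sum(1 for lab in labels if lab)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative cases")
    rank_sum_pos = sum(r for r, lab in zip(ranks, labels) if lab)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy ODI % change scores (hypothetical); negate so less improvement = higher score.
odi_pct = [80.0, 65.0, 10.0, 50.0, -5.0]
failure = [False, False, True, False, True]
print(roc_auc([-p for p in odi_pct], failure))  # 1.0: perfect separation
```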
For discriminating “worsening” from unchanged or slightly worse outcomes, the AUCs of the change scores were poor (< 0.70) for all outcome measures, whereas the final raw scores of all four PROMs had acceptable AUCs. The most accurate was the ODI raw score, with an AUC of 0.76 and a correct classification rate of 69% (Fig. 2). The ROC curves for all PROMs are shown in the appendix (Figs. 5x–9x).
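The section does not say how cutoffs were chosen from the ROC curves. One common rule, assumed here, is to take the cutoff that maximises Youden's J = sensitivity + specificity − 1; the sketch also reports the correct classification rate, (TP + TN) / N, at that cutoff. Scores are oriented so that higher means more likely positive (for example, the 12-month ODI raw score for “worsening”). The function name, dictionary keys, and toy data are hypothetical, and the simple scan over every distinct score is quadratic, which is fine for the ODI's 0–100 range but slow for many distinct values.

```python
def youden_cutoff(scores, labels):
    """Cutoff maximising Youden's J for the rule 'positive if score >= cutoff'.

    Returns a dict with the cutoff, sensitivity, specificity, J, and the
    correct classification rate. Ties in J keep the lowest cutoff.
    """
    n_pos = sum(1 for lab in labels if lab)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative cases")
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, lab in zip(scores, labels) if lab and s >= c)
        tn = sum(1 for s, lab in zip(scores, labels) if not lab and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best["youden_j"]:
            best = {
                "cutoff": c,
                "sensitivity": sens,
                "specificity": spec,
                "youden_j": j,
                "correct_rate": (tp + tn) / len(labels),
            }
    return best


# Toy 12-month ODI raw scores (hypothetical) among patients with GPE 4-7.
odi_12m = [20, 24, 30, 44, 50, 62]
worsening = [False, False, True, False, True, True]
print(youden_cutoff(odi_12m, worsening))
```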
Applying these cutoff values, 26% of lumbar disc surgeries were classified as failures by the ODI change score, 23% by the ODI % change score, and 27% by the 12-month ODI raw score. Failure rates based on the cutoffs of the less accurate PROMs are shown in the appendix (Table 4x).
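Once a cutoff is fixed, the failure rate is simply the share of patients on the failing side of it. The sketch assumes the ODI % change is defined as (baseline − 12-month) / baseline × 100, which is undefined for the few patients with a baseline ODI of 0, and assumes that a % change at or below the cutoff counts as failure. The cutoff in the example is a placeholder, not the value reported in Table 4, and the function names are hypothetical.

```python
def odi_percent_change(baseline, follow_up):
    """Percentage improvement in ODI; None when baseline is 0 (undefined)."""
    if baseline == 0:
        return None
    return 100.0 * (baseline - follow_up) / baseline


def failure_rate(pct_changes, cutoff):
    """Share of patients with a defined % change at or below `cutoff`."""
    valid = [p for p in pct_changes if p is not None]
    if not valid:
        raise ValueError("no defined % change scores")
    return sum(1 for p in valid if p <= cutoff) / len(valid)


# Toy (baseline, 12-month) ODI pairs; PLACEHOLDER_CUTOFF is hypothetical.
PLACEHOLDER_CUTOFF = 20.0
pairs = [(40, 10), (40, 50), (50, 45), (0, 10)]
pct = [odi_percent_change(b, f) for b, f in pairs]
print(pct, failure_rate(pct, PLACEHOLDER_CUTOFF))
```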
Using the cutoffs for the final PROM raw scores, 7% of patients were classified as worsening by the ODI, 8% by the EQ-5D, 7% by NRS leg pain, and 8% by NRS back pain.
Cutoff values, sensitivity, and specificity were similar for patients operated on for the first time and those who had been operated on previously (Tables 2x and 3x, appendix). In contrast, when patients with low and high baseline disability were compared (defined by the 25th and 75th percentiles of the baseline ODI score), the cutoffs for “failure” and “worsening” varied considerably for change scores, % change scores, and final raw scores alike (Table 1x, appendix). For example, in the high-disability group, the failure cutoff for the ODI % change score was 30% higher than in the low-disability group.
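Splitting the cohort by the 25th and 75th percentiles of the baseline ODI, as described above, can be sketched with `statistics.quantiles`. Percentile conventions differ between software packages; the `method="inclusive"` choice, the use of ≤ P25 and ≥ P75 as group boundaries, and the function name are assumptions.

```python
import statistics


def split_by_baseline(baseline_odi):
    """Indices of patients with low (<= P25) and high (>= P75) baseline ODI."""
    p25, _, p75 = statistics.quantiles(baseline_odi, n=4, method="inclusive")
    low = [i for i, v in enumerate(baseline_odi) if v <= p25]
    high = [i for i, v in enumerate(baseline_odi) if v >= p75]
    return low, high


# Toy baseline ODI values (hypothetical).
low, high = split_by_baseline([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(low, high)  # [0, 1, 2] [8, 9, 10]
```

Each subgroup's cutoffs would then be re-estimated with the same ROC procedure used for the whole cohort.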
Compared with elective cases, emergency cases had significantly worse baseline PROM scores and a greater improvement at 12 months. Consequently, the 12-month PROM raw scores did not differ significantly between the two groups. Both groups also reported the same GPE at 12 months, with a median score of 2 (Table 5x, appendix).
Floor and ceiling effects
No floor or ceiling effects were detected. On the baseline ODI, only 9 (0.1%) patients scored 0 and 7 (0.1%) scored 100. On the NRS for back pain, 107 (1.6%) patients scored 0 and 590 (8.8%) scored 10; the corresponding numbers for NRS leg pain were 55 (0.8%) and 728 (10.9%). On the EQ-5D, only 12 (0.2%) patients scored the minimum and 20 (0.3%) the maximum at baseline.
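A floor or ceiling effect is usually flagged when more than a fixed share of respondents sit at the lowest or highest possible score; 15% is a widely used threshold, although the section does not state which criterion was applied. The sketch assumes that threshold and ignores missing values. The scale limits must be passed in because they differ by instrument (0–100 for the ODI, 0–10 for the NRS, and a value-set-dependent range for the EQ-5D index). The function name and toy data are hypothetical.

```python
def floor_ceiling(scores, minimum, maximum, threshold=0.15):
    """Share of non-missing scores at the scale minimum and maximum,
    and whether either share exceeds `threshold` (assumed 15%)."""
    valid = [s for s in scores if s is not None]
    if not valid:
        raise ValueError("no non-missing scores")
    floor = sum(1 for s in valid if s == minimum) / len(valid)
    ceiling = sum(1 for s in valid if s == maximum) / len(valid)
    return {
        "floor": floor,
        "ceiling": ceiling,
        "floor_effect": floor > threshold,
        "ceiling_effect": ceiling > threshold,
    }


# Toy NRS leg pain scores (hypothetical), one missing.
print(floor_ceiling([0, 3, 7, 10, 10, 8, 6, None], minimum=0, maximum=10))
```

Under this assumed threshold, the largest share reported above (10.9% scoring 10 on NRS leg pain) stays below 15%, which agrees with the conclusion that no floor or ceiling effects were present.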