Introduction

Injuries of the articular cartilage of the knee joint are common and remain challenging due to the avascular nature of articular cartilage and its limited regenerative potential [1,2,3]. Left untreated, symptomatic cartilage defects impair the quality of life and put affected patients at risk for the development of secondary osteoarthritis [4]. As a result, a variety of different surgical cartilage repair techniques have been developed over the last few decades with differing indications, outcomes, and associated costs.

Treatment options include bone marrow stimulation (BMS) such as microfracture (MFX) [5] and Pridie drilling [6], osteochondral repair techniques such as osteochondral autologous transplantation (OAT) [7], or acellular osteochondral scaffolds [8, 9] and cell-based repair techniques with various generations of autologous chondrocyte implantation (ACI) [10].

Although arthroscopy remains the standard of reference for the evaluation of cartilage defects with the Outerbridge Classification [11] and the International Cartilage Repair Society Score [12], it is rarely indicated because of its invasiveness.

Magnetic resonance imaging (MRI), by contrast, is the gold standard for the non-invasive assessment of articular cartilage and subchondral bone. However, meta-analyses show inconsistent findings regarding the correlation of postoperative MRI examinations with clinical outcome [13,14,15].

In an effort to standardize the assessment after cartilage repair, the Magnetic Resonance Observation of Cartilage Repair Tissue (MOCART) [16] score has been introduced to facilitate reproducible, longitudinal assessments and comparability across studies.

The original MOCART score was already used for follow-up studies investigating OAT [17], ACI [18], and MFX [19] as well as in prospective studies that compared different repair techniques [20, 21].

To encompass the technological advancements and developments regarding surgical treatment options of cartilage defects of the knee joint and magnetic resonance imaging technology since the publication of the original MOCART score 15 years ago, the MOCART 2.0 knee score [22] has been recently introduced. In addition to the elimination of the variables “subchondral lamina,” “effusion,” and “adhesion,” the variable “subchondral bone” was renamed “subchondral changes” and the variable “bony defect or bony overgrowth” was introduced. To increase reproducibility and comparability across trials, a color-coded atlas depicting all variables of the MOCART 2.0 knee score was established as well. For expert readers, the MOCART 2.0 knee score demonstrated an almost perfect overall interrater (ICC = 0.84, p < 0.001) as well as intrarater (ICC = 0.88, p < 0.001) reliability. Access to the aforementioned atlas improved the overall interrater reliability of inexperienced readers from poor (ICC = 0.34, p < 0.019) to moderate (ICC = 0.59, p = 0.001) [22]. However, until now, the reliability has been assessed only for patients after ACI. Hence, it remains unclear whether the MOCART 2.0 knee score allows for the postoperative assessment after different cartilage repair techniques with a comparable interrater and intrarater agreement. Furthermore, it has not yet been evaluated whether the lesion type (chondral versus osteochondral) influences the applicability or the reproducibility of the MOCART 2.0 knee score.

Therefore, the aim of this study was to (a) evaluate the reliability of the MOCART 2.0 knee score in the postoperative assessment of patients after different surgical cartilage repair techniques; (b) compare it to the intrarater and interrater reliability of the original MOCART score; and (c) assess whether the intra- and interrater reliability differs between different surgical repair techniques and between patients with chondral and osteochondral defects.

Materials and methods

This retrospective, single-center study was approved by the Institutional Review Board. Patients who underwent surgical cartilage repair of a femoral cartilage lesion in the knee joint and received at least one MRI follow-up examination on a 3-T system at our institution were selected, and one postoperative MRI examination per patient was retrospectively included in the study. The follow-up time point was selected randomly by drawing lots to obtain a representative distribution of a mixed patient cohort in terms of cartilage maturation. Patients were allocated to three different types of repair groups: an ACI group; an osteochondral repair technique group, which included patients after OAT or MaioRegen® (Fin-ceramica) implantation; and an MFX group.
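The random selection of one follow-up examination per patient could be reproduced in software roughly as follows. This is a minimal sketch and not the procedure used in the study: the function name, the patient identifiers, the follow-up intervals, and the fixed seed are hypothetical and only illustrate a uniform random draw of one examination per patient.

```python
import random


def draw_one_exam_per_patient(exams_by_patient, seed=None):
    """Pick one follow-up examination per patient uniformly at random.

    exams_by_patient maps a (hypothetical) patient ID to the list of that
    patient's available follow-up time points in months.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    selection = {}
    for patient_id in sorted(exams_by_patient):
        exams = exams_by_patient[patient_id]
        if not exams:
            raise ValueError(f"patient {patient_id!r} has no follow-up exam")
        selection[patient_id] = rng.choice(exams)
    return selection


# Hypothetical example data, not taken from the study cohort.
example = {"P01": [3, 12, 24], "P02": [6], "P03": [12, 60, 96]}
chosen = draw_one_exam_per_patient(example, seed=2024)
```

Drawing per patient rather than across all examinations keeps exactly one examination per patient while still mixing early and late follow-up intervals.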

ACI was performed as a two-step procedure. In the first procedure, a cartilage biopsy was obtained arthroscopically from a non-weight-bearing area of the knee. After cell extraction, cells were cultivated and subsequently transferred onto a scaffold. For the second procedure of ACI, a mini-arthrotomy was used as a surgical approach. First, debridement of the cartilage defect to the subchondral bone was performed. In case of osteochondral lesions, additional bone grafting was performed using autograft spongiosa cylinders that were harvested from the iliac crest or the ipsilateral tibia using an OATS harvester; then, the cell matrix implants were cut to size, implanted, and held in place using fibrin glue.

OAT was performed as a single-step procedure for smaller lesions. Autograft spongiosa cylinders were harvested from the trochlea using an OATS harvester (OATS®, Arthrex) and transferred into the defect.

Microfracture (MFX) was performed arthroscopically after debridement and the establishment of stable cartilage shoulders using ChondroPick® (Arthrex) as a one-step procedure as well [23].

Magnetic resonance imaging

All imaging studies were conducted on 3-T MR systems (MAGNETOM Tim Trio, MAGNETOM Verio, MAGNETOM Prisma, Siemens Healthineers) using a dedicated eight-channel or 15-channel knee coil. The evaluated imaging studies were part of the clinical routine follow-up. Therefore, the parameters differed slightly between patients.

An exemplary routine MRI protocol for knee cartilage assessment after cartilage repair in the femoral condyle is presented in Table 1. The protocol included a three-dimensional localizer followed by a sagittal non-fat-saturated high-resolution proton-density-weighted turbo spin-echo (sag PDw TSE) sequence; a sagittal fat-saturated (Fs) PDw TSE sequence; a sagittal T1-weighted (T1w) TSE; and a coronal Fs PDw TSE sequence. For patients who underwent cartilage repair of the patellofemoral joint, the imaging protocol is to be complemented with an axial Fs PDw TSE sequence [24].

Table 1 Exemplary MRI protocol that fulfills the recommended requirements in terms of sequences and resolution for adequate assessment of the MOCART 2.0 knee score at 3 T

Image analysis

Image analysis was performed on a picture archiving and communication system (PACS) workstation (IMPAX EE R20, Agfa Healthcare N.V.) by an orthopedic resident (reader 1) and a radiology resident (reader 2), each with 4 years of experience in musculoskeletal MR imaging studies.

For the assessment of the MOCART 2.0 knee score, both readers had access to the atlas, which was published as supplemental material alongside the MOCART 2.0 knee score [22].

Imaging studies were assessed under supervision of the study coordinator in random order, and both readers were completely blinded to all patient details. First, both readers assessed the original MOCART score as well as the MOCART 2.0 knee score for a training dataset of twenty imaging studies that were not included in the study. After a 1-week interval, both readers assessed the original MOCART score for all patients of the study cohort. After a 4-week interval, to diminish recall bias, both readers assessed the MOCART 2.0 knee score for all patients. After another minimum interval of 4 weeks, both readers assessed the MOCART 2.0 knee score a second time to allow for the assessment of its intrarater reliability. After another minimum interval of 4 weeks, both readers assessed the original MOCART score a second time to allow for the assessment of its intrarater reliability. Neither reader received feedback on their scorings between the different readings.

Statistical analysis

All statistical calculations were performed using IBM SPSS Statistics for Windows version 25 (IBM). Continuous data are described using mean ± standard deviation. Descriptive statistics in addition to univariate ANOVA were used to compare defect size, time to follow-up, and age at examination between treatment groups, with Bonferroni’s correction for comparison of more than two groups. Linear-weighted kappa statistics and their 95% CI were calculated as an index for inter- and intrarater reliability of each ordinal scoring domain of the original MOCART score and the MOCART 2.0 knee score. Weighted kappa statistics were interpreted according to the criteria of Landis and Koch [25]. A kappa value of ≤ 0.20 indicated poor agreement, a kappa value of 0.21–0.40 indicated fair agreement, a kappa value of 0.41–0.60 indicated moderate agreement, a kappa value of 0.61–0.80 indicated substantial agreement, and a kappa value of ≥ 0.81 indicated almost perfect agreement.
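As an illustration of the statistic described above, the following minimal sketch computes a linear-weighted Cohen's kappa for two readers and maps it onto the agreement categories as stated in this section. The study itself used SPSS; the function names and the example gradings are hypothetical, confidence intervals are not computed, and rounding kappa to two decimals before categorizing is an assumption made to bridge the gaps between the stated ranges (for example, between 0.80 and 0.81).

```python
def linear_weighted_kappa(ratings_a, ratings_b, categories):
    """Linear-weighted Cohen's kappa for two raters on an ordinal scale.

    categories lists all possible grades in ascending ordinal order.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must grade the same non-empty set")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {category: i for i, category in enumerate(categories)}
    n = len(ratings_a)

    # Cross-tabulate the two raters' grades.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        counts[index[a]][index[b]] += 1
    row_totals = [sum(counts[i]) for i in range(k)]
    col_totals = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    # Linear disagreement weights: |i - j| / (k - 1).
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed += weight * counts[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / (n * n)
    if expected == 0.0:
        return float("nan")  # undefined when both raters use a single grade
    return 1.0 - observed / expected


def interpret_kappa(kappa):
    """Agreement category using the thresholds stated in this section."""
    value = round(kappa, 2)  # assumption: categorize the two-decimal value
    if value <= 0.20:
        return "poor"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical gradings of four cases on a two-grade scale.
example_kappa = linear_weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], [0, 1])
```

With linear weights, a disagreement of two grades counts twice as much as a disagreement of one grade, which is why this statistic suits the ordinal MOCART variables better than unweighted kappa.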

Two-way mixed, absolute agreement, single-measure intraclass correlation coefficients (ICCs), and their 95% confidence intervals (95% CI) were calculated as an index of intra- and interrater reliability of the total resultant continuous MOCART and MOCART 2.0 knee scores.

ICCs were interpreted according to Koo and Li [26].

An ICC of < 0.50 indicated poor agreement, an ICC of 0.50–0.75 moderate agreement, an ICC of 0.75–0.90 good agreement, and an ICC > 0.90 excellent agreement.
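The point estimate of a two-way, absolute-agreement, single-measure ICC can be obtained from the mean squares of a two-way ANOVA without replication. The sketch below is illustrative only: the study used SPSS, the function names and example scores are hypothetical, confidence intervals are omitted, and the assignment of boundary values (exactly 0.75 or 0.90) to a category is an assumption because the stated ranges overlap at their endpoints.

```python
def icc_absolute_agreement_single(scores):
    """Two-way, absolute-agreement, single-measure ICC (point estimate).

    scores is a list with one row per patient; each row holds the total
    score assigned by each rater (or by one rater at each reading).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("at least two subjects are required")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns), and residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0.0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


def interpret_icc(icc):
    """Agreement category using the thresholds stated in this section."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical total scores: rater 2 scores every patient 10 points higher.
offset_example = icc_absolute_agreement_single([[60, 70], [80, 90], [90, 100]])
```

In the hypothetical example the two raters rank all patients identically, yet the ICC stays below 1 because absolute agreement penalizes the systematic 10-point offset between raters.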

p values equal to or below 0.05 were considered to indicate statistically significant results.

Results

Patients

One hundred twenty patients who fit the inclusion criteria were retrospectively identified. Six patients, who received their follow-up MRI at our institution but had undergone surgery elsewhere, had to be excluded from the analysis due to insufficient information regarding the surgical procedure. In total, 114 patients (34 females and 80 males) with a mean age of 32.5 ± 9.6 years at the time of surgery were included. The average defect size was 3.1 ± 2.2 cm². The right knee was affected in 64 patients, whereas the left knee was affected in 50 patients. Seventy-seven defects were located in the medial femoral condyle, 32 in the lateral femoral condyle, and five in the trochlea. The study cohort included 48 patients after ACI, 34 patients after MFX, and 32 patients who were allocated to the osteochondral repair technique group (27 patients after OAT; five patients after MaioRegen®, Fin-ceramica). Age at surgery ranged from 30.2 ± 8.8 years in the ACI group to 31.2 ± 10.6 years in the osteochondral repair technique group and to 37.0 ± 8.4 years in the MFX group. The mean postoperative follow-up interval after cartilage repair was 40.5 ± 43.6 months, ranging from 20.2 ± 24.2 months in the MFX group to 35.1 ± 34.6 months in the osteochondral repair technique group and to 57.2 ± 51.7 months in the ACI group. Mean defect size ranged from 2.1 ± 1.0 cm² in the osteochondral repair technique group to 2.1 ± 1.5 cm² in the MFX group and to 4.5 ± 2.5 cm² in the ACI group (Figs. 1, 2, and 3).

Fig. 1

a, b A 42-year-old male patient 8 years after ACI with a major hypertrophic filling, complete integration, an intact surface, homogeneous structure, minor hyperintense signal intensity (anteriorly), a bony overgrowth ≥ 50% of adjacent cartilage thickness, and no subchondral changes, with a resultant MOCART 2.0 knee score of 80 points. c, d A 46-year-old male patient 6 years after ACI with minor hypertrophy, complete integration, an irregular surface < 50% of the tissue diameter, a homogeneous structure, minor hyperintense signal, a bony overgrowth ≥ 50% of adjacent cartilage, and a subchondral cyst of less than 5 mm diameter, which resulted in a MOCART 2.0 knee score of 85 points. Panels a and c are acquired with a sagittal proton-density-weighted turbo spin-echo sequence, whereas panels b and d are acquired with a coronal proton-density-weighted turbo spin-echo sequence with fat suppression. The white arrows indicate the repair tissue borders

Fig. 2

a, b A 31-year-old male patient 12 months after OAT with complete filling, a split-like integrational defect on the medial transplant border (depicted in image b), an intact surface, homogeneous structure, normal signal intensity of the repair tissue, no bony defect or overgrowth, and no subchondral changes, which resulted in an overall MOCART 2.0 knee score of 95 points. Also, the donor region can be appreciated in the anterior medial condyle in image a. c, d A 26-year-old male patient 9 years after OAT with an underfilling of 75–99% of the defect volume, complete integration, an irregular surface < 50% of repair tissue diameter, normal signal intensity, a bony defect ≥ transplant thickness, and no subchondral changes, which resulted in a MOCART 2.0 knee score of 75 points. Panels a and c are acquired with a sagittal proton-density-weighted turbo spin-echo sequence, whereas panels b and d are acquired with a coronal proton-density-weighted turbo spin-echo sequence with fat suppression. The white arrows indicate the repair tissue borders

Fig. 3

a, b A 29-year-old male patient 7 months after MFX with an underfilling of 50–74% of the defect volume, complete integration, surface irregularities < 50% of repair tissue diameter, a homogeneous structure, minor hypointense signal intensity, a bony overgrowth < 50% repair tissue thickness, and no subchondral changes, which resulted in a MOCART 2.0 knee score of 60 points. c, d A 39-year-old male patient 3 months after MFX with a complete filling, complete integration, surface irregularities < 50% of repair tissue diameter, inhomogeneous structure, minor hyperintense signal intensity, a bony defect < 50% repair tissue thickness, and subchondral edema-like signal changes ≥ 50% of the repair tissue diameter, which resulted in a MOCART 2.0 knee score of 75 points. Panels a and c are acquired with a sagittal proton-density-weighted turbo spin-echo sequence, whereas panels b and d are acquired with a coronal proton-density-weighted turbo spin-echo sequence with fat suppression. The white arrows indicate the repair tissue borders

For ACI, the following matrices were used in this study population: IGOR.CHONDRO-SYSTEMS® (Institute for Tissue and Organ Reconstruction); Hyalograft C® (Anika Therapeutics); Novocart 3D® (Braun Medical); and CaReS® (Arthro-Kinetics). Additional autologous bone grafting was performed in 16 patients.

Overall, 48 patients were treated for osteochondral lesions (27 OAT cases, five MaioRegen® cases, and 16 ACI with additional autologous bone grafting) and 66 patients were treated for chondral lesions (34 MFX cases and 32 ACI cases).

Age at surgery differed significantly between the ACI and the MFX groups (30.2 ± 8.8 vs. 37.0 ± 8.4 years; p = 0.008), but no significant differences were found between the ACI and osteochondral repair technique groups (30.2 ± 8.8 vs. 31.2 ± 10.6 years; p = 1.000) or the MFX and osteochondral repair technique groups (p = 0.070). The postoperative follow-up interval differed significantly between the ACI and the MFX groups (4.8 ± 4.3 vs. 1.7 ± 2.0 years; p = 0.001), but no significant differences were found between the ACI and osteochondral repair technique groups (4.8 ± 4.3 vs. 2.9 ± 2.9 years; p = 0.110) or the MFX and osteochondral repair technique groups (p = 0.574).

With regard to the defect size, the ACI group (4.5 ± 2.5 cm²) differed significantly from the osteochondral repair technique group (2.1 ± 0.98 cm²) and the MFX group (2.1 ± 1.5 cm²) (both p < 0.001), while the osteochondral repair technique group and the MFX group did not differ significantly (p = 1.000).
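Several of the pairwise p values reported above equal 1.000, which is the value a Bonferroni adjustment yields when the adjusted p value is capped at 1. The following minimal sketch shows the adjustment for three pairwise group comparisons; the raw p values are hypothetical and are not taken from the study, and the function name is an assumption.

```python
def bonferroni_adjust(raw_p_values):
    """Multiply each raw p value by the number of comparisons, capped at 1."""
    number_of_comparisons = len(raw_p_values)
    return [min(1.0, p * number_of_comparisons) for p in raw_p_values]


# Hypothetical raw p values for three pairwise comparisons
# (ACI vs. MFX, ACI vs. osteochondral, MFX vs. osteochondral).
raw = [0.0027, 0.40, 0.19]
adjusted = bonferroni_adjust(raw)
```

Here the second comparison would be reported as p = 1.000 after adjustment, because three times its raw value exceeds 1.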

Interrater and intrarater reliability of the MOCART 2.0 knee score compared to that of the original MOCART score

The overall MOCART 2.0 knee score showed higher interrater reliability than the original MOCART score, with an ICC of 0.875 (95% CI 0.823 to 0.912) versus 0.759 (95% CI 0.660 to 0.831), ranging from 0.863 (95% CI 0.741 to 0.930) in the MFX group to 0.874 (95% CI 0.753 to 0.938) in the osteochondral repair group and to 0.878 (95% CI 0.792 to 0.93) in the ACI group.

Based on the interpretation of kappa according to Landis and Koch, the majority of the individual variables of the MOCART 2.0 knee score showed substantial agreement, with linear-weighted kappa values ranging from 0.549 (95% CI 0.365 to 0.733) for the variable “surface of the repair tissue” to 0.797 (95% CI 0.713 to 0.882) for the variable “subchondral changes” for all patients.

The intrarater reliability was good, with an overall ICC of 0.866 (95% CI 0.811 to 0.906) for reader 1 and 0.860 (95% CI 0.803 to 0.901) for reader 2, and showed substantial agreement for the majority of individual variables for all patients. Overall, linear-weighted kappa for reader 1 ranged from 0.463 (95% CI 0.331 to 0.594) for the variable “surface of the repair tissue” to 0.792 (95% CI 0.705 to 0.878) for the variable “subchondral changes.” For reader 2, linear-weighted kappa ranged from 0.585 (95% CI 0.462 to 0.707) for the variable “surface of the repair tissue” to 0.804 (95% CI 0.718 to 0.890) for the variable “subchondral changes,” with overlapping 95% confidence intervals for almost all variables and all subgroups.

For a detailed illustration of the interrater and intrarater reliability and a depiction of the agreement for all subgroups, see Tables 2 and 3, respectively.

Table 2 Interrater reliability of the original MOCART and MOCART 2.0 knee scores for cartilage repair after ACI versus MFX versus the osteochondral repair technique group (OAT, MaioRegen®) given as linear-weighted kappa statistics for individual variables and two-way mixed absolute agreement, single-measure intraclass correlation coefficient (ICC), and their 95% confidence intervals (95% CI) for the resulting total MOCART and MOCART 2.0 knee scores
Table 3 Intrarater reliability of the original MOCART and MOCART 2.0 knee scores for cartilage repair after ACI versus MFX versus the osteochondral repair technique group (OAT, MaioRegen®) given as linear-weighted kappa statistics for individual variables and two-way mixed absolute agreement, single-measure intraclass correlation coefficients (ICCs), and their 95% confidence intervals (95% CI) for the resulting total MOCART and MOCART 2.0 knee scores for both readers

Inter- and intrarater reliability of the MOCART 2.0 knee score for chondral versus osteochondral cartilage lesions

Overall, the interrater reliability was higher for osteochondral lesions than for chondral lesions, with ICCs of 0.906 (95% CI 0.838 to 0.947) for osteochondral lesions versus 0.786 (95% CI 0.674 to 0.863) for chondral lesions. The same was true for the intrarater reliability for both readers, with ICCs of 0.921 (95% CI 0.862 to 0.955) and 0.869 (95% CI 0.778 to 0.924) for osteochondral lesions versus 0.804 (95% CI 0.699 to 0.876) and 0.839 (95% CI 0.736 to 0.902) for chondral lesions.

For chondral lesions, linear-weighted kappa statistics showed mostly substantial agreement for both interrater and intrarater agreements. Interrater agreement ranged from 0.588 (95% CI 0.428 to 0.747) for the variable “surface of the repair tissue” to 0.821 (95% CI 0.711 to 0.930) for the variable “subchondral changes.” Intrarater agreement ranged from linear-weighted kappa values of 0.486 (95% CI 0.321 to 0.651) for reader 1 for the variable “surface of the repair tissue” to 0.861 (95% CI 0.771 to 0.952) for reader 2 for the variable “subchondral changes.”

For osteochondral lesions, linear-weighted kappa statistics showed mostly substantial agreement for both interrater and intrarater agreements as well. Interrater agreement ranged from 0.534 (95% CI 0.362 to 0.706) for the variable “surface” to 0.891 (95% CI 0.782 to 1.000) for the variable “signal intensity.” Intrarater agreement ranged from 0.435 (95% CI 0.224 to 0.646) for the variable “surface of the repair tissue” for reader 1 to 0.871 (95% CI 0.777 to 0.966) for reader 1 for the variable “volume of cartilage defect filling compared to native cartilage” (for a detailed overview, see Table 4).

Table 4 Interrater and intrarater reliability of the MOCART 2.0 knee score for cartilage repair after chondral (MFX and ACI) versus osteochondral lesions (OAT, MaioRegen®, or ACI with autologous bone grafting) given as linear-weighted kappa statistics for individual variables and two-way mixed absolute agreement, single-measure intraclass correlation coefficients (ICCs), and their 95% confidence intervals (95% CI) for the resulting total MOCART 2.0 knee score

Discussion

The aim of this study was to evaluate the reliability of the recently introduced MOCART 2.0 knee score for the assessment of the radiological outcome after different cartilage repair procedures and in different lesion types.

The main finding of this study is that the ICCs of the total resultant MOCART 2.0 knee score showed good or excellent agreement, regardless of treatment modality. Furthermore, the majority of the categorical variables of the MOCART 2.0 knee score showed substantial agreement in the inter- and intrarater reliability testing, again regardless of treatment modality. When compared to the original MOCART score, higher interrater reliability was observed for almost all individual variables and the overall scoring, independent of surgical treatment (Table 2). The same was true when comparing the intrarater reliability between the original MOCART score and the MOCART 2.0 knee score (Table 3). This difference might be attributed in part to the modification of variables as well as to the atlas that was introduced alongside the MOCART 2.0 knee score, which has been shown to positively impact the reliability of less-experienced readers [22].

Interestingly, whereas overall interrater reliability was higher for osteochondral lesions than for chondral lesions, the individual variables “bony defect” and “subchondral changes” demonstrated worse interrater reliability for osteochondral lesions. We attribute this finding to the overall higher number of pathological findings in the osteochondral lesion group with regard to these two variables.

Since its introduction, a handful of studies have employed the MOCART 2.0 knee score for the assessment of radiological outcome after all-arthroscopic matrix-encapsulated autologous chondrocyte implantation [27] or MACI [28] in the knee and AMIC in the treatment of osteochondral lesions of the talus [29]. However, none of these studies assessed intra- or interrater reliability. Casari et al [29] employed the original MOCART score and the MOCART 2.0 knee score for the assessment of AMIC in the repair of osteochondral lesions of the talus. Although they did not assess intra- or interrater reliability, they found a significant correlation between preoperative lesion size and postoperative MOCART scores, but no correlation with clinical outcome. When interpreting these results, it has to be considered that low inter- and intrarater reliability has been reported for the assessment of cartilage repair of the talus with the original MOCART score [30].

For individual variables, the reproducibility was consistently lowest in this study for “surface of the repair tissue.” This was true for the overall population, as well as most subpopulations for the original MOCART score as well as for the MOCART 2.0 knee score. Interestingly, this was not the case in the first publication on interrater variability of the original MOCART score [16]. We attribute this finding partly to the increased quality and resolution of the imaging protocol, when compared to the study of Marlovits et al [16], which employed a 1-T system for imaging. Whereas high-resolution imaging is deemed necessary to visualize discrete fissuring, it is not necessarily associated with increased reliability, as minor fissures, which would have been underappreciated by a lower resolution sequence, might be seen by one reader but not the other.

For most variables of the MOCART 2.0 knee score, the inter- and intrarater reliabilities observed in this study were slightly lower than those of the expert readers, but higher than those of the inexperienced readers, in the study that first introduced the MOCART 2.0 knee score [22]. This might be because both readers, although residents, specialized in the morphological and quantitative assessment of knee MRIs. It also supports the interpretation of the results of this study: even if the intra- and interrater reproducibility for these readers were significantly lower than for experienced readers, it can be assumed that it would be similarly lower regardless of the type of lesion or surgical treatment strategy. Furthermore, inter- and intrarater reliabilities for most variables of the MOCART 2.0 knee score reported in this study demonstrated less variability than previously reported [22]. This might be mainly attributed to the higher number of patients in the current study.

Interrater reliability of the individual ordinal variables of the original MOCART score was consistently lower in our study than previously reported [16]. However, in addition to employing expert readers, that earlier study used ICCs to assess the reliability of the individual ordinal variables. Linear-weighted kappa, as used in the present study, is more appropriate for ordinal variables, but it is also a more rigorous measure.

No adhesions were observed in the study cohort, which corroborated the obsolescence of the variable “adhesions” in the original MOCART score and its consequential removal in the MOCART 2.0 knee score.

There are several limitations in this study that have to be mentioned.

A direct numerical comparison of the overall MOCART 2.0 knee score, as well as its categorical variables between treatment groups, and also chondral and osteochondral defects, would be highly interesting. However, considering the limitations of the retrospective design of the study, this would be a biased comparison. Due to the retrospective nature of the study, patients were not randomly allocated to different surgical treatment strategies. Treatment decisions were rather based on demographics and disease-specific factors. Hence, there were systematic differences regarding age, as well as lesion size between the different repair techniques. These differences are to be expected when retrospectively evaluating a cohort that was allocated to their respective treatment based on current guidelines [31].

However, even though the defect size in the ACI group differed significantly from that in the OAT and MFX groups, size did not show a significant correlation with the resultant MOCART 2.0 knee score.

Also, sequence parameters of the MRI examinations differed slightly between patients, since they were part of the clinical routine on different scanners. However, all included imaging studies were conducted on 3-T systems with dedicated knee coils. Furthermore, with 48 ACI patients, 32 patients in the osteochondral repair technique group, and 34 MFX patients, group sizes were unequally distributed. However, for the assessment of intra- and interrater reliability, the number of patients seemed to be adequate.

This study demonstrates comparable intra- and interrater reliability of the MOCART 2.0 knee score for the radiological assessment of different cartilage repair techniques (ACI vs. osteochondral repair techniques vs. MFX), as well as for the treatment of chondral versus osteochondral defects. To allow for a comparison of the absolute values of the MOCART 2.0 knee score between different repair techniques, a matched pair analysis or a randomized trial would be necessary. For the identification of a potential correlation with clinical outcome or even more importantly a potential predictive value regarding clinical outcome, additional longitudinal studies and correlation with clinical data would be necessary.