Interobserver agreement of whole-body magnetic resonance imaging is superior to whole-body computed tomography for assessing disease burden in patients with multiple myeloma
Whole-body MRI (WB-MRI) is recommended by the International Myeloma Working Group for all patients with asymptomatic myeloma and solitary plasmacytoma and by the UK NICE guidance for all patients with suspected myeloma. Some centres unable to offer WB-MRI offer low-dose whole-body CT (WB-CT). There are no studies comparing interobserver agreement and disease detection of contemporary WB-MRI (anatomical imaging and DWI) versus WB-CT. Our primary aim is to compare the interobserver agreement between WB-CT and WB-MRI in the diagnosis of myeloma.
Consecutive patients with newly diagnosed myeloma imaged with WB-MRI and WB-CT were prospectively reviewed. For each body region and modality, two experienced and two junior radiologists scored disease burden with final scores by consensus. Intraclass correlation coefficients (ICC), median scores, Wilcoxon signed-rank test and Spearman’s correlation coefficients were calculated.
There was no significant difference in overall observer scores between WB-MRI and WB-CT (p = 0.87). For experienced observers, interobserver agreement for WB-MRI was superior to WB-CT overall and for each region, without overlap in whole-skeleton confidence intervals (ICC 0.98 versus 0.77, 95%CI 0.96–0.99 versus 0.45–0.91). For inexperienced observers, although there was a trend towards better interobserver agreement for the whole skeleton on WB-MRI (ICC 0.95, 95%CI 0.72–0.98) than on WB-CT (ICC 0.72, 95%CI 0.34–0.88), the confidence intervals overlapped.
WB-MRI offers excellent interobserver agreement which is superior to WB-CT for experienced observers. Although the overall burden was similar across both modalities, patients with lower disease burdens where MRI could be advantageous are not included in this series.
• Whole-body MRI is recommended by the International Myeloma Working Group for patients with multiple myeloma and solitary plasmacytoma and by the NICE guidance for those with suspected multiple myeloma.
• Some centres unable to offer whole-body MRI (WB-MRI) offer low-dose whole-body CT (WB-CT).
• This prospective study demonstrates that contemporary WB-MRI (with anatomical sequences and DWI) provides better interobserver agreement in assessing myeloma disease burden for the whole skeleton and across any individual body region in myeloma patients when compared with low-dose whole-body CT.
Keywords: Multiple myeloma; Diffusion magnetic resonance imaging; Whole-body imaging; Computed tomography
95%CI: 95% confidence intervals
ICC: Intraclass correlation coefficients
IMWG: International Myeloma Working Group
INMB: Incremental net monetary benefit
NICE: National Institute for Health and Care Excellence
Radiographic skeletal survey (SS), which has been in widespread use for decades, offers only a very crude assessment of bone involvement in multiple myeloma. More recently, some centres have replaced skeletal survey with low-dose whole-body CT (WB-CT), which has been shown to have greater sensitivity [1, 2, 3]. However, because both skeletal survey and CT predominantly detect the destructive and/or reactive effects of myeloma disease on trabecular and cortical bone rather than disease within the bone marrow space, the sensitivity is inherently limited [4, 5]. The excellent soft tissue contrast of whole-body MRI (WB-MRI) allows direct imaging of the bone marrow, resulting in higher sensitivity and earlier detection. More recently, the superiority of WB-MRI over FDG PET-CT for disease detection has been demonstrated [6, 7, 8]. Consequently, the International Myeloma Working Group (IMWG) recommends WB-MRI for all patients with suspected solitary plasmacytoma or asymptomatic myeloma, and in the UK, the National Institute for Health and Care Excellence (NICE) recommends WB-MRI for all patients with a suspected new diagnosis of myeloma.
A cost-effectiveness analysis of imaging strategies for myeloma diagnosis has reported that WB-CT and WB-MRI give the highest incremental net monetary benefit (INMB) under differing prevalence levels when compared with skeletal survey. Perhaps surprisingly, a negative INMB was reported for FDG PET-CT. This suggests an approach using either WB-CT or WB-MRI could be cost-saving and health-improving. However, there is evidence that WB-CT has a lower lesion detection rate and understages patients compared with WB-MRI using conventional T1-weighted spin-echo and STIR MR sequences. This margin of superiority is likely to be further improved with the addition of DWI, which is superior to conventional MRI sequences for detecting focal marrow lesions. Nonetheless, as WB-CT is inexpensive and relatively quick to perform, many centres limited by MRI capacity may gravitate towards using non-contrast-enhanced WB-CT as standard imaging in this patient cohort.
However, sensitivity forms only part of the assessment of a diagnostic test. Diagnostic assessment tools for clinical or research applications must also be reliable. One of the measures of reliability is the interobserver agreement. To date, the interobserver agreement for WB-MRI and WB-CT is unknown in the context of disease detection in myeloma. Hence, the primary aim of this study is to compare the interobserver agreement and diagnostic sensitivity for disease detection of WB-MRI and WB-CT in patients with multiple myeloma.
Materials and methods
Study design and population
The study design was a prospective observational diagnostic test accuracy study, approved by the institutional review board. Patients with an established new diagnosis of myeloma as per IMWG criteria who were planned to be imaged with both WB-CT and WB-MRI examinations before starting treatment between 2013 and 2017 were prospectively included following written informed consent. Patients with second malignancies were excluded.
Low-dose WB-CT (mean radiation dose 5 mSv) was acquired with a 128-slice CT scanner (Somatom Definition Flash, Siemens Healthineers), 120 kV, 50 mAs, and 0.5-s pitch. Axial images were reconstructed to 3 mm for review. All subjects were scanned supine with arms by their sides and the images were acquired from the skull vertex to the toes. No intravenous iodinated contrast was administered. Axial images of 1-mm slice thickness were reconstructed to secondary coronal and sagittal images for review. Axial images for bone and soft tissue assessment were reconstructed from the raw data obtained during scanning: for bone assessment using sharp (B50f) kernel and for soft tissue assessment using soft (B20f) kernel. Secondary coronal and sagittal reconstructions were generated using a slice thickness of 2 mm and slice increment of 1.5 mm. The typical duration for WB-CT examination was less than 5 min. The dose-length products (DLP) for the WB-CT examinations were recorded.
WB-MRI studies were performed using an Avanto 1.5-T system (Siemens Healthineers). All subjects were scanned supine with arms by their sides. Coil elements were positioned from the skull vertex to the knees. Sagittal T1-weighted images (TR 590 ms, TE 11 ms, FOV 400 mm, slice thickness 4 mm) and T2-weighted images (TR 2690 ms, TE 93 ms, FOV 400 mm, slice thickness 4 mm) of the spine were acquired, followed by axial diffusion-weighted sequences (single-shot double spin-echo echo-planar technique with STIR fat suppression in free breathing) using b values of 50 and 900 s/mm² applied in 3 orthogonal directions and combined to the isotropic trace images. Diffusion-weighted images were acquired in multiple contiguous stations of 50 slices per station (slice thickness 5 mm, no gap, FOV 430 mm, phase direction AP, parallel imaging (GRAPPA) factor 2, TR 14800 ms, TE 66 ms, inversion time (TI) 180 ms, voxel size 2.9 mm × 2.9 mm × 5 mm, number of signal averages 4, matrix 150 × 150, bandwidth 1960 Hz per pixel). Axial T1-weighted Vibe Dixon 3D gradient echo breath-hold sequences (52 slices per slab, FOV 470 mm, TR/TE 7/2.38, 4.76 ms, flip angle 30°, matrix 192 × 192) were also acquired, matching the acquisition stacks and partition thickness to the DWI. No intravenous gadolinium contrast was used. The typical duration for WB-MRI examination was 45 min.
For each body region (skull, cervical spine, thoracic spine, lumbar spine, pelvis, ribs/other, long bones), two radiologists each with > 10 years of experience, blinded to clinical information and the MRI findings, made a categorisation of disease burden on WB-CT with a previously described scoring system [4, 13]. This allowed the assessment of the number of lesions (> 20, 10–20, < 10, 0) and largest lesion dimension (> 10, 5–10, < 5, 0 mm) for each body region, assigning a score from 3 to 0 for each characteristic (lesion number and size), i.e. score 3 for > 20 lesions, score 2 for 10–20 lesions, score 1 for < 10 lesions and score 0 for 0 lesions; score 3 for > 10 mm, score 2 for 5–10 mm, score 1 for < 5 mm and score 0 for 0 lesions. The maximum lesion dimension was measured on the window setting in which the lesion was the most readily appreciated. A total score was then calculated for the whole skeleton. To achieve the final observer scores, discrepancies were resolved by a consensus reading facilitated by a third experienced radiologist. At a different time, the image reading was repeated for the WB-MRI data with readers blinded to the clinical information and CT findings. The maximum lesion dimension was measured on the sequence in which the lesion was the most readily appreciated. The image reading for WB-CT and WB-MRI was subsequently repeated by another pair of junior radiologists (< 1-year experience as a consultant).
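The scoring rules described above can be illustrated with a minimal Python sketch. The category thresholds are taken from the text; the function names, the treatment of the category boundaries (10 and 20 lesions, 5 and 10 mm) as inclusive in the middle category, and the use of a simple sum of lesion-number and lesion-size scores across the seven regions as the whole-skeleton total are assumptions made for illustration, not a statement of the published scoring system.

```python
from typing import Dict, Tuple

# Body regions as listed in the text.
REGIONS = (
    "skull", "cervical spine", "thoracic spine", "lumbar spine",
    "pelvis", "ribs/other", "long bones",
)


def lesion_number_score(n_lesions: int) -> int:
    """0 for no lesions, 1 for < 10, 2 for 10-20, 3 for > 20."""
    if n_lesions == 0:
        return 0
    if n_lesions < 10:
        return 1
    if n_lesions <= 20:
        return 2
    return 3


def lesion_size_score(largest_mm: float) -> int:
    """0 for no lesion, 1 for < 5 mm, 2 for 5-10 mm, 3 for > 10 mm."""
    if largest_mm == 0:
        return 0
    if largest_mm < 5:
        return 1
    if largest_mm <= 10:
        return 2
    return 3


def region_score(n_lesions: int, largest_mm: float) -> int:
    """Assumed combination: number score plus size score (0 to 6)."""
    return lesion_number_score(n_lesions) + lesion_size_score(largest_mm)


def whole_skeleton_score(findings: Dict[str, Tuple[int, float]]) -> int:
    """Assumed total: sum of region scores; unlisted regions count as 0."""
    return sum(region_score(*findings.get(r, (0, 0.0))) for r in REGIONS)


# Hypothetical reading: many large skull lesions, a few small pelvic lesions.
example = {"skull": (25, 12.0), "pelvis": (3, 4.0)}
print(whole_skeleton_score(example))
```

Under these assumptions each region contributes between 0 and 6 points, so the whole-skeleton total ranges from 0 to 42.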
Intraclass correlation coefficient (ICC) estimates and their 95% confidence intervals (95%CI) were calculated using a two-way random absolute single measures model. Statistical analyses were performed using IBM SPSS Statistics for Windows Version 25.0 (SPSS Inc.). ICC values less than 0.5 are considered to be indicative of poor reliability, values between 0.5 and 0.75 indicate moderate reliability, values between 0.75 and 0.9 indicate good reliability, and values greater than 0.90 indicate excellent reliability. With two observers, to detect the smallest possible value of 0.5 for ICC, using a two-sided test, with a pre-specified 5% significance level test (α = 0.05) and a power of 80% (β = 0.2), the required sample size is approximately 22. The median and interquartile ranges (IQR) of the consensus observer scores on WB-MRI were compared with those on WB-CT, and the Wilcoxon signed-rank test was used to test the null hypothesis that the average signed rank of the two samples is zero. Spearman’s rank correlation coefficients were used to evaluate whether the WB-MRI and WB-CT scores of one observer correlated with the analogous scores of the other observer on a per-region and per-patient basis. A value of p < 0.05 was taken to be statistically significant in all tests.
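For readers who want to see what a two-way random, absolute-agreement, single-measures ICC involves, the following is a minimal pure-Python sketch of the standard ICC(2,1) point estimate from the two-way ANOVA mean squares. It is not the SPSS routine used in the study and it does not compute confidence intervals; the function names are hypothetical, and the handling of values falling exactly on a band boundary in `interpret_icc` is an assumption.

```python
from typing import List, Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): rows are subjects, columns are observers."""
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of observers
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 rows and columns")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for subjects (rows), observers (columns) and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def interpret_icc(icc: float) -> str:
    """Reliability bands as quoted in the text (boundary handling assumed)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical whole-skeleton scores from two observers for five patients.
ratings: List[List[float]] = [[12, 13], [30, 29], [8, 8], [21, 24], [17, 16]]
value = icc_2_1(ratings)
print(round(value, 3), interpret_icc(value))
```

Applied to the whole-skeleton estimates reported in this study, `interpret_icc` would label 0.98 as excellent, 0.77 as good and 0.72 as moderate.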
A total of 22 patients with treatment-naïve symptomatic myeloma (mean age 61, range 36–72, 12 female and 10 male) were included. (Please see supplemental information for clinical details of the patient population.) A total of 154 body regions were scored for the presence of disease on CT and MRI. The interval between WB-CT and WB-MRI studies ranged from 0 to 26 days (mean 3 days, median 0 days). Mean bone marrow infiltration was 45% (range 15–80%). The mean DLP for WB-CT studies was 426.7 mGy·cm.
Table. Interobserver agreement as demonstrated by intraclass correlation coefficient (ICC) for scoring WB-CT for individual body regions and the whole skeleton.
Columns: ICC between experienced observers (95% confidence interval); ICC between junior observers (95% confidence interval).
Entries in source order: 0.36 (scale not reliable–0.72); 0.30 (scale not reliable–0.70); 0.58 (scale not reliable–0.83); 0.35 (scale not reliable–0.66); 0.52 (scale not reliable–0.80); Rib and other bones; 0.56 (scale not reliable–0.82).
Table. Interobserver agreement as demonstrated by ICC for scoring WB-MRI for individual body regions and the whole skeleton.
Columns: ICC between experienced observers (95% confidence interval); ICC between junior observers (95% confidence interval). Rows include: Rib and other bones.
Observer scores for disease burden assessment
Table. Median and interquartile ranges (IQR) for consensus observer scores for WB-CT and WB-MRI. Wilcoxon signed-rank test (two-tailed) showed no statistically significant difference between the scores for WB-CT and WB-MRI. Rows include: Rib and other bones.
Table. Correlation between consensus WB-CT and WB-MRI scores for individual body regions and the whole skeleton, with p values for a two-tailed test of the null hypothesis rho = 0.
Columns: Correlation coefficient (Spearman’s rho). Rows include: Rib and other bones.
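As an illustration of how a Spearman rank correlation between paired WB-CT and WB-MRI scores can be obtained, the sketch below ranks each set of scores (tied scores share their average rank) and then computes the Pearson correlation of the ranks. The function names and the example scores are hypothetical, and the p value for the test of rho = 0 is not computed here.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical consensus scores for one body region in six patients.
ct_scores = [0, 2, 4, 6, 3, 5]
mri_scores = [0, 3, 4, 6, 2, 6]
print(round(spearman_rho(ct_scores, mri_scores), 3))
```

Because the burden scores take only a few integer values, ties are common, which is why the sketch uses average ranks rather than the shortcut formula based on squared rank differences.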
This study is the first to compare interobserver agreement between WB-MRI and WB-CT. Interobserver agreement was higher for WB-MRI than for WB-CT across the entire skeleton and for individual body regions, for both experienced and junior radiologists. Although WB-MRI yielded higher scores than WB-CT for disease detection in long bones, there was no difference in the overall observer score. However, the study was limited by the inclusion of only patients with confirmed myeloma and a high mean percentage marrow infiltration. It therefore did not include a significant number of patients with lower disease burdens, where MRI could be advantageous.
The clinical interpretation of whole-body cross-sectional studies is based on essentially visual assessment of the graphical representation of digital data, which is subject to variations due to observer experience, image interpretation and reading conditions. Hence, the interobserver agreement can be influenced by whether there is a distinct contrast between the normal and abnormal that can be easily and readily detected by the human eye. Inconsistency in interpretation is exacerbated in modalities where the difference between normal and abnormal is subtle, or when there is considerable overlap. As with any diagnostic test, to reach a valid diagnosis and judgement, it is desirable for the imaging used to assess myeloma to have a high degree of agreement. While agreement does not imply accuracy, a test with poor interobserver agreement cannot be reliably used in the clinical setting for management decision-making.
The challenges many centres face in providing WB-MRI services are predominantly capacity issues. Although WB-MRI has been shown to be well tolerated by patients with myeloma in a tertiary referral centre, acutely unwell patients may not be able to tolerate WB-MRI scans, which can last for up to 45 min. Achieving faster scanning times is a priority for WB-MRI researchers, but until that is achieved, shortened MRI protocols (i.e. spine and pelvis coverage only) and WB-CT are reasonable alternatives. The position of WB-MRI is shifting from a state-of-the-art imaging technique to standard practice in oncological imaging, for disease detection, characterisation and therapy response in multiple myeloma.
WB-MRI is an increasingly deployed imaging technique in cancer imaging. It can offer excellent interobserver agreement in quantifying disease burden for the whole skeleton and across any individual body region in myeloma patients. Future larger-scale multi-centre studies are anticipated to provide further evidence to support this practice.
We acknowledge the Cancer Research UK and Engineering and Physical Sciences Research Council support to the Cancer Imaging Centre at the Royal Marsden Hospital and the Institute of Cancer Research in association with Medical Research Council and Department of Health C1060/A10334, C1060/A16464 and the National Health Service (NHS) funding to the National Institute for Health Research (NIHR) Biomedical Research Centre (BRC), Clinical Research Facility in Imaging and the Cancer Research Network. This report is independent research funded by the NIHR. The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
This study has received funding by the National Institute for Health Research Biomedical Research Centre at The Royal Marsden.
Compliance with ethical standards
The scientific guarantor of this publication is Christina Messiou.
Conflict of interest
The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Statistics and biometry
No complex statistical methods were necessary for this paper.
Written informed consent was obtained from all subjects (patients) in this study.
Institutional Review Board approval was obtained.
• Diagnostic or prognostic study
• Performed at one institution
- 10. National Institute for Health and Care Excellence (2016) Myeloma: diagnosis and management. NICE Guideline 35. Available via https://www.nice.org.uk/guidance/ng35/chapter/recommendations#imaging-investigations. Accessed 17 May 2018
- 11. National Institute for Health and Care Excellence (2016) Myeloma: diagnosis and management. NICE Guideline 35. Appendices A-F. Available via https://www.nice.org.uk/guidance/ng35/evidence/appendices-af-pdf-2306487278. Accessed 17 May 2018
- 15. Bujang MA, Baharum N (2017) A simplified guide to determination of sample size requirements for estimating the value of intraclass correlation coefficient: a review. Arch Orofac Sci 12:1–11
- 16. Kyriacou DN (2001) Reliability and validity of diagnostic tests. Acad Emerg Med 8:404–405. https://doi.org/10.1111/j.1553-2712.2001.tb02125.x
- 21. Chilla GS, Tan CH, Xu C, Poh CL (2015) Diffusion weighted magnetic resonance imaging and its recent trend—a survey. Quant Imaging Med Surg 5:407–422. https://doi.org/10.3978/j.issn.2223-4292.2015.03.01
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.