Variation among 25-hydroxyvitamin D [25(OH)D] assays is thwarting attempts to develop evidence-based criteria for defining the clinical states of vitamin D status, i.e., deficiency, insufficiency, sufficiency, and toxicity. From the discovery of 25(OH)D in 1968 through 2016, some 60,000 papers were published, only a handful of them based on standardized 25(OH)D measurements [1]. This lack of standardized measurements makes it difficult, if not impossible, to conduct meta-analyses of the relationship of 25(OH)D to vitamin D clinical states in order to develop 25(OH)D cut-points.

The challenge for the vitamin D field is to select key studies from among those already completed and, where stored samples exist, to standardize the 25(OH)D measurements by calibrating the old values to values that are traceable to the Reference Measurement Procedures (RMPs) of the National Institute of Standards and Technology (NIST), Ghent University, and the Centers for Disease Control and Prevention (CDC). This is referred to as Retrospective Standardization. To accomplish retrospective standardization, the Vitamin D Standardization Program (VDSP) has developed two options for conducting calibration studies [2]. Option 1 has three steps: (1) use results from the measurement of 40–50 single donor serum samples with RMP target values to develop an equation that converts values from the current assay to the NIST-Ghent-CDC RMPs, (2) re-measure a statistically defined subsample of the stored sera from the completed study and develop an equation that converts past values to the current assay, and (3) merge the two equations to form a single “Master Equation” that is used to convert past study 25(OH)D values to the NIST-Ghent-CDC RMPs. A fourth step in this option might be to send a duplicate set of samples to a certified traceable laboratory for confirmation.
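As an illustration of step 3 of Option 1, the following minimal sketch merges two conversion equations into one, assuming both are simple straight lines. The `Linear` class, the `compose` function, and all coefficients are hypothetical and chosen only for illustration; they are not taken from the VDSP protocol or from any study.

```python
from typing import NamedTuple


class Linear(NamedTuple):
    """A straight-line conversion y = intercept + slope * x."""
    intercept: float
    slope: float

    def __call__(self, x: float) -> float:
        return self.intercept + self.slope * x


def compose(outer: Linear, inner: Linear) -> Linear:
    """Return the single line equivalent to outer(inner(x))."""
    return Linear(
        intercept=outer.intercept + outer.slope * inner.intercept,
        slope=outer.slope * inner.slope,
    )


# Hypothetical coefficients, for illustration only (nmol/L).
current_to_rmp = Linear(intercept=2.0, slope=0.9)    # step 1: current assay -> RMP
old_to_current = Linear(intercept=-1.0, slope=1.1)   # step 2: old values -> current assay
master = compose(current_to_rmp, old_to_current)     # step 3: old values -> RMP
```

Calling `master(old_value)` then gives the same result as applying the step 2 equation followed by the step 1 equation. This shortcut holds only when both equations are linear.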

Option 2 is similar, except that all new 25(OH)D measurements are made by a laboratory that is certified to be traceable to the RMPs [2]. The traceable laboratory measures a statistically representative sample of the stored sera and then, using regression analysis, a Master Equation is developed to predict NIST-Ghent-CDC traceable laboratory values from the original assay values. The Master Equation is then used to convert all of the old data to NIST-Ghent-CDC traceable values.
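The regression step in Option 2 can be sketched as follows, assuming an ordinary least-squares straight line is adequate. The text above says only "regression analysis", so the choice of model is an assumption. The function name `fit_line` and the paired values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical paired results (nmol/L): original assay values and the
# traceable laboratory's re-measurements of the same stored sera.
original = [20.0, 40.0, 60.0, 80.0, 100.0]
traceable = [19.0, 35.0, 51.0, 67.0, 83.0]

intercept, slope = fit_line(original, traceable)

# Apply the fitted Master Equation to every original study value.
standardized = [intercept + slope * v for v in original]
```

In practice the fit would be based on the statistically representative subsample and then applied to the full set of old data.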

Developing an affordable, modified Option 1 is the goal of Drs. Jakab et al. in a thoughtful paper published recently in Osteoporosis International [3]. The authors propose using values from the analysis of samples distributed as part of the Vitamin D External Quality Assessment Scheme (DEQAS) to standardize the 206 25(OH)D measurements from the HunMen study by calibrating them to the RMPs.

The problem with Option 1 is its cost. A set of 40 single donor serum samples with RMP target values can cost up to $3000–4000. That, along with the costs associated with the normal use of an assay, can make the calibration of the old 25(OH)D values prohibitively expensive. DEQAS is an accuracy-based EQAS that distributes five samples per quarter. All of the distributed samples have target values assigned by one of the RMPs (currently by NIST). In addition, all of the distributed samples originate from donations that are collected and prepared by Solomon Park, Seattle, Washington, USA, using exacting procedures to guarantee their commutability [4]. Finally, the DEQAS subscription (for 20 samples) costs £200–250 per year, depending on the participant's location. This makes it affordable to laboratories around the world.

The authors followed Option 1 perfectly. Their laboratory participates in DEQAS with its DiaSorin Liaison assay. The Master Equation was developed from the DEQAS results for the five samples submitted shortly after all 206 samples had been re-measured with their assay. Moreover, rather than re-measuring a statistically defined subsample, Jakab et al. re-measured every sample in the data set. We have found that the minimal number of samples to be re-measured is approximately 100–150 [5, 6]. In light of those results, re-measuring all the samples was a reasonable decision.

So, why the editorial? The reason is to update Option 1 based on our experience, which suggests that five samples may not be enough. If the plot used to develop the Master Equation is linear, as it appears to be for the HunMen study, then five samples may be sufficient. The problem is that the Master Equation may not be linear. We have found using Option 2 that the performance characteristics of a laboratory’s assay often change at some point over the concentration range. In that case, the Master Equation can be characterized as a piecewise regression model in which the two lines intersect at the point where the performance characteristics change [5–7]. The departures from linearity vary from the very subtle to the quite dramatic. In published research, the point of intersection has varied from 49 to 121 nmol/L; that is, it may fall over much of the 25(OH)D distribution.
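To make the piecewise model concrete, here is a minimal sketch of a continuous two-segment ("hinge") regression whose breakpoint is chosen by a grid search over candidate values. This is one common way to fit such a model and is not necessarily the procedure used in the cited studies. All function names and data are hypothetical, and every candidate breakpoint must lie strictly inside the observed range so that the fit is well defined.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_hinge(x, y, knot):
    """Least-squares fit of y = b0 + b1*x + b2*max(0, x - knot)."""
    rows = [[1.0, xi, max(0.0, xi - knot)] for xi in x]
    # Normal equations (X'X) beta = X'y for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    coef = solve(xtx, xty)
    sse = sum(
        (yi - sum(c * v for c, v in zip(coef, r))) ** 2
        for r, yi in zip(rows, y)
    )
    return coef, sse


def best_breakpoint(x, y, candidates):
    """Return (knot, coefficients, sse) for the candidate with the lowest SSE."""
    best = None
    for knot in candidates:
        coef, sse = fit_hinge(x, y, knot)
        if best is None or sse < best[2]:
            best = (knot, coef, sse)
    return best


# Hypothetical, noise-free data (nmol/L): the slope drops from 1.0 to 0.6 at 80.
x = [float(v) for v in range(20, 151, 10)]
y = [5.0 + xi - 0.4 * max(0.0, xi - 80.0) for xi in x]
knot, coef, sse = best_breakpoint(x, y, range(40, 131, 10))
```

With a fixed breakpoint the model has three coefficients, so five calibration samples leave only two residual degrees of freedom, and fewer still once the breakpoint itself is estimated. That arithmetic is why a small sample set gives little power to detect a change in slope.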

An essential question is whether departures from linearity can be detected when only five DEQAS samples are the basis for developing the Master Equation in Option 1. Our fear is that departures from linearity will often be missed when the Master Equation rests on only five samples. How many samples, then, should be used? Ideally at least 100, but we know that to be impractical, so as a compromise we suggest that no fewer than 40 samples (2 years of DEQAS distributions) be used in developing the Master Equation [8]. These 40 samples are in addition to the re-measurement of a statistically defined subsample or, as Jakab et al. did, of all the stored study samples. DEQAS participants can use samples from previous distributions and/or purchase additional samples from the DEQAS archive; archive samples are also available to non-participants. This is another excellent reason for all research and clinical laboratories to participate in DEQAS or in the other accuracy-based performance testing (PT) program, the Accuracy-based Vitamin D (ABVD) Survey conducted by the College of American Pathologists (CAP).

In conclusion, the paper by Drs. Jakab et al. is a major advance in promoting the standardization of completed studies. Suggesting the use of DEQAS samples to develop the Master Equation for VDSP Option 1 was a master stroke.