Applying Time-Series Analysis to Detect Scale Drift
This chapter applies regression with autoregressive moving-average (ARMA) errors to the monitoring of equated scores over time. The method can provide a comprehensive picture of equated scores without requiring any additional equating designs.

An examinee's raw score on a test takes a different form depending on how the test is scored (e.g., number-correct scoring, formula scoring, item response theory [IRT] scoring). To make scores easier for test users and test takers to interpret, raw scores are transformed to scale scores. Scale scores are the reported scores that test users receive, and they are therefore the most visible and important part of an assessment. Typically, a scale is established by mapping raw scores from a single test form to scale scores.

Establishing the scale for reporting scores is a process based on both statistics and policy, and it should support the purpose of the assessment. The reporting scale should (a) have an established mean and variance, (b) allow for a good representation of easier or more difficult subsequent test forms, (c) avoid (misleading) comparisons with different and already established assessments, and (d) incorporate score precision (such as reflecting a special relationship of the standard error of measurement across the score points, or deciding on the number of score scale points).

Equating and scaling are commonly described as a two-step process. In practice, the scores from a new test form are scaled as follows: Raw scores on the new test form are equated back to the raw scores of the previous (old) form for which the scaling has been established.
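
The two-step process described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and does not come from the chapter: the equating step uses simple linear (mean and standard deviation) equating under an equivalent-groups assumption, the old form's raw-to-scale conversion is taken to be linear with rounding and truncation, and the function names, score data, slope, intercept, and scale range are all hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(new_raw, new_form_scores, old_form_scores):
    """Step 1 (assumed method): place a new-form raw score on the
    old form's raw-score metric by matching means and standard
    deviations of the two score distributions."""
    mu_new, sd_new = mean(new_form_scores), pstdev(new_form_scores)
    mu_old, sd_old = mean(old_form_scores), pstdev(old_form_scores)
    return mu_old + (sd_old / sd_new) * (new_raw - mu_new)


def raw_to_scale(old_raw, slope, intercept, lowest, highest):
    """Step 2 (assumed form): apply the old form's established
    raw-to-scale conversion, round to an integer scale score, and
    truncate to the reporting range."""
    scaled = round(slope * old_raw + intercept)
    return min(highest, max(lowest, scaled))


# Hypothetical raw-score samples from equivalent groups.
new_form_scores = [10, 12, 14, 16, 18]
old_form_scores = [11, 14, 17, 20, 23]

# A new-form raw score of 16 is first expressed on the old-form
# raw metric, then converted with the old form's (hypothetical)
# scaling: scale = 2 * raw + 100, reported between 100 and 200.
equated_raw = linear_equate(16, new_form_scores, old_form_scores)
scale_score = raw_to_scale(equated_raw, slope=2, intercept=100,
                           lowest=100, highest=200)
print(equated_raw, scale_score)
```

Because the old form's conversion is reused unchanged, any error in the equating step carries directly into the reported scale scores, which is one reason to monitor equated scores over time.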