Abstract
This chapter extends the concepts of measurement introduced in the previous chapter to address the more practical matters of how measurement processes are actually improved through measurement studies. The specific topics addressed are how to identify observations that are not well behaved, standard errors of measurement, and how to take into account the effects of measurement errors on the results of demonstration studies.
Notes
1. Some texts refer to part-whole correlations as item-total correlations.
2. When the purpose of computing the coefficients is to inspect them to determine if the observations are “well behaved,” the Pearson coefficient is widely used and is the only coefficient discussed explicitly here. The Pearson coefficient assumes that the variables are both measured with interval or ratio properties and normally distributed. Even though both assumptions are frequently violated, the Pearson coefficient provides useful guidance to the study team performing measurement studies.
3. This is a useful approximation for computing the 95% confidence interval. A more exact formula is 95% CI = mean ± (1.96 × SEmean).
4. For those experienced in inferential statistics, a t-test performed on the case with the larger standard errors reveals t = 1.77, df = 198, p = .08. With the reduced standard errors, t = 2.21, df = 198, p = .03.
5. The authors are grateful to Johan van der Lei and his colleagues for sharing the original data from their study.
6. Readers familiar with theories of reliability might observe that coefficient alpha, which does not take into account panelists’ stringency or leniency as a source of error, might overestimate the reliability in this example. In this case, however, use of an alternate reliability coefficient had a negligible effect.
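The standard-error and confidence-interval computations in notes 3 and 4 can be sketched in Python. The sample data below are hypothetical, chosen only to illustrate the formulas; the 1.96 multiplier is the exact normal-theory value from note 3:

```python
import math
import statistics

def ci95(values):
    """95% confidence interval for the mean: mean ± 1.96 × SEmean."""
    mean = statistics.mean(values)
    # SEmean = SD / sqrt(n), using the sample standard deviation
    se_mean = statistics.stdev(values) / math.sqrt(len(values))
    return mean - 1.96 * se_mean, mean + 1.96 * se_mean

# Hypothetical scores from a measurement study
scores = [12, 15, 11, 14, 13, 16, 12, 15]
low, high = ci95(scores)
```

The approximation mentioned in note 3 simply replaces 1.96 with 2, widening the interval slightly.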
References
Frey BB, editor. The Sage encyclopedia of educational research, measurement, and evaluation. Thousand Oaks, CA: Sage; 2018.
Mourougan S, Sethuraman K. Enhancing questionnaire design, development and testing through standardized approach. IOSR J Bus Manage. 2017;19:1–8.
Streiner DL, Norman GR, Cairney J. Health measurement scales: a practical guide to their development and use. 5th ed. Oxford: Oxford University Press; 2015.
Van der Lei J, Musen MA, van der Does E, Man in’t Veld AJ, van Bemmel JH. Comparison of computer-aided and human review of general practitioners’ management of hypertension. Lancet. 1991;338:1504–8.
Walsh CG, Sharman K, Hripcsak G. Beyond discrimination: a comparison of calibration methods and clinical usefulness of predictive models of readmission risk. J Biomed Inform. 2017;76:9–18.
Watson JC. Establishing evidence for internal structure using exploratory factor analysis. Measure Eval Counsel Develop. 2017;50:232–8.
Answers to Self-Tests
Self-Test 8.1
1. Stages 1–5 are explicitly represented in the example. Stage 6 is not represented because no validity study is described. This makes Stage 7 only partially complete, and Stage 8 would be premature.
2. Ratings based on one judge, as projected by the prophecy formula, have a reliability of .36, which is not acceptable.
3. Content validity could be explored by reviewing the credentials of the selected judges. Criterion validity could be explored by locating other cases similar to those used in the study and then identifying cases in which clinicians happened to take the action recommended by the resource (for whatever reason, since they would not have had access to the resource). The system’s advice would be valid to the extent that the cases where the recommended actions were taken exhibited better outcomes.
Self-Test 8.2
1. It would affect calibration by altering the difference between the scores for Observation 5 and the other observations: the change reduces all values of Observation 5 by one scale point. It would not affect the correlation with the other observations.
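The point about correlation can be checked directly: subtracting a constant from every value of one observation changes its mean (and hence calibration) but leaves its Pearson correlation with any other observation unchanged. The ratings below are hypothetical:

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

obs5 = [3, 4, 2, 5, 4]            # hypothetical ratings for Observation 5
other = [2, 5, 1, 4, 3]           # hypothetical ratings for another observation
shifted = [v - 1 for v in obs5]   # every value reduced by one scale point
```

The shift leaves every deviation from the mean intact, so the correlation is identical while the mean drops by exactly one point.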
2. More observations are needed to increase the reliability. The observations in the set are generally well behaved; there are just not enough of them.
3. It appears that two attributes are being measured: Items 1–3 are measuring one attribute, and Items 4–6 are measuring the other.
Self-Test 8.3
1. (a) The attribute is the acuity of each skin lesion. The objects are the lesions, presumably images of them. An independent observation is a diagnostic assessment of one lesion by one dermatologist.
   (b) Because the machine learning approach uses a computer algorithm, as long as the code is stable and the hardware is functioning properly, the assessments made by the machine would be expected to be completely consistent and reproducible.
   (c) Using the attenuation formula, the “true” (unattenuated) correlation would be 0.54. (The two reliabilities are 0.7 for the dermatologists’ assessments and 1 for the machine learning assessments.)
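The attenuation correction in (c) can be sketched as follows. The observed correlation of 0.45 used here is an assumption, back-derived from the 0.54 result and the stated reliabilities of 0.7 and 1.0:

```python
import math

def disattenuate(r_observed, reliability_x, reliability_y):
    """Correct an observed correlation for attenuation due to
    unreliability in one or both sets of measurements."""
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Dermatologists' reliability = 0.7; machine learning reliability = 1.0
r_true = disattenuate(0.45, 0.7, 1.0)
```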
2. The answer may be obtained by substituting r_corrected ≤ 1 into the attenuation formula:

   r_corrected = r_observed / √(reliability₁ × reliability₂)

   to obtain the inequality:

   r_observed ≤ √(reliability₁ × reliability₂)
Self-Test 8.4
1. SEmeas (eight judges) = 1.10.
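The standard error of measurement is SD × √(1 − reliability). The values below are hypothetical, chosen only to illustrate the formula; the chapter’s own SD and reliability yield the 1.10 quoted above:

```python
import math

def se_meas(sd, reliability):
    """Standard error of measurement: SD × sqrt(1 − reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical values: score SD of 2.0, reliability of 0.75
example = se_meas(2.0, 0.75)
```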
2. Judge H displays the highest corrected part–whole correlation (0.55) and thus can be considered the “best” judge. Judge E is a close second with a part–whole correlation of 0.50. Judge C may be considered the worst judge, with a part–whole correlation of −0.27. Removing Judge C raises the reliability from 0.29 to 0.54 in this example. Such a large change in reliability is seen in part because the number of objects in this example is small. Judges B and D can in some sense be considered the worst, as they rendered the same result for every object and their part–whole correlations cannot be calculated.
3. Reliability (4 judges) = 0.17; reliability (10 judges) = 0.34.
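Both figures follow from the Spearman-Brown formula applied to the eight-judge reliability of 0.29 given in the previous answer: project down to a single judge, then back up to 4 or 10 judges. A sketch:

```python
def spearman_brown(reliability, k_old, k_new):
    """Project the reliability of a k_old-judge composite
    to k_new judges via the Spearman-Brown formula."""
    # Reliability of a single judge, from the k_old-judge composite
    r1 = reliability / (k_old - (k_old - 1) * reliability)
    # Reliability of a k_new-judge composite
    return k_new * r1 / (1 + (k_new - 1) * r1)

r4 = spearman_brown(0.29, 8, 4)    # about 0.17
r10 = spearman_brown(0.29, 8, 10)  # about 0.34
```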
4. 
                       Judges
   Hypercritic         Valid    Not valid
   Generated             3          2
   Not generated         4          3
5. Corrected correlation is 0.17.
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this chapter
Friedman, C.P., Wyatt, J.C., Ash, J.S. (2022). Conducting Measurement Studies and Using the Results. In: Evaluation Methods in Biomedical and Health Informatics. Health Informatics. Springer, Cham. https://doi.org/10.1007/978-3-030-86453-8_8
DOI: https://doi.org/10.1007/978-3-030-86453-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86452-1
Online ISBN: 978-3-030-86453-8
eBook Packages: Medicine, Medicine (R0)