Abstract
This chapter extends the concepts of measurement introduced in the previous chapter to address the more practical matters of how measurement processes are actually improved through measurement studies. The specific topics addressed are how to identify observations that are not well behaved, standard errors of measurement, and how to take into account the effects of measurement errors on the results of demonstration studies.
Notes
1. Some texts refer to part-whole correlations as item-total correlations.
2. When the purpose of computing the coefficients is to inspect them to determine if the observations are “well behaved,” the Pearson coefficient is widely used and is the only coefficient discussed explicitly here. The Pearson coefficient assumes that the variables are both measured with interval or ratio properties and normally distributed. Even though both assumptions are frequently violated, the Pearson coefficient provides useful guidance to the study team performing measurement studies.
3. This is a useful approximation for computing the 95% confidence interval. A more exact formula is 95% CI = mean ± (1.96 × SEmean).
4. For those experienced in inferential statistics, a t-test performed on the case with the larger standard errors reveals t = 1.77, df = 198, p = .08. With the reduced standard errors, t = 2.21, df = 198, p = .03.
5. The authors are grateful to Johan van der Lei and his colleagues for sharing the original data from their study.
6. Readers familiar with theories of reliability might observe that coefficient alpha, which does not take into account panelists’ stringency or leniency as a source of error, might overestimate the reliability in this example. In this case, however, use of an alternate reliability coefficient had a negligible effect.
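The standard-error and confidence-interval computations in notes 3 and 4 can be sketched in Python. The sample data below are hypothetical, chosen only to illustrate the formulas; the 1.96 multiplier is the exact normal-theory value from note 3:

```python
import math
import statistics

def ci95(values):
    """95% confidence interval for the mean: mean ± 1.96 × SEmean."""
    mean = statistics.mean(values)
    # SEmean = SD / sqrt(n), using the sample standard deviation
    se_mean = statistics.stdev(values) / math.sqrt(len(values))
    return mean - 1.96 * se_mean, mean + 1.96 * se_mean

# Hypothetical scores from a measurement study
scores = [12, 15, 11, 14, 13, 16, 12, 15]
low, high = ci95(scores)
```

The approximation mentioned in note 3 simply replaces 1.96 with 2, widening the interval slightly.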
References
Frey BB, editor. The Sage encyclopedia of educational research, measurement, and evaluation. Thousand Oaks, CA: Sage; 2018.
Mourougan S, Sethuraman K. Enhancing questionnaire design, development and testing through standardized approach. IOSR J Bus Manage. 2017;19:1–8.
Streiner DL, Norman GR, Cairney J. Health measurement scales: a practical guide to their development and use. 5th ed. Oxford: Oxford University Press; 2015.
Van der Lei J, Musen MA, van der Does E, Man in’t Veld AJ, van Bemmel JH. Comparison of computer-aided and human review of general practitioners’ management of hypertension. Lancet. 1991;338:1504–8.
Walsh CG, Sharman K, Hripcsak G. Beyond discrimination: a comparison of calibration methods and clinical usefulness of predictive models of readmission risk. J Biomed Inform. 2017;76:9–18.
Watson JC. Establishing evidence for internal structure using exploratory factor analysis. Measure Eval Counsel Develop. 2017;50:232–8.
Answers to Self-Tests
Self-Test 8.1
1. Stages 1–5 are explicitly represented in the example. Stage 6 is not represented because no validity study is described. This makes Stage 7 only partially complete, and Stage 8 would be premature.
2. Ratings based on one judge, as projected by the prophecy formula, have a reliability of .36, which is not acceptable.
3. Content validity could be explored by reviewing the credentials of the selected judges. Criterion validity could be explored by locating other cases similar to those used in the study and then identifying cases in which clinicians happened to take the action recommended by the resource (for whatever reason, since they would not have had access to the resource). The system’s advice would be valid to the extent that the cases where the recommended actions were taken exhibited better outcomes.
Self-Test 8.2
1. It would affect calibration by altering the difference between the scores for Observation 5 and the other observations: the change reduces all values of Observation 5 by one scale point. It would not affect the correlation with the other observations.
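The point about correlation can be checked directly: subtracting a constant from every value of one observation changes its mean (and hence calibration) but leaves its Pearson correlation with any other observation unchanged. The ratings below are hypothetical:

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

obs5 = [3, 4, 2, 5, 4]            # hypothetical ratings for Observation 5
other = [2, 5, 1, 4, 3]           # hypothetical ratings for another observation
shifted = [v - 1 for v in obs5]   # every value reduced by one scale point
```

The shift leaves every deviation from the mean intact, so the correlation is identical while the mean drops by exactly one point.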
2. More observations are needed to increase the reliability. The observations in the set are generally well behaved; there are just not enough of them.
3. It appears that two attributes are being measured: Items 1–3 are measuring one attribute, and Items 4–6 are measuring the other.
Self-Test 8.3
1. (a) The attribute is the acuity of each skin lesion. The objects are the lesions, presumably images of them. An independent observation is a diagnostic assessment of one lesion by one dermatologist.
   (b) Because the machine learning approach uses a computer algorithm, as long as the code is stable and the hardware is functioning properly, the assessments made by the machine would be expected to be completely consistent and reproducible.
   (c) Using the attenuation formula, the “true” (unattenuated) correlation would be 0.54. (The two reliabilities are 0.7 for the dermatologists’ assessments and 1 for the machine learning assessments.)
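The attenuation correction in (c) can be sketched as follows. The observed correlation of 0.45 used here is an assumption, back-derived from the 0.54 result and the stated reliabilities of 0.7 and 1.0:

```python
import math

def disattenuate(r_observed, reliability_x, reliability_y):
    """Correct an observed correlation for attenuation due to
    unreliability in one or both sets of measurements."""
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Dermatologists' reliability = 0.7; machine learning reliability = 1.0
r_true = disattenuate(0.45, 0.7, 1.0)
```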
2. The answer may be obtained by substituting r_corrected ≤ 1 into the attenuation formula:

   r_corrected = r_observed / √(reliability₁ × reliability₂)

   to obtain the inequality:

   r_observed ≤ √(reliability₁ × reliability₂)
Self-Test 8.4
1. SEmeas (eight judges) = 1.10.
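The standard error of measurement is SD × √(1 − reliability). The values below are hypothetical, chosen only to illustrate the formula; the chapter’s own SD and reliability yield the 1.10 quoted above:

```python
import math

def se_meas(sd, reliability):
    """Standard error of measurement: SD × sqrt(1 − reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical values: score SD of 2.0, reliability of 0.75
example = se_meas(2.0, 0.75)
```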
2. Judge H displays the highest corrected part–whole correlation (0.55) and thus can be considered the “best” judge. Judge E is a close second with a part–whole correlation of 0.50. Judge C may be considered the worst judge, with a part–whole correlation of −0.27. Removing Judge C raises the reliability from 0.29 to 0.54 in this example. Such a large change in reliability is seen in part because the number of objects in this example is small. Judges B and D can in some sense be considered the worst, as they rendered the same result for every object and their part–whole correlations cannot be calculated.
3. Reliability (4 judges) = 0.17; reliability (10 judges) = 0.34.
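Both figures follow from the Spearman-Brown formula applied to the eight-judge reliability of 0.29 given in the previous answer: project down to a single judge, then back up to 4 or 10 judges. A sketch:

```python
def spearman_brown(reliability, k_old, k_new):
    """Project the reliability of a k_old-judge composite
    to k_new judges via the Spearman-Brown formula."""
    # Reliability of a single judge, from the k_old-judge composite
    r1 = reliability / (k_old - (k_old - 1) * reliability)
    # Reliability of a k_new-judge composite
    return k_new * r1 / (1 + (k_new - 1) * r1)

r4 = spearman_brown(0.29, 8, 4)    # about 0.17
r10 = spearman_brown(0.29, 8, 10)  # about 0.34
```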
4. 
                       Judges
   Hypercritic         Valid    Not valid
   Generated             3          2
   Not generated         4          3
5. Corrected correlation is 0.17.
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this chapter
Friedman, C.P., Wyatt, J.C., Ash, J.S. (2022). Conducting Measurement Studies and Using the Results. In: Evaluation Methods in Biomedical and Health Informatics. Health Informatics. Springer, Cham. https://doi.org/10.1007/978-3-030-86453-8_8
DOI: https://doi.org/10.1007/978-3-030-86453-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86452-1
Online ISBN: 978-3-030-86453-8
eBook Packages: Medicine, Medicine (R0)