Conducting Measurement Studies and Using the Results

Evaluation Methods in Biomedical and Health Informatics

Abstract

This chapter extends the measurement concepts introduced in the previous chapter to address the more practical matter of how measurement processes are actually improved through measurement studies. The specific topics addressed are how to identify observations that are not well behaved, standard errors of measurement, and how to take into account the effects of measurement error on the results of demonstration studies.

Notes

  1.

    Some texts refer to part-whole correlations as item-total correlations.

  2.

    When the purpose of computing the coefficients is to inspect them to determine if the observations are “well behaved,” the Pearson coefficient is widely used and is the only coefficient discussed explicitly here. The Pearson coefficient assumes that the variables are both measured with interval or ratio properties and normally distributed. Even though both assumptions are frequently violated, the Pearson coefficient provides useful guidance to the study team performing measurement studies.

  3.

    This is a useful approximation for computing the 95% confidence interval. A more exact formula is 95% CI = Mean ± (1.96 × SEmean).

  4.

    For those experienced in inferential statistics, a t-test performed on the case with the larger standard errors reveals t = 1.77, df = 198, p = .08. With the reduced standard errors, t = 2.21, df = 198, p = .03.

  5.

    The authors are grateful to Johan van der Lei and his colleagues for sharing the original data from their study.

  6.

    Readers familiar with theories of reliability might observe that coefficient alpha, which does not take into account panelists’ stringency or leniency as a source of error, might overestimate the reliability in this example. In this case, however, use of an alternate reliability coefficient had a negligible effect.
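
The corrected part-whole correlations and the confidence interval formula mentioned in these notes can be illustrated with a minimal Python sketch. The function names and the score matrix are hypothetical and chosen only for illustration; they are not taken from the chapter. Each observation is correlated with the total of the remaining observations, so that an observation is never correlated with itself.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a variable that never varies has no defined correlation
    return sxy / sqrt(sxx * syy)


def corrected_part_whole(scores):
    """scores[i][j] is observation j made on object i.

    Returns one coefficient per observation: its correlation with the
    total of all *other* observations across the objects.
    """
    n_obs = len(scores[0])
    coefficients = []
    for j in range(n_obs):
        part = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]
        coefficients.append(pearson(part, rest))
    return coefficients


def ci95(mean, se_mean):
    """95% confidence interval using Mean ± (1.96 × SEmean)."""
    half_width = 1.96 * se_mean
    return (mean - half_width, mean + half_width)


# Hypothetical data: 5 objects (rows) by 4 observations (columns).
# The third observation gives the same value for every object.
scores = [
    [1, 2, 3, 1],
    [2, 3, 3, 3],
    [3, 3, 3, 2],
    [4, 5, 3, 5],
    [5, 4, 3, 4],
]
part_whole = corrected_part_whole(scores)
low, high = ci95(mean=10.0, se_mean=1.0)  # assumed mean and standard error
```

In this sketch the constant third observation yields `None`, since a correlation cannot be computed for an observation that never varies.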

References

  • Frey BB, editor. The Sage encyclopedia of educational research, measurement, and evaluation. Thousand Oaks, CA: Sage; 2018.

  • Mourougan S, Sethuraman K. Enhancing questionnaire design, development and testing through standardized approach. IOSR J Bus Manage. 2017;19:1–8.

  • Streiner DL, Norman GR, Cairney J. Health measurement scales: a practical guide to their development and use. 5th ed. Oxford: Oxford University Press; 2015.

  • Van der Lei J, Musen MA, van der Does E, Man in’t Veld AJ, van Bemmel JH. Comparison of computer-aided and human review of general practitioners’ management of hypertension. Lancet. 1991;338:1504–8.

  • Walsh CG, Sharman K, Hripcsak G. Beyond discrimination: a comparison of calibration methods and clinical usefulness of predictive models of readmission risk. J Biomed Inform. 2017;76:9–18.

  • Watson JC. Establishing evidence for internal structure using exploratory factor analysis. Measure Eval Counsel Develop. 2017;50:232–8.

Author information

Corresponding author

Correspondence to Charles P. Friedman.

Answers to Self-Tests

Self-Test 8.1

  1.

    Stages 1–5 are explicitly represented in the example. Stage 6 is not represented because no validity study is described. This makes Stage 7 only partially complete, and Stage 8 would be premature.

  2.

    Ratings based on one judge, as predicted by the Prophecy Formula, have a reliability of .36, which is not acceptable.

  3.

    Content validity could be explored by review of the credentials of the selected judges. Criterion validity could be explored by locating other cases similar to those used in the study, and then examining whether the behavior recommended by the resource was taken (for whatever reason, since the clinicians would not have had access to the resource) by the clinicians caring for those patients. The system’s advice would be valid to the extent that the cases where the recommended actions were taken exhibited better outcomes.

Self-Test 8.2

  1.

    It would affect calibration, by altering the difference between the scores for Observation 5 and the other observations. The change reduces all values of Observation 5 by one scale point. This would not affect the correlation with the other observations.

  2.

    More observations are needed to increase the reliability. The observations in the set are generally well behaved. There are not enough of them.

  3.

    It appears that two attributes are being measured. Items 1–3 are measuring one attribute, and items 4–6 are measuring the other.
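
The first answer above rests on the fact that the Pearson coefficient is unchanged when a constant is subtracted from every value of one variable. A minimal Python sketch, using hypothetical scores that are not taken from the chapter, shows the mean shifting by one scale point while the correlation stays the same:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for Observation 5 and for the total of the other observations.
observation_5 = [4, 5, 3, 5, 2, 4]
others_total = [15, 18, 11, 17, 9, 14]

# Reduce every value of Observation 5 by one scale point.
shifted = [value - 1 for value in observation_5]

r_before = pearson(observation_5, others_total)
r_after = pearson(shifted, others_total)

# Calibration changes: the mean drops by exactly the subtracted constant.
mean_shift = sum(observation_5) / len(observation_5) - sum(shifted) / len(shifted)
```

Here `r_before` and `r_after` agree up to floating-point rounding, while `mean_shift` equals one scale point.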

Self-Test 8.3

  1.

    (a) Attribute is the acuity of each skin lesion.

    Objects are the lesions, presumably images of them.

    An independent observation is a diagnostic assessment of one lesion by one dermatologist.

    (b) Because the machine learning approach uses a computer algorithm, as long as the code is stable and the hardware is functioning properly, the assessments made by machine would be expected to be completely consistent and reproducible.

    (c) Using the attenuation formula, the “true” (unattenuated) correlation would be 0.54. (The two reliabilities are: 0.7 for the dermatologists’ assessments and 1 for the machine learning assessments.)

  2.

    The answer may be obtained by substituting r_corrected ≤ 1 into the formula:

$$ {r}_{\mathrm{corrected}}=\frac{r_{\mathrm{observed}}}{\sqrt{\uprho_1{\uprho}_2}} $$

    to obtain the inequality:

$$ 1\ge \frac{r_{\mathrm{observed}}}{\sqrt{\uprho_1{\uprho}_2}} $$
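
Rearranging the inequality gives r_observed ≤ √(ρ1 ρ2): the observed correlation cannot exceed the square root of the product of the two reliabilities. The following Python sketch illustrates this with the reliabilities quoted in answer 1(c), 0.7 and 1. The function names are hypothetical, and the observed correlation of 0.45 is an assumption chosen only because it reproduces the quoted 0.54 after rounding; it is not stated in the text above.

```python
from math import sqrt


def disattenuate(r_observed, reliability_1, reliability_2):
    """Correct an observed correlation for unreliability in both measures."""
    return r_observed / sqrt(reliability_1 * reliability_2)


def max_observed(reliability_1, reliability_2):
    """Largest observed correlation consistent with r_corrected <= 1."""
    return sqrt(reliability_1 * reliability_2)


# Reliabilities from Self-Test 8.3, answer 1(c): dermatologists 0.7, machine 1.
ceiling = max_observed(0.7, 1.0)

# Assumed observed correlation (illustrative only, not given in the answer).
r_assumed = 0.45
r_corrected = disattenuate(r_assumed, 0.7, 1.0)
```

With these reliabilities the ceiling is about 0.84, so any observed correlation above that value would imply a corrected correlation greater than 1.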

Self-Test 8.4

  1.

    SEmeas (eight judges) = 1.10.

  2.

    Judge H displays the highest corrected part–whole correlation (0.55) and thus can be considered the “best” judge. Judge E is a close second with a part–whole correlation of 0.50. Judge C may be considered the worst judge, with a part–whole correlation of −0.27. Removing Judge C raises the reliability from 0.29 to 0.54 in this example. Such a large change in reliability is seen in part because the number of objects in this example is small. Judges B and D can in some sense be considered the worst, as they rendered the same result for every object and their part–whole correlations cannot be calculated.

  3.

    Reliability (4 judges) = 0.17; reliability (10 judges) = 0.34.

  4.

                       Judges
    Hypercritic        Valid    Not valid
    Generated          3        2
    Not generated      4        3

  5.

    Corrected correlation is 0.17.
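
The reliabilities in answer 3 are consistent with the Spearman-Brown prophecy formula, the same kind of projection used in Self-Test 8.1 to estimate the reliability of a single judge. The Python sketch below assumes that the starting reliability of 0.29 quoted in answer 2 refers to the full panel of eight judges. The function name is hypothetical.

```python
def spearman_brown(reliability, n_old, n_new):
    """Project reliability when the number of observations changes from n_old to n_new."""
    k = n_new / n_old  # factor by which the set of observations grows or shrinks
    return k * reliability / (1 + (k - 1) * reliability)


# Assumption: 0.29 is the reliability of the ratings from all eight judges.
reliability_8 = 0.29

reliability_4 = spearman_brown(reliability_8, n_old=8, n_new=4)
reliability_10 = spearman_brown(reliability_8, n_old=8, n_new=10)
```

Rounded to two decimals, the projections match the 0.17 and 0.34 given in answer 3.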

Copyright information

© 2022 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Friedman, C.P., Wyatt, J.C., Ash, J.S. (2022). Conducting Measurement Studies and Using the Results. In: Evaluation Methods in Biomedical and Health Informatics. Health Informatics. Springer, Cham. https://doi.org/10.1007/978-3-030-86453-8_8

  • DOI: https://doi.org/10.1007/978-3-030-86453-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86452-1

  • Online ISBN: 978-3-030-86453-8

  • eBook Packages: Medicine, Medicine (R0)
