Evaluation

Watt, Emily; Arnold, Corey; Sayre, James

doi:10.1007/978-1-4419-0385-3_10

Chapter
First Online: 10 October 2009

Abstract

Evaluation is a cornerstone of informatics, allowing us to objectively assess the strengths and weaknesses of a given tool. These assessments ultimately provide insight and feedback for improving a system and its approach in the future. Thus, this final chapter aims to provide an overview of the fundamental techniques that are used in informatics evaluations. Any quantitative evaluation starts with statistics and formal study design. A review of inferential statistical concepts is provided from the perspective of biostatistics (confidence intervals; hypothesis testing; error assessment including sensitivity/specificity and receiver operating characteristics). Under study design, differences between observational investigations and controlled experiments are covered. Issues pertaining to population selection and study errors are briefly introduced. With these general tools, we then look to more specific informatics evaluations, using information retrieval (IR) systems and usability studies as examples to motivate further discussion. Methods for designing both types of evaluations and endpoint metrics are described in detail.

Notes

1.
In this particular example, if we assume a binomial distribution, then the probability of the test guessing correctly 13 of 16 times is ~0.01. Given this low probability, it is unlikely that the tool's results are due to chance.
2.
A more rigorous, but more complicated, goodness-of-fit test is the Kolmogorov-Smirnov test, which can also be used to assess whether a sample comes from a population with a given distribution.
3.
From a statistical viewpoint, this belief may be true; but from a probabilistic perspective, if a sufficiently large cohort is used, an observational study may in fact have equivalent power to a controlled experiment. Knowledge discovered through inductive observational studies can be as conclusive as that obtained from experimental methods [6, 13]. See also Chapter 8 for a discussion of the subjectivist vs. objectivist interpretation.
4.
In contrast, post-hoc power analysis is done subsequent to data collection to compute the study's actual power based on the observed data.
5.
Researchers distinguish CWA from CTA in that CWA is a broader construct for understanding the environment, work domain, and interaction/behavior whereas CTA is directed more to accomplishing a goal in terms of sequential steps.
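
As a rough check on the figure quoted in Note 1, the binomial arithmetic can be sketched in a few lines of Python. This is an illustration only: the note does not state the chance probability of a correct guess, so p = 0.5 is an assumption here, and the function names are hypothetical. The sketch reports both the probability of exactly 13 correct guesses and the probability of 13 or more, since the note does not say which of the two is meant.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


# Assumption: each guess is right by chance with probability 0.5.
exactly_13 = binomial_pmf(13, 16)
at_least_13 = binomial_upper_tail(13, 16)

print(f"P(X = 13)  = {exactly_13:.4f}")   # 0.0085
print(f"P(X >= 13) = {at_least_13:.4f}")  # 0.0106
```

Under that assumption, both values are close to the ~0.01 given in the note.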

References

Aisen A, Broderick L, Winer-Muram H, Brodley C, Kak A, Pavlopoulou C, Dy J, Shyu CR, Marchiori A (2003) Automated storage and retrieval of thin-section CT images to assist diagnosis: System description and preliminary assessment. Radiology, 228(1):265-270.
Ammenwerth E, Brender J, Nykanen P, Prokosch HU, Rigby M, Talmon J (2004) Visions and strategies to improve evaluation of health information systems: Reflections and lessons based on the HIS-EVAL workshop in Innsbruck. Int J Med Inform, 73(6):479-491.
Ammenwerth E, de Keizer N (2005) An inventory of evaluation studies of information technology in health care: Trends in evaluation research 1982-2002. Methods Inf Med, 44(1):44-56.
Ammenwerth E, de Keizer N (2007) A viewpoint on evidence-based health informatics, based on a pilot survey on evaluation studies in health care informatics. J Am Med Inform Assoc, 14(3):368-371.
Anderson JG, Aydin CE (2005) Overview: Theoretical perspectives and methodologies for the evaluation of healthcare information systems. In: Anderson JG, Aydin CE (eds) Evaluating the Organizational Impact of Healthcare Information Systems. Springer, New York, NY, pp 5-29.
Benson K, Hartz AJ (2000) A comparison of observational studies and randomized, controlled trials. N Engl J Med, 342(25):1878-1886.
Beuscart-Zephir MC, Anceaux F, Crinquette V, Renard JM (2001) Integrating users' activity modeling in the design and assessment of hospital electronic patient records: The example of anesthesia. Intl J Medical Informatics, 64(2):157-171.
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and Regression Trees. Wadsworth International Group, Belmont, CA.
Carbonell J, Goldstein J (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. Proc 21st Intl ACM SIGIR Conf Research and Development in Information Retrieval, Melbourne, Australia, pp 335-336.
Card SK, Moran TP, Newell A (1983) The Psychology of Human-computer Interaction. L Erlbaum Associates, Hillsdale, NJ.
Chin JP, Diehl VA, Norman KL (1988) Development of an instrument measuring user satisfaction of the human-computer interface. Proc SIGCHI Conf Human Factors in Computing Systems, Washington DC, USA, pp 213-218.
Cleverdon C, Mills J, Keen M (1966) Factors determining the performance of indexing systems. Aslib Cranfield Research Project, College of Aeronautics.
Concato J, Shah N, Horwitz RI (2000) Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med, 342(25):1887-1892.
Daniels J, Fels S, Kushniruk A, Lim J, Ansermino JM (2007) A framework for evaluating usability of clinical monitoring technology. J Clin Monit Comput, 21(5):323-330.
Dawson B, Trapp RG (2004) Basic & Clinical Biostatistics. 4th edition. Lange Medical Books/McGraw-Hill, Medical Pub. Division, New York, NY.
Demner-Fushman D, Lin J (2007) Answering clinical questions with knowledge-based and statistical techniques. Computational Linguistics, 33(1):63-103.
Denne JS, Jennison C (1999) Estimating the sample size for a t-test using an internal pilot. Stat Med, 18:1575-1585.
Despont-Gros C, Mueller H, Lovis C (2005) Evaluating user interactions with clinical information systems: A model based on human-computer interaction models. J Biomedical Informatics, 38(3):244-255.
Effken JA (2002) Different lenses, improved outcomes: A new approach to the analysis and design of healthcare information systems. Int J Med Inform, 65(1):59-74.
Flack V, Afifi A, Lachenbruch P, Schouten H (1988) Sample size determinations for the two rater kappa statistic. Psychometrika, 53(3):321-325.
Fletcher RH, Fletcher SW (2005) Clinical epidemiology: The essentials. 4th edition. Lippincott Williams & Wilkins, Philadelphia, PA.
Friedman CP, Wyatt JC, Owens DK (2006) Evaluation and technology assessment. In: Shortliffe EH, Cimino JJ (eds) Biomedical Informatics: Computer Applications in Health Care and Biomedicine. Springer.
Graham MJ, Kubose TK, Jordan D, Zhang J, Johnson TR, Patel VL (2004) Heuristic evaluation of infusion pumps: Implications for patient safety in intensive care units. Int J Med Inform, 73(11-12):771-779.
Hajdukiewicz JR, Doyle DJ, Milgram P, Vicente KJ, Burns CM (1998) A work domain analysis of patient monitoring in the operating room. Proc 42nd Annual Meeting Human Factors and Ergonomics Society, pp 1038-1042.
Hersh W (2003) Information Retrieval: A Health and Biomedical Perspective. Springer-Verlag, New York.
Hersh W, Hickam D (1998) How well do physicians use electronic information retrieval systems. JAMA, 280(15):1347-1352.
Hornbæk K (2006) Current practice in measuring usability: Challenges to usability studies and research. Intl J Human-Computer Studies, 64(2):79-102.
Horsthemke WH, Raicu DS, Furst JD (2008) Evaluation challenges for bridging the semantic gap: Shape disagreements on pulmonary nodules in the Lung Image Database Consortium. Intl J Healthcare Information Systems and Informatics, 4(1):17-33.
Huang X, Lin J, Demner-Fushman D (2006) Evaluation of PICO as a knowledge representation for clinical questions. Proc AMIA Annu Symp:359-363.
Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM Trans Information Systems, 20(4):422-446.
Kaplan B (1997) Addressing organizational issues into the evaluation of medical systems. J Am Med Inform Assoc, 4(2):94-101.
Kaplan B, Maxwell J (2005) Qualitative research methods for evaluating computer information systems. In: Anderson JG, Aydin CE (eds) Evaluating the Organizational Impact of Healthcare Information Systems. Springer, New York, NY, pp 30-55.
Kernan WN, Viscoli CM, Makuch RW, Brass LM, Horwitz RI (1999) Stratified randomization for clinical trials. J Clin Epidemiol, 52(1):19-26.
Kjeldskov J, Skov MB, Stage J (2008) A longitudinal study of usability in health care: Does time heal? Int J Med Inform.
Kurosu M, Kashimura K (1995) Apparent usability vs. inherent usability. Proc SIGCHI Conf Human Factors in Computing Systems, pp 292-293.
Kushniruk AW, Patel VL (2004) Cognitive and usability engineering methods for the evaluation of clinical information systems. J Biomed Inform, 37(1):56-76.
Laerum H, Ellingsen G, Faxvaag A (2001) Doctors' use of electronic medical records systems in hospitals: Cross sectional survey. BMJ, 323(7325):1344-1348.
Lasko TA, Bhagwat JG, Zou KH, Ohno-Machado L (2005) The use of receiver operating characteristic curves in biomedical informatics. J Biomed Inform, 38(5):404-415.
Lee F, Teich JM, Spurr CD, Bates DW (1996) Implementation of physician order entry: User satisfaction and self-reported usage patterns. J Am Med Inform Assoc, 3(1):42-55.
Lehmann TM, Guld MO, Thies C, Fischer B, Spitzer K, Keysers D, Ney H, Kohnen M, Schubert H, Wein BB (2004) Content-based image retrieval in medical applications. Methods Inf Med, 43(4):354-361.
Lewis JR (1995) IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use. Intl J Human-computer Interaction, 7(1):57-78.
Limbourg Q, Vanderdonckt J (2003) Comparing task models for user interface design. In: Diaper D, Stanton N (eds) The Handbook of Task Analysis for Human-Computer Interaction, pp 135-154.
Lindgaard G, Chattratichart J (2007) Usability testing: What have we overlooked? Proc SIGCHI Conf Human Factors in Computing Systems pp 1415-1424.
Loh WY, Shih YS (1997) Split selection methods for classification trees. Statistica Sinica, 7:815-840.
Long LR, Antani S, Deserno T, Thoma GR (2009) Content-based image retrieval in medicine: Retrospective assessment, state of the art, and future directions. Intl J Healthcare Information Systems and Informatics, 4(1):1-17.
Maclure M (1991) The case-crossover design: A method for studying transient effects on the risk of acute events. Am J Epidemiol, 133(2):144-153.
Mayhew DJ (1999) The Usability Engineering Lifecycle: A Practitioner's Handbook for User Interface Design. Morgan Kaufmann Publishers, San Francisco, Calif.
Metz CE (2006) Receiver operating characteristic analysis: A tool for the quantitative evaluation of observer performance and imaging systems. J Am Coll Radiol, 3(6):413-422.
Militello LG, Hutton RJB (1998) Applied cognitive task analysis (ACTA): A practitioner's toolkit for understanding cognitive task demands. Ergonomics, 41(11):1618-1641.
Morton SC, Adams JL, Suttorp MK, Shanman R, Valentine D, Rhodes S, Shekelle PG (2004) Meta-regression approaches: What, why, when, and how? (Technical Review 04-0033). Agency for Healthcare Research and Quality, Rockville, MD.
Müller H, Clough P, Hersh B, Geissbühler A (2007) Variation of relevance assessments for medical image retrieval. In: Marchand-Maillet S, Bruno E, Nurnberger A, Detyniecki M (eds) Adaptive Multimedia Retrieval: User, Context, and Feedback (LNCS). Springer, pp 232-246.
Müller H, Deselaers T, Deserno T, Kalpathy-Cramer J, Kim E, Hersh W (2007) Overview of the ImageCLEF 2007 medical retrieval and annotation tasks. Advances in Multilingual and Multimodal Information Retrieval: Proc 8th Workshop Cross-Language Evaluation Forum (CLEF), Budapest, Hungary, pp 472-491.
Müller H, Michoux N, Bandon D, Geissbuhler A (2004) A review of content-based image retrieval systems in medical applications-clinical benefits and future directions. Int J Med Inform, 73(1):1-23.
Müller H, Rosset A, Vallée J, Terrier F, Geissbuhler A (2004) A reference data set for the evaluation of medical image retrieval systems. Comp Med Imaging and Graphics, 28(6):295-305.
Murff HJ, Kannry J (2001) Physician satisfaction with two order entry systems. J Am Med Inform Assoc, 8(5):499-509.
Nielsen J (1993) Usability Engineering. Academic Press, Boston.
Nielsen J (1994) Heuristic evaluation. In: Nielsen J, Mack RL (eds) Usability Inspection Methods. Wiley, New York.
Obuchowski NA (2003) Receiver operating characteristic curves and their use in radiology. Radiology, 229(1):3-8.
Obuchowski NA (2005) ROC analysis. Am J Roentgenol, 184(2):364-372.
Pampel FC (2000) Logistic Regression: A Primer. Sage Publications, Thousand Oaks, CA.
Quinlan JR (1986) Induction of decision trees. Machine Learning, 1(1):81-106.
Quinlan JR (1996) Improved use of continuous attributes in C4.5. J Artificial Intelligence, 4:77-90.
Rose AF, Schnipper JL, Park ER, Poon EG, Li Q, Middleton B (2005) Using qualitative studies to improve the usability of an EMR. J Biomedical Informatics, 38(1):51-60.
Rosenberger WF, Lachin JM (2002) Randomization in Clinical Trials: Theory and practice. Wiley, New York, NY.
Salton G, Lesk M (1965) The SMART automatic document retrieval systems - An illustration. Communications of the ACM, 8(6):391-398.
Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Communications of the ACM, 18(11):613-620.
Schamber L, Eisenberg M, Nilan M (1990) A re-examination of relevance: Toward a dynamic, situational definition. Information Processing and Management, 26(6):755-776.
Shneiderman B, Plaisant C (2004) Designing the User Interface: Strategies for Effective Human-Computer Interaction. 4th edition. Pearson/Addison Wesley, Boston.
Shyu CR, Brodley C, Kak A, Kosaka A, Aisen A, Broderick L (1999) ASSERT: A physician-in-the-loop content-based retrieval system for HRCT image databases. Computer Vision and Image Understanding, 75(1-2):111-132.
Sittig DF, Kuperman GJ, Fiskio J (1999) Evaluating physician satisfaction regarding user interactions with an electronic medical record system. Proc AMIA Symp:400-404.
Snyder C (2006) Bias in usability testing. Accessed February 19, 2009.
Stein C (1945) A two-sample test for a linear hypothesis whose power is independent of the variance. Ann Math Stat, 16:243-258.
Stoicu-Tivadar L, Stoicu-Tivadar V (2006) Human-computer interaction reflected in the design of user interfaces for general practitioners. Int J Med Inform, 75(3-4):335-342.
Tagare H, Jaffe C, Duncan J (1997) Medical image databases: A content-based retrieval approach. J Am Med Inform Assoc, 4:184-198.
Talmon J, Enning J, Castaneda G, Eurlings F, Hoyer D, Nykanen P, Sanz F, Thayer C, Vissers M (1999) The VATAM guidelines. Int J Med Inform, 56(1-3):107-115.
Tang Z, Johnson TR, Tindall RD, Zhang J (2006) Applying heuristic evaluation to improve the usability of a telemedicine system. Telemed J E Health, 12(1):24-34.
Taylor RS (1962) The process of asking questions. American Documentation, 13(4):391-396.
Tractinsky N, Katz AS, Ikar D (2000) What is beautiful is usable. Interact Comp, 13(2):127-145.
Vicente KJ (1999) Cognitive Work Analysis: Toward Safe, Productive, and Healthy Computer-based Work. Lawrence Erlbaum Associates, Mahwah, NJ.
Virzi RA (1992) Refining the test phase of usability evaluation: How many subjects is enough? Human Factors, 34(4):457-468.
Wittes J, Brittain E (1990) The role of internal pilot studies in increasing the efficiency of clinical trials. Stat Med, 9:65-72.

Author information

Authors and Affiliations

Medical Imaging Informatics, UCLA Biomedical Engineering IDP, Los Angeles, CA, USA
Emily Watt MLIS
Medical Imaging Informatics & Department of Information Studies, University of California, Los Angeles, CA, USA
Corey Arnold PhD
Departments of Biostatistics & Radiological Sciences, UCLA David Geffen School of Medicine, Los Angeles, CA, USA
James Sayre PhD

Editor information

Editors and Affiliations

Medical Imaging Informatics Group, University of California, Los Angeles, Westwood Blvd. 924, Los Angeles, 90024, U.S.A.
Alex A.T. Bui
Medical Imaging Informatics Group, University of California, Los Angeles, Westwood Blvd. 924, Los Angeles, 90024, U.S.A.
Ricky K. Taira

About this chapter

Cite this chapter

Watt, E., Arnold, C., Sayre, J. (2010). Evaluation. In: Bui, A., Taira, R. (eds) Medical Imaging Informatics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0385-3_10

DOI: https://doi.org/10.1007/978-1-4419-0385-3_10
Published: 10 October 2009
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0384-6
Online ISBN: 978-1-4419-0385-3
eBook Packages: Engineering, Engineering (R0)