Abstract
Evaluation is a cornerstone of informatics, allowing us to objectively assess the strengths and weaknesses of a given tool. These assessments ultimately provide feedback for improving a system and its approach. This final chapter therefore provides an overview of the fundamental techniques used in informatics evaluations. Any quantitative evaluation begins with statistics and formal study design. A review of inferential statistical concepts is provided from the perspective of biostatistics (confidence intervals; hypothesis testing; error assessment, including sensitivity/specificity and receiver operating characteristics). Under study design, differences between observational investigations and controlled experiments are covered. Issues pertaining to population selection and study errors are briefly introduced. With these general tools, we then turn to more specific informatics evaluations, using information retrieval (IR) systems and usability studies as examples to motivate further discussion. Methods for designing both types of evaluations, along with their endpoint metrics, are described in detail.
Notes
1. In this particular example, if we assume a binomial distribution, then the probability of the test guessing correctly 13 of 16 times is ~0.01. Given this low probability, it is unlikely that the tool's results are due to chance.
2. A more rigorous but more complicated goodness-of-fit test is the Kolmogorov-Smirnov test, which can also be used to assess whether a sample comes from a population with a given distribution.
3. From a statistical viewpoint, this belief may be true; from a probabilistic perspective, however, an observational study with a sufficiently large cohort may in fact have power equivalent to that of a controlled experiment. Knowledge discovered through inductive observational studies can be as conclusive as that obtained from experimental methods [6, 13]. See also Chapter 8 for a discussion of the subjectivist vs. objectivist interpretation.
4. In contrast, post-hoc power analysis is done subsequent to data collection to compute the study's actual power based on the observed data.
5. Researchers distinguish CWA from CTA in that CWA is a broader construct for understanding the environment, work domain, and interaction/behavior, whereas CTA is directed more toward accomplishing a goal in terms of sequential steps.
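The figure quoted in Note 1 can be checked with a few lines of arithmetic. The sketch below assumes, as the note implies but does not state, that a tool guessing at random succeeds with probability 0.5 on each of 16 independent trials; the function names `binomial_pmf` and `binomial_tail` are illustrative and do not come from the chapter.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_tail(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


# Assumption: chance-level guessing means p = 0.5 per trial.
exactly_13 = binomial_pmf(13, 16, 0.5)
at_least_13 = binomial_tail(13, 16, 0.5)

print(f"P(X = 13)  = {exactly_13:.4f}")   # 0.0085
print(f"P(X >= 13) = {at_least_13:.4f}")  # 0.0106
```

Under this assumption, both the exact-count reading and the "13 or more" reading come out near 0.01, which is consistent with the note's conclusion that such a result is unlikely to arise by chance.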
References
Aisen A, Broderick L, Winer-Muram H, Brodley C, Kak A, Pavlopoulou C, Dy J, Shyu CR, Marchiori A (2003) Automated storage and retrieval of thin-section CT images to assist diagnosis: System description and preliminary assessment. Radiology, 228(1):265-270.
Ammenwerth E, Brender J, Nykanen P, Prokosch HU, Rigby M, Talmon J (2004) Visions and strategies to improve evaluation of health information systems: Reflections and lessons based on the HIS-EVAL workshop in Innsbruck. Int J Med Inform, 73(6):479-491.
Ammenwerth E, de Keizer N (2005) An inventory of evaluation studies of information technology in health care: Trends in evaluation research 1982-2002. Methods Inf Med, 44(1):44-56.
Ammenwerth E, de Keizer N (2007) A viewpoint on evidence-based health informatics, based on a pilot survey on evaluation studies in health care informatics. J Am Med Inform Assoc, 14(3):368-371.
Anderson JG, Aydin CE (2005) Overview: Theoretical perspectives and methodologies for the evaluation of healthcare information systems. In: Anderson JG, Aydin CE (eds) Evaluating the Organizational Impact of Healthcare Information Systems. Springer, New York, NY, pp 5-29.
Benson K, Hartz AJ (2000) A comparison of observational studies and randomized, controlled trials. N Engl J Med, 342(25):1878-1886.
Beuscart-Zephir MC, Anceaux F, Crinquette V, Renard JM (2001) Integrating users' activity modeling in the design and assessment of hospital electronic patient records: The example of anesthesia. Intl J Medical Informatics, 64(2):157-171.
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and Regression Trees. Wadsworth International Group, Belmont, CA.
Carbonell J, Goldstein J (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. Proc 21st Intl ACM SIGIR Conf Research and Development in Information Retrieval, Melbourne, Australia, pp 335-336.
Card SK, Moran TP, Newell A (1983) The Psychology of Human-computer Interaction. L Erlbaum Associates, Hillsdale, NJ.
Chin JP, Diehl VA, Norman KL (1988) Development of an instrument measuring user satisfaction of the human-computer interface. Proc SIGCHI Conf Human Factors in Computing Systems, Washington DC, USA, pp 213-218.
Cleverdon C, Mills J, Keen M (1966) Factors determining the performance of indexing systems. Aslib Cranfield Research Project, College of Aeronautics.
Concato J, Shah N, Horwitz RI (2000) Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med, 342(25):1887-1892.
Daniels J, Fels S, Kushniruk A, Lim J, Ansermino JM (2007) A framework for evaluating usability of clinical monitoring technology. J Clin Monit Comput, 21(5):323-330.
Dawson B, Trapp RG (2004) Basic & Clinical Biostatistics. 4th edition. Lange Medical Books/McGraw-Hill, Medical Pub. Division, New York, NY.
Demner-Fushman D, Lin J (2007) Answering clinical questions with knowledge-based and statistical techniques. Computational Linguistics, 33(1):63-103.
Denne JS, Jennison C (1999) Estimating the sample size for a t-test using an internal pilot. Stat Med, 18:1575-1585.
Despont-Gros C, Mueller H, Lovis C (2005) Evaluating user interactions with clinical information systems: A model based on human-computer interaction models. J Biomedical Informatics, 38(3):244-255.
Effken JA (2002) Different lenses, improved outcomes: A new approach to the analysis and design of healthcare information systems. Int J Med Inform, 65(1):59-74.
Flack V, Afifi A, Lachenbruch P, Schouten H (1988) Sample size determinations for the two rater kappa statistic. Psychometrika, 53(3):321-325.
Fletcher RH, Fletcher SW (2005) Clinical epidemiology: The essentials. 4th edition. Lippincott Williams & Wilkins, Philadelphia, PA.
Friedman CP, Wyatt JC, Owens DK (2006) Evaluation and technology assessment. In: Shortliffe EH, Cimino JJ (eds) Biomedical Informatics: Computer Applications in Health Care and Biomedicine. Springer.
Graham MJ, Kubose TK, Jordan D, Zhang J, Johnson TR, Patel VL (2004) Heuristic evaluation of infusion pumps: Implications for patient safety in intensive care units. Int J Med Inform, 73(11-12):771-779.
Hajdukiewicz JR, Doyle DJ, Milgram P, Vicente KJ, Burns CM (1998) A work domain analysis of patient monitoring in the operating room. Proc 42nd Annual Meeting Human Factors and Ergonomics Society, pp 1038-1042.
Hersh W (2003) Information Retrieval: A Health and Biomedical Perspective. Springer-Verlag, New York.
Hersh W, Hickam D (1998) How well do physicians use electronic information retrieval systems. JAMA, 280(15):1347-1352.
Hornbæk K (2006) Current practice in measuring usability: Challenges to usability studies and research. Intl J Human-Computer Studies, 64(2):79-102.
Horsthemke WH, Raicu DS, Furst JD (2008) Evaluation challenges for bridging the semantic gap: Shape disagreements on pulmonary nodules in the Lung Image Database Consortium. Intl J Healthcare Information Systems and Informatics, 4(1):17-33.
Huang X, Lin J, Demner-Fushman D (2006) Evaluation of PICO as a knowledge representation for clinical questions. Proc AMIA Annu Symp:359-363.
Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM Trans Information Systems, 20(4):422-446.
Kaplan B (1997) Addressing organizational issues into the evaluation of medical systems. J Am Med Inform Assoc, 4(2):94-101.
Kaplan B, Maxwell J (2005) Qualitative research methods for evaluating computer information systems. In: Anderson JG, Aydin CE (eds) Evaluating the Organizational Impact of Healthcare Information Systems. Springer, New York, NY, pp 30-55.
Kernan WN, Viscoli CM, Makuch RW, Brass LM, Horwitz RI (1999) Stratified randomization for clinical trials. J Clin Epidemiol, 52(1):19-26.
Kjeldskov J, Skov MB, Stage J (2008) A longitudinal study of usability in health care: Does time heal? Int J Med Inform.
Kurosu M, Kashimura K (1995) Apparent usability vs. inherent usability. Proc SIGCHI Conf Human Factors in Computing Systems, pp 292-293.
Kushniruk AW, Patel VL (2004) Cognitive and usability engineering methods for the evaluation of clinical information systems. J Biomed Inform, 37(1):56-76.
Laerum H, Ellingsen G, Faxvaag A (2001) Doctors' use of electronic medical records systems in hospitals: Cross sectional survey. BMJ, 323(7325):1344-1348.
Lasko TA, Bhagwat JG, Zou KH, Ohno-Machado L (2005) The use of receiver operating characteristic curves in biomedical informatics. J Biomed Inform, 38(5):404-415.
Lee F, Teich JM, Spurr CD, Bates DW (1996) Implementation of physician order entry: User satisfaction and self-reported usage patterns. J Am Med Inform Assoc, 3(1):42-55.
Lehmann TM, Guld MO, Thies C, Fischer B, Spitzer K, Keysers D, Ney H, Kohnen M, Schubert H, Wein BB (2004) Content-based image retrieval in medical applications. Methods Inf Med, 43(4):354-361.
Lewis JR (1995) IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use. Intl J Human-computer Interaction, 7(1):57-78.
Limbourg Q, Vanderdonckt J (2003) Comparing task models for user interface design. In: Diaper D, Stanton N (eds) The Handbook of Task Analysis for Human-Computer Interaction, pp 135-154.
Lindgaard G, Chattratichart J (2007) Usability testing: What have we overlooked? Proc SIGCHI Conf Human Factors in Computing Systems, pp 1415-1424.
Loh WY, Shih YS (1997) Split selection methods for classification trees. Statistica Sinica, 7:815-840.
Long LR, Antani S, Deserno T, Thoma GR (2009) Content-based image retrieval in medicine: Retrospective assessment, state of the art, and future directions. Intl J Healthcare Information Systems and Informatics, 4(1):1-17.
Maclure M (1991) The case-crossover design: A method for studying transient effects on the risk of acute events. Am J Epidemiol, 133(2):144-153.
Mayhew DJ (1999) The Usability Engineering Lifecycle: A Practitioner's Handbook for User Interface Design. Morgan Kaufmann Publishers, San Francisco, Calif.
Metz CE (2006) Receiver operating characteristic analysis: A tool for the quantitative evaluation of observer performance and imaging systems. J Am Coll Radiol, 3(6):413-422.
Militello LG, Hutton RJB (1998) Applied cognitive task analysis (ACTA): A practitioner's toolkit for understanding cognitive task demands. Ergonomics, 41(11):1618-1641.
Morton SC, Adams JL, Suttorp MK, Shanman R, Valentine D, Rhodes S, Shekelle PG (2004) Meta-regression approaches: What, why, when, and how? (Technical Review 04-0033). Agency for Healthcare Research and Quality, Rockville, MD.
Müller H, Clough P, Hersh B, Geissbühler A (2007) Variation of relevance assessments for medical image retrieval. In: Marchand-Maillet S, Bruno E, Nurnberger A, Detyniecki M (eds) Adaptive Multimedia Retrieval: User, Context, and Feedback (LNCS). Springer, pp 232-246.
Müller H, Deselaers T, Deserno T, Kalpathy-Cramer J, Kim E, Hersh W (2007) Overview of the ImageCLEF 2007 medical retrieval and annotation tasks. Advances in Multilingual and Multimodal Information Retrieval: Proc 8th Workshop Cross-Language Evaluation Forum (CLEF), Budapest, Hungary, pp 472-491.
Müller H, Michoux N, Bandon D, Geissbuhler A (2004) A review of content-based image retrieval systems in medical applications-clinical benefits and future directions. Int J Med Inform, 73(1):1-23.
Müller H, Rosset A, Vallée J, Terrier F, Geissbuhler A (2004) A reference data set for the evaluation of medical image retrieval systems. Comp Med Imaging and Graphics, 28(6):295-305.
Murff HJ, Kannry J (2001) Physician satisfaction with two order entry systems. J Am Med Inform Assoc, 8(5):499-509.
Nielsen J (1993) Usability Engineering. Academic Press, Boston.
Nielsen J (1994) Heuristic evaluation. In: Nielsen J, Mack RL (eds) Usability Inspection Methods. Wiley, New York.
Obuchowski NA (2003) Receiver operating characteristic curves and their use in radiology. Radiology, 229(1):3-8.
Obuchowski NA (2005) ROC analysis. Am J Roentgenol, 184(2):364-372.
Pampel FC (2000) Logistic Regression: A Primer. Sage Publications, Thousand Oaks, CA.
Quinlan JR (1986) Induction of decision trees. Machine Learning, 1(1):81-106.
Quinlan JR (1996) Improved use of continuous attributes in C4.5. J Artificial Intelligence Research, 4:77-90.
Rose AF, Schnipper JL, Park ER, Poon EG, Li Q, Middleton B (2005) Using qualitative studies to improve the usability of an EMR. J Biomedical Informatics, 38(1):51-60.
Rosenberger WF, Lachin JM (2002) Randomization in Clinical Trials: Theory and practice. Wiley, New York, NY.
Salton G, Lesk M (1965) The SMART automatic document retrieval systems - An illustration. Communications of the ACM, 8(6):391-398.
Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Communications of the ACM, 18(11):613-620.
Schamber L, Eisenberg M, Nilan M (1990) A re-examination of relevance: Toward a dynamic, situational definition. Information Processing and Management, 26(6):755-776.
Shneiderman B, Plaisant C (2004) Designing the User Interface: Strategies for Effective Human-Computer Interaction. 4th edition. Pearson/Addison Wesley, Boston.
Shyu CR, Brodley C, Kak A, Kosaka A, Aisen A, Broderick L (1999) ASSERT: A physician-in-the-loop content-based retrieval system for HRCT image databases. Computer Vision and Image Understanding, 75(1-2):111-132.
Sittig DF, Kuperman GJ, Fiskio J (1999) Evaluating physician satisfaction regarding user interactions with an electronic medical record system. Proc AMIA Symp:400-404.
Snyder C (2006) Bias in usability testing. Accessed February 19, 2009.
Stein C (1945) A two-sample test for a linear hypothesis whose power is independent of the variance. Ann Math Stat, 16:243-258.
Stoicu-Tivadar L, Stoicu-Tivadar V (2006) Human-computer interaction reflected in the design of user interfaces for general practitioners. Int J Med Inform, 75(3-4):335-342.
Tagare H, Jaffe C, Duncan J (1997) Medical image databases: A content-based retrieval approach. J Am Med Inform Assoc, 4:184-198.
Talmon J, Enning J, Castaneda G, Eurlings F, Hoyer D, Nykanen P, Sanz F, Thayer C, Vissers M (1999) The VATAM guidelines. Int J Med Inform, 56(1-3):107-115.
Tang Z, Johnson TR, Tindall RD, Zhang J (2006) Applying heuristic evaluation to improve the usability of a telemedicine system. Telemed J E Health, 12(1):24-34.
Taylor RS (1962) The process of asking questions. American Documentation, 13(4):391-396.
Tractinsky N, Katz AS, Ikar D (2000) What is beautiful is usable. Interact Comp, 13(2):127-145.
Vicente KJ (1999) Cognitive Work Analysis: Toward Safe, Productive, and Healthy Computer-based Work. Lawrence Erlbaum Associates, Mahwah, NJ.
Virzi RA (1992) Refining the test phase of usability evaluation: How many subjects is enough? Human Factors, 34(4):457-468.
Wittes J, Brittain E (1990) The role of internal pilot studies in increasing the efficiency of clinical trials. Stat Med, 9:65-72.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Watt, E., Arnold, C., Sayre, J. (2010). Evaluation. In: Bui, A., Taira, R. (eds) Medical Imaging Informatics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0385-3_10
DOI: https://doi.org/10.1007/978-1-4419-0385-3_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0384-6
Online ISBN: 978-1-4419-0385-3
eBook Packages: Engineering, Engineering (R0)