
Evaluation


Abstract

Evaluation is a cornerstone of informatics, allowing us to objectively assess the strengths and weaknesses of a given tool. The resulting insights provide feedback for improving a system and its approach in the future. This final chapter therefore gives an overview of the fundamental techniques used in informatics evaluations. Any quantitative evaluation starts from statistics and formal study design. Inferential statistical concepts are reviewed from the perspective of biostatistics: confidence intervals, hypothesis testing, and error assessment, including sensitivity/specificity and receiver operating characteristic (ROC) analysis. Under study design, the differences between observational investigations and controlled experiments are covered, and issues pertaining to population selection and study errors are briefly introduced. With these general tools in hand, we then turn to more specific informatics evaluations, using information retrieval (IR) systems and usability studies as examples to motivate further discussion. Methods for designing both types of evaluations, and their endpoint metrics, are described in detail.
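The abstract lists sensitivity, specificity, and ROC analysis among the error-assessment tools. As a minimal sketch in plain Python (the labels and scores are made up purely for illustration, and the function names are hypothetical rather than taken from the chapter), the code below counts true/false positives and negatives at one threshold, sweeps the threshold over every observed score to trace an empirical ROC curve, and integrates that curve with the trapezoidal rule. It assumes both positive and negative cases are present.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); labels are 1 (positive) or 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Assumes at least one positive and one negative case (no zero denominators).
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by calling every score >= each threshold 'positive'."""
    points = [(0.0, 0.0)]  # a threshold above every score flags nothing
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        sens, spec = sensitivity_specificity(y_true, y_pred)
        points.append((1.0 - spec, sens))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve whose points are sorted by FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical reference-standard labels and tool output scores.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical cut-off of 0.5
    print(sensitivity_specificity(y_true, y_pred))
    print(f"AUC = {auc_trapezoid(roc_points(y_true, scores)):.3f}")  # AUC = 0.889
```

The trapezoidal area of the empirical curve equals the Mann-Whitney estimate of the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counting one half); in this toy data, 8 of the 9 positive-negative pairs are ordered correctly.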


Notes

  1. In this particular example, if we assume a binomial distribution, then the probability of the test guessing correctly on 13 of 16 trials is ~0.01. Given this low probability, it is unlikely that the tool's results are due to chance.

  2. A more rigorous, but more complicated, goodness-of-fit test is the Kolmogorov-Smirnov test, which can also be used to assess whether a sample comes from a population with a given distribution.

  3. From a statistical viewpoint, this belief may be true; from a probabilistic perspective, however, an observational study with a sufficiently large cohort may in fact have power equivalent to that of a controlled experiment. Findings from inductive observational studies can be as conclusive as those obtained from experimental methods [6, 13]. See also Chapter 8 for a discussion of the subjectivist vs. objectivist interpretation.

  4. In contrast, post-hoc power analysis is performed after data collection to compute the study's actual power based on the observed data.

  5. Researchers distinguish cognitive work analysis (CWA) from cognitive task analysis (CTA): CWA is a broader construct for understanding the environment, work domain, and interaction/behavior, whereas CTA is directed more toward accomplishing a goal through a sequence of steps.
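Notes 1, 2, and 4 each rest on a short computation, sketched below with only the Python standard library. Several choices here are assumptions rather than details from the chapter: the binomial tail assumes each of the 16 trials succeeds by chance with probability 0.5 (the note does not state the per-trial probability); the Kolmogorov-Smirnov p-value uses the large-sample (asymptotic) Kolmogorov approximation rather than exact small-sample tables; and the power function uses a two-sided, two-sample z-test approximation with a common standard deviation instead of a t-test. All function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Callable, Sequence


# Note 1: probability of k or more successes in n independent trials under chance alone.
def binomial_upper_tail(k: int, n: int, p: float) -> float:
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Note 2: one-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|.
def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_pvalue_asymptotic(d: float, n: int, terms: int = 100) -> float:
    """Large-sample approximation: P(D > d) ~ 2 * sum_k (-1)^(k-1) exp(-2 k^2 t^2), t = sqrt(n) * d."""
    t = math.sqrt(n) * d
    if t < 0.2:
        return 1.0  # the limiting tail probability is within 1e-12 of 1 here
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


# Note 4: post-hoc power from observed data (two-sided, two-sample z approximation).
def posthoc_power(mean_diff: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    z = NormalDist()
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference in means
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(mean_diff) / se
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def uniform_cdf(x: float) -> float:
    """Hypothetical reference distribution: Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    print(f"P(X >= 13 | n=16, p=0.5) = {binomial_upper_tail(13, 16, 0.5):.4f}")  # 0.0106
    d = ks_statistic([0.1, 0.4, 0.7], uniform_cdf)
    print(f"D = {d:.3f}")  # D = 0.300
    print(f"power = {posthoc_power(0.5, 1.0, 64):.3f}")
```

Under the p = 0.5 assumption, the tail P(X >= 13) is 697/65536, consistent with the ~0.01 in Note 1. For small samples, an exact routine such as `scipy.stats.kstest` would be preferable to the asymptotic series.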

References

  1. Aisen A, Broderick L, Winer-Muram H, Brodley C, Kak A, Pavlopoulou C, Dy J, Shyu CR, Marchiori A (2003) Automated storage and retrieval of thin-section CT images to assist diagnosis: System description and preliminary assessment. Radiology, 228(1):265-270.

  2. Ammenwerth E, Brender J, Nykanen P, Prokosch HU, Rigby M, Talmon J (2004) Visions and strategies to improve evaluation of health information systems: Reflections and lessons based on the HIS-EVAL workshop in Innsbruck. Int J Med Inform, 73(6):479-491.

  3. Ammenwerth E, de Keizer N (2005) An inventory of evaluation studies of information technology in health care: Trends in evaluation research 1982-2002. Methods Inf Med, 44(1):44-56.

  4. Ammenwerth E, de Keizer N (2007) A viewpoint on evidence-based health informatics, based on a pilot survey on evaluation studies in health care informatics. J Am Med Inform Assoc, 14(3):368-371.

  5. Anderson JG, Aydin CE (2005) Overview: Theoretical perspectives and methodologies for the evaluation of healthcare information systems. In: Anderson JG, Aydin CE (eds) Evaluating the Organizational Impact of Healthcare Information Systems. Springer, New York, NY, pp 5-29.

  6. Benson K, Hartz AJ (2000) A comparison of observational studies and randomized, controlled trials. N Engl J Med, 342(25):1878-1886.

  7. Beuscart-Zephir MC, Anceaux F, Crinquette V, Renard JM (2001) Integrating users' activity modeling in the design and assessment of hospital electronic patient records: The example of anesthesia. Int J Med Inform, 64(2):157-171.

  8. Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and Regression Trees. Wadsworth International Group, Belmont, CA.

  9. Carbonell J, Goldstein J (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. Proc 21st Intl ACM SIGIR Conf Research and Development in Information Retrieval, Melbourne, Australia, pp 335-336.

  10. Card SK, Moran TP, Newell A (1983) The Psychology of Human-Computer Interaction. L Erlbaum Associates, Hillsdale, NJ.

  11. Chin JP, Diehl VA, Norman KL (1988) Development of an instrument measuring user satisfaction of the human-computer interface. Proc SIGCHI Conf Human Factors in Computing Systems, Washington DC, USA, pp 213-218.

  12. Cleverdon C, Mills J, Keen M (1966) Factors determining the performance of indexing systems. Aslib Cranfield Research Project, College of Aeronautics.

  13. Concato J, Shah N, Horwitz RI (2000) Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med, 342(25):1887-1892.

  14. Daniels J, Fels S, Kushniruk A, Lim J, Ansermino JM (2007) A framework for evaluating usability of clinical monitoring technology. J Clin Monit Comput, 21(5):323-330.

  15. Dawson B, Trapp RG (2004) Basic & Clinical Biostatistics. 4th edition. Lange Medical Books/McGraw-Hill, Medical Pub. Division, New York, NY.

  16. Demner-Fushman D, Lin J (2007) Answering clinical questions with knowledge-based and statistical techniques. Computational Linguistics, 33(1):63-103.

  17. Denne JS, Jennison C (1999) Estimating the sample size for a t-test using an internal pilot. Stat Med, 18:1575-1585.

  18. Despont-Gros C, Mueller H, Lovis C (2005) Evaluating user interactions with clinical information systems: A model based on human-computer interaction models. J Biomed Inform, 38(3):244-255.

  19. Effken JA (2002) Different lenses, improved outcomes: A new approach to the analysis and design of healthcare information systems. Int J Med Inform, 65(1):59-74.

  20. Flack V, Afifi A, Lachenbruch P, Schouten H (1988) Sample size determinations for the two rater kappa statistic. Psychometrika, 53(3):321-325.

  21. Fletcher RH, Fletcher SW (2005) Clinical Epidemiology: The Essentials. 4th edition. Lippincott Williams & Wilkins, Philadelphia, PA.

  22. Friedman CP, Wyatt JC, Owens DK (2006) Evaluation and technology assessment. In: Shortliffe EH, Cimino JJ (eds) Biomedical Informatics: Computer Applications in Health Care and Biomedicine. Springer.

  23. Graham MJ, Kubose TK, Jordan D, Zhang J, Johnson TR, Patel VL (2004) Heuristic evaluation of infusion pumps: Implications for patient safety in intensive care units. Int J Med Inform, 73(11-12):771-779.

  24. Hajdukiewicz JR, Doyle DJ, Milgram P, Vicente KJ, Burns CM (1998) A work domain analysis of patient monitoring in the operating room. Proc 42nd Annual Meeting Human Factors and Ergonomics Society, pp 1038-1042.

  25. Hersh W (2003) Information Retrieval: A Health and Biomedical Perspective. Springer-Verlag, New York.

  26. Hersh W, Hickam D (1998) How well do physicians use electronic information retrieval systems? JAMA, 280(15):1347-1352.

  27. Hornbæk K (2006) Current practice in measuring usability: Challenges to usability studies and research. Intl J Human-Computer Studies, 64(2):79-102.

  28. Horsthemke WH, Raicu DS, Furst JD (2008) Evaluation challenges for bridging the semantic gap: Shape disagreements on pulmonary nodules in the Lung Image Database Consortium. Intl J Healthcare Information Systems and Informatics, 4(1):17-33.

  29. Huang X, Lin J, Demner-Fushman D (2006) Evaluation of PICO as a knowledge representation for clinical questions. Proc AMIA Annu Symp:359-363.

  30. Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM Trans Information Systems, 20(4):422-446.

  31. Kaplan B (1997) Addressing organizational issues into the evaluation of medical systems. J Am Med Inform Assoc, 4(2):94-101.

  32. Kaplan B, Maxwell J (2005) Qualitative research methods for evaluating computer information systems. In: Anderson JG, Aydin CE (eds) Evaluating the Organizational Impact of Healthcare Information Systems. Springer, New York, NY, pp 30-55.

  33. Kernan WN, Viscoli CM, Makuch RW, Brass LM, Horwitz RI (1999) Stratified randomization for clinical trials. J Clin Epidemiol, 52(1):19-26.

  34. Kjeldskov J, Skov MB, Stage J (2008) A longitudinal study of usability in health care: Does time heal? Int J Med Inform.

  35. Kurosu M, Kashimura K (1995) Apparent usability vs. inherent usability. Proc SIGCHI Conf Human Factors in Computing Systems, pp 292-293.

  36. Kushniruk AW, Patel VL (2004) Cognitive and usability engineering methods for the evaluation of clinical information systems. J Biomed Inform, 37(1):56-76.

  37. Laerum H, Ellingsen G, Faxvaag A (2001) Doctors' use of electronic medical records systems in hospitals: Cross sectional survey. BMJ, 323(7325):1344-1348.

  38. Lasko TA, Bhagwat JG, Zou KH, Ohno-Machado L (2005) The use of receiver operating characteristic curves in biomedical informatics. J Biomed Inform, 38(5):404-415.

  39. Lee F, Teich JM, Spurr CD, Bates DW (1996) Implementation of physician order entry: User satisfaction and self-reported usage patterns. J Am Med Inform Assoc, 3(1):42-55.

  40. Lehmann TM, Guld MO, Thies C, Fischer B, Spitzer K, Keysers D, Ney H, Kohnen M, Schubert H, Wein BB (2004) Content-based image retrieval in medical applications. Methods Inf Med, 43(4):354-361.

  41. Lewis JR (1995) IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use. Intl J Human-Computer Interaction, 7(1):57-78.

  42. Limbourg Q, Vanderdonckt J (2003) Comparing task models for user interface design. In: Diaper D, Stanton N (eds) The Handbook of Task Analysis for Human-Computer Interaction, pp 135-154.

  43. Lindgaard G, Chattratichart J (2007) Usability testing: What have we overlooked? Proc SIGCHI Conf Human Factors in Computing Systems, pp 1415-1424.

  44. Loh WY, Shih YS (1997) Split selection methods for classification trees. Statistica Sinica, 7:815-840.

  45. Long LR, Antani S, Deserno T, Thoma GR (2009) Content-based image retrieval in medicine: Retrospective assessment, state of the art, and future directions. Intl J Healthcare Information Systems and Informatics, 4(1):1-17.

  46. Maclure M (1991) The case-crossover design: A method for studying transient effects on the risk of acute events. Am J Epidemiol, 133(2):144-153.

  47. Mayhew DJ (1999) The Usability Engineering Lifecycle: A Practitioner's Handbook for User Interface Design. Morgan Kaufmann Publishers, San Francisco, CA.

  48. Metz CE (2006) Receiver operating characteristic analysis: A tool for the quantitative evaluation of observer performance and imaging systems. J Am Coll Radiol, 3(6):413-422.

  49. Militello LG, Hutton RJB (1998) Applied cognitive task analysis (ACTA): A practitioner's toolkit for understanding cognitive task demands. Ergonomics, 41(11):1618-1641.

  50. Morton SC, Adams JL, Suttorp MK, Shanman R, Valentine D, Rhodes S, Shekelle PG (2004) Meta-regression approaches: What, why, when, and how? (Technical Review 04-0033). Agency for Healthcare Research and Quality, Rockville, MD.

  51. Müller H, Clough P, Hersh B, Geissbühler A (2007) Variation of relevance assessments for medical image retrieval. In: Marchand-Maillet S, Bruno E, Nurnberger A, Detyniecki M (eds) Adaptive Multimedia Retrieval: User, Context, and Feedback (LNCS). Springer, pp 232-246.

  52. Müller H, Deselaers T, Deserno T, Kalpathy-Cramer J, Kim E, Hersh W (2007) Overview of the ImageCLEF 2007 medical retrieval and annotation tasks. Advances in Multilingual and Multimodal Information Retrieval: Proc 8th Workshop Cross-Language Evaluation Forum (CLEF), Budapest, Hungary, pp 472-491.

  53. Müller H, Michoux N, Bandon D, Geissbuhler A (2004) A review of content-based image retrieval systems in medical applications: Clinical benefits and future directions. Int J Med Inform, 73(1):1-23.

  54. Müller H, Rosset A, Vallée J, Terrier F, Geissbuhler A (2004) A reference data set for the evaluation of medical image retrieval systems. Comp Med Imaging and Graphics, 28(6):295-305.

  55. Murff HJ, Kannry J (2001) Physician satisfaction with two order entry systems. J Am Med Inform Assoc, 8(5):499-509.

  56. Nielsen J (1993) Usability Engineering. Academic Press, Boston.

  57. Nielsen J (1994) Heuristic evaluation. In: Nielsen J, Mack RL (eds) Usability Inspection Methods. Wiley, New York.

  58. Obuchowski NA (2003) Receiver operating characteristic curves and their use in radiology. Radiology, 229(1):3-8.

  59. Obuchowski NA (2005) ROC analysis. Am J Roentgenol, 184(2):364-372.

  60. Pampel FC (2000) Logistic Regression: A Primer. Sage Publications, Thousand Oaks, CA.

  61. Quinlan JR (1986) Induction of decision trees. Machine Learning, 1(1):81-106.

  62. Quinlan JR (1996) Improved use of continuous attributes in C4.5. J Artificial Intelligence Research, 4:77-90.

  63. Rose AF, Schnipper JL, Park ER, Poon EG, Li Q, Middleton B (2005) Using qualitative studies to improve the usability of an EMR. J Biomed Inform, 38(1):51-60.

  64. Rosenberger WF, Lachin JM (2002) Randomization in Clinical Trials: Theory and Practice. Wiley, New York, NY.

  65. Salton G, Lesk M (1965) The SMART automatic document retrieval systems: An illustration. Communications of the ACM, 8(6):391-398.

  66. Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Communications of the ACM, 18(11):613-620.

  67. Schamber L, Eisenberg M, Nilan M (1990) A re-examination of relevance: Toward a dynamic, situational definition. Information Processing and Management, 26(6):755-776.

  68. Shneiderman B, Plaisant C (2004) Designing the User Interface: Strategies for Effective Human-Computer Interaction. 4th edition. Pearson/Addison Wesley, Boston.

  69. Shyu CR, Brodley C, Kak A, Kosaka A, Aisen A, Broderick L (1999) ASSERT: A physician-in-the-loop content-based retrieval system for HRCT image databases. Computer Vision and Image Understanding, 75(1-2):111-132.

  70. Sittig DF, Kuperman GJ, Fiskio J (1999) Evaluating physician satisfaction regarding user interactions with an electronic medical record system. Proc AMIA Symp:400-404.

  71. Snyder C (2006) Bias in usability testing. Accessed February 19, 2009.

  72. Stein C (1945) A two-sample test for a linear hypothesis whose power is independent of the variance. Ann Math Stat, 16:243-258.

  73. Stoicu-Tivadar L, Stoicu-Tivadar V (2006) Human-computer interaction reflected in the design of user interfaces for general practitioners. Int J Med Inform, 75(3-4):335-342.

  74. Tagare H, Jaffe C, Duncan J (1997) Medical image databases: A content-based retrieval approach. J Am Med Inform Assoc, 4:184-198.

  75. Talmon J, Enning J, Castaneda G, Eurlings F, Hoyer D, Nykanen P, Sanz F, Thayer C, Vissers M (1999) The VATAM guidelines. Int J Med Inform, 56(1-3):107-115.

  76. Tang Z, Johnson TR, Tindall RD, Zhang J (2006) Applying heuristic evaluation to improve the usability of a telemedicine system. Telemed J E Health, 12(1):24-34.

  77. Taylor RS (1962) The process of asking questions. American Documentation, 13(4):391-396.

  78. Tractinsky N, Katz AS, Ikar D (2000) What is beautiful is usable. Interact Comp, 13(2):127-145.

  79. Vicente KJ (1999) Cognitive Work Analysis: Toward Safe, Productive, and Healthy Computer-based Work. Lawrence Erlbaum Associates, Mahwah, NJ.

  80. Virzi RA (1992) Refining the test phase of usability evaluation: How many subjects is enough? Human Factors, 34(4):457-468.

  81. Wittes J, Brittain E (1990) The role of internal pilot studies in increasing the efficiency of clinical trials. Stat Med, 9:65-72.



Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Watt, E., Arnold, C., Sayre, J. (2010). Evaluation. In: Bui, A., Taira, R. (eds) Medical Imaging Informatics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0385-3_10


  • DOI: https://doi.org/10.1007/978-1-4419-0385-3_10


  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-0384-6

  • Online ISBN: 978-1-4419-0385-3

  • eBook Packages: Engineering, Engineering (R0)
