Overview of the ShARe/CLEF eHealth Evaluation Lab 2013

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8138)

Abstract

Discharge summaries and other free-text reports in healthcare transfer information between working shifts and geographic locations. Patients are likely to have difficulties in understanding their content because of medical jargon, non-standard abbreviations, and ward-specific idioms. This paper reports on an evaluation lab that aims to support the continuum of care by developing methods and resources that make clinical reports in English easier for patients to understand and help them find information related to their condition. The ShARe/CLEF eHealth 2013 lab offered student mentoring and shared tasks: identification and normalisation of disorders (Tasks 1a and 1b) and normalisation of abbreviations and acronyms (Task 2) in clinical reports with respect to healthcare terminology standards, as well as information retrieval (Task 3) to address questions patients may have when reading clinical reports. The focus on patients’ information needs, as opposed to the specialised information needs of physicians and other healthcare workers, was the main feature distinguishing the lab from previous shared tasks. De-identified clinical reports for the three tasks came from US intensive care and originated from the MIMIC II database. Other text documents for Task 3 came from the Internet and originated from the Khresmoi project. Task 1 annotations originated from the ShARe annotations. For Tasks 2 and 3, new annotations, queries, and relevance assessments were created. In total, 64, 56, and 55 people registered their interest in Tasks 1, 2, and 3, respectively. 34 unique teams (3 members per team on average) participated, with 22, 17, 5, and 9 teams in Tasks 1a, 1b, 2, and 3, respectively. The teams were from Australia, China, France, India, Ireland, Republic of Korea, Spain, UK, and USA. Some teams developed and used additional annotations, but this strategy improved system performance only in Task 2.
The best systems achieved an F1 score of 0.75 in Task 1a, accuracies of 0.59 and 0.72 in Tasks 1b and 2, and a precision at 10 of 0.52 in Task 3. The results demonstrate the substantial community interest in, and the capabilities of, these systems in making clinical reports easier for patients to understand. The organisers have made data and tools available for future research and development.
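The headline numbers above combine two standard evaluation measures: the F1 score, the harmonic mean of precision and recall over extracted disorder mentions, and precision at 10 (P@10), the fraction of relevant documents among the top ten retrieved. A minimal sketch of how such scores are computed (function and variable names are illustrative, not taken from the lab's evaluation scripts):

```python
def f1_score(tp, fp, fn):
    """F1 from raw counts: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def precision_at_k(retrieved, relevant, k=10):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k

# Example: 75 true positives, 25 false positives, 25 false negatives
# gives precision = recall = 0.75, hence F1 = 0.75.
print(f1_score(75, 25, 25))
```

P@10 suits this patient-centred retrieval setting because it scores only the first results page a patient would actually read, rather than the full ranked list.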




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Suominen, H. et al. (2013). Overview of the ShARe/CLEF eHealth Evaluation Lab 2013. In: Forner, P., Müller, H., Paredes, R., Rosso, P., Stein, B. (eds) Information Access Evaluation. Multilinguality, Multimodality, and Visualization. CLEF 2013. Lecture Notes in Computer Science, vol 8138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40802-1_24

  • DOI: https://doi.org/10.1007/978-3-642-40802-1_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40801-4

  • Online ISBN: 978-3-642-40802-1

  • eBook Packages: Computer Science (R0)
