Skip to main content

Part of the book series: Lecture Notes in Computer Science ((TLDKS,volume 6990))

Abstract

The large amount and diversity of available biomedical information has put a high demand on existing search systems. Such a tool should be able to not only retrieve the sought information, but also filter out irrelevant documents, while giving the relevant ones the highest ranking. Focusing on biomedical information, this work investigates how to improve the ability for a system to find and rank relevant documents. To achieve this goal, we apply a series of information retrieval techniques to search in biomedical information and combine them in an optimal manner. These techniques include extending and using well-established information retrieval (IR) similarity models such as the Vector Space Model (VSM) and BM25 and their underlying scoring schemes. The techniques also allow users to affect the ranking according to their view of relevance. The techniques have been implemented and tested in a proof-of-concept prototype called BioTracer, which extends a Java-based open source search engine library. The results from our experiments using the TREC 2004 Genomic Track collection are promising. Our investigation have also revealed that involving the user in the search process will indeed have positive effects on the ranking of search results, and that the approaches used in BioTracer can be used to meet the user’s information needs.

This article is a revised and an extended version of the ITBAM 2010 paper [1].

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

eBook
USD 16.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 16.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Ramampiaro, H.: BioMedical information retrieval: The bioTracer approach. In: Khuri, S., Lhotská, L., Pisanti, N. (eds.) ITBAM 2010. LNCS, vol. 6266, pp. 143–157. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  2. Krauthammer, M., Nenadic, G.: Term identification in the biomedical literature. Journal of Biomedical Informatics 37(6), 512–526 (2004)

    Article  Google Scholar 

  3. Chen, L., Liu, H., Friedman, C.: Gene name ambiguity of eukaryotic nomenclatures. Bioinformatics 21(2), 248–256 (2005)

    Article  Google Scholar 

  4. Netzel, R., Perez-Iratxeta, C., Bork, P., Andrade, M.A.: The way we write. EMBO Reports 4(5), 446–451 (2003)

    Article  Google Scholar 

  5. Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc., Boston (1999)

    Google Scholar 

  6. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing and Management 24(5), 513–523 (1988)

    Article  Google Scholar 

  7. Croft, B., Metzler, D., Strohman, T.: Search Engines: Information Retrieval in Practice, 1st edn. Addison-Wesley, Reading (2009)

    Google Scholar 

  8. Lowe, H.J., Barnett, G.O.: Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches. JAMA 271(14), 1103–1108 (1994)

    Article  Google Scholar 

  9. Trieschnigg, D., Kraaij, W., de Jong, F.: The influence of basic tokenization on biomedical document retrieval. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2007, p. 803 (2007)

    Google Scholar 

  10. Jiang, J., Zhai, C.: An empirical study of tokenization strategies for biomedical information retrieval. Information Retrieval 10(4-5), 341–363 (2007)

    Article  Google Scholar 

  11. Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval, 2nd edn. Addison-Wesley Longman Publishing Co., Inc., Boston (2011)

    Google Scholar 

  12. Hatcher, E., Gospodnetic, O.: Lucene in Action. Manning Publications Co., 209 Bruce Park Ave., Greenwich, CT 06830 (2005)

    Google Scholar 

  13. Robertson, S.E., Jones, K.S.: Simple proven approaches to text retrieval. Technical Report 356, University of Cambridge (1994)

    Google Scholar 

  14. Zhai, C.: Notes on the lemur TFIDF model. note with lemur 1.9 documentation. Technical report, School of CS, CMU (2001)

    Google Scholar 

  15. Amati, G., Rijsbergen, C.J.V.: Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems 20(4), 357–389 (2002)

    Article  Google Scholar 

  16. Ponte, J.M., Croft, W.B.: A language modeling approach to information retrieval. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1998, pp. 275–281. ACM, New York (1998)

    Google Scholar 

  17. Wilkinson, R.: Effective retrieval of structured documents. In: Proceedings of the 17th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1994), pp. 311–317. Springer-Verlag New York, Inc., New York (1994)

    Google Scholar 

  18. Robertson, S., Zaragoza, H., Taylor, M.: Simple BM25 extension to multiple weighted fields. In: CIKM 2004: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, pp. 42–49. ACM, Washington, D.C., USA (2004)

    Chapter  Google Scholar 

  19. Cohen, A.M., Hersh, W.R.: A survey of current work in biomedical text mining. Briefings in Bioinformatics 6(1), 57–71 (2005)

    Article  Google Scholar 

  20. Leser, U., Hakenberg, J.: What makes a gene name? Named entity recognition in the biomedical literature. Briefings in Bioinformatics 6(4), 357 (2005)

    Article  Google Scholar 

  21. Kabiljo, R., Clegg, A., Shepherd, A.: A realistic assessment of methods for extracting gene/protein interactions from free text. BMC Bioinformatics 10(1), 233 (2009)

    Article  Google Scholar 

  22. Johannsson, D.V.: Biomedical information retrieval based on document-level term boosting. Master’s thesis, Norwegian University of Science and Technology (NTNU) (2009)

    Google Scholar 

  23. Herskovic, J., Tanaka, L., Hersh, W., Bernstam, E.: A day in the life of PubMed: Analysis of a typical days query log. Journal of the American Medical Informatics Association 14(2), 212–220 (2007)

    Article  Google Scholar 

  24. Hersh, W.R., Bhupatiraju, R.T., Ross, L., Roberts, P., Cohen, A.M., Kraemer, D.F.: Enhancing access to the bibliome: the trec 2004 genomics track. Journal of Biomedical Discovery and Collaboration 1(3), 10 (2006)

    Google Scholar 

  25. Yilmaz, E., Aslam, J.A.: Estimating average precision when judgments are incomplete. Knowledge and Information Systems 16(2), 173–211 (2008)

    Article  Google Scholar 

  26. Voorhees, E.M.: On test collections for adaptive information retrieval. Inf. Process. Manage. 44(6), 1879–1885 (2008)

    Article  Google Scholar 

  27. Käki, M., Aula, A.: Controlling the complexity in comparing search user interfaces via user studies. Information Processing and Management 44(1), 82–91 (2008); Evaluation of Interactive Information Retrieval Systems

    Article  Google Scholar 

  28. Kelly, D., Harper, D.J., Landau, B.: Questionnaire mode effects in interactive information retrieval experiments. Information Processing and Management 44(1), 122–141 (2008); Evaluation of Interactive Information Retrieval Systems

    Article  Google Scholar 

  29. Abdou, S., Savoy, J.: Searching in Medline: Query expansion and manual indexing evaluation. Information Processing & Management 44(2), 781–789 (2008)

    Article  Google Scholar 

  30. Eaton, A.D.: Hubmed: a web-based biomedical literature search interface. Nucleic Acids Research 34(Web Server issue), W745–W747 (2006)

    Article  Google Scholar 

  31. Wang, J., Cetindil, I., Ji, S., Li, C., Xie, X., Li, G., Feng, J.: Interactive and fuzzy search: a dynamic way to explore medline. Bioinformatics 26(18), 2321–2327 (2010)

    Article  Google Scholar 

  32. Muller, H.M., Kenny, E.E., Sternberg, P.W.: Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol. 2(11), e309 (2004)

    Article  Google Scholar 

  33. Divoli, A., Attwood, T.K.: BioIE: extracting informative sentences from the biomedical literature. Bioinformatics 21, 2138–2139 (2005)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ramampiaro, H., Li, C. (2011). Supporting BioMedical Information Retrieval: The BioTracer Approach. In: Hameurlain, A., Küng, J., Wagner, R., Böhm, C., Eder, J., Plant, C. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems IV. Lecture Notes in Computer Science, vol 6990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23740-9_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-23740-9_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23739-3

  • Online ISBN: 978-3-642-23740-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics