Enrichissement des bases de connaissances en biologie par extraction de marqueurs de confiance dans la littérature scientifique

  • Conference paper
Risques, Technologies de l’Information pour les Pratiques Médicales

Part of the book series: Informatique et Santé (INFORMATIQUE, volume 17)

Abstract

Characterizing biomedical knowledge according to the degree of confidence attached to it is an important issue in the biomedical domain. That confidence may be expressed in the text by the authors themselves, or inferred from external cues such as the impact factor of the journal or the type of study. Authors of scientific texts use grammatical and lexical devices to qualify their assertions, deliberately or not. We call these qualifying devices “confidence markers”. We present here our work on collecting confidence markers (often associated with epistemic modality) from full texts and abstracts, classifying them on a semantic basis, and using them within a knowledge extraction system. We apply these confidence markers to the functional annotation of the human gene Apolipoprotein E (APOE), thought to be involved in Alzheimer’s disease. The extraction system yields triplets (G, F, PMID), in which G is the gene APOE, F is a function of the gene found in the text, and PMID is the identifier of the article from which this knowledge was extracted. In addition, we propose a multidimensional projection space for representing the extracted knowledge according to the confidence criteria associated with it.
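The idea of attaching a confidence marker to a (G, F, PMID) triplet can be illustrated with a minimal sketch. It is not the authors' system: the marker lexicon, the three confidence levels, the `annotate` helper, and the placeholder article identifier are all assumptions made for illustration, and the gene function is passed in by hand rather than extracted from the text.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical lexicon mapping a marker to a confidence level.
# Neither the entries nor the levels come from the paper.
MARKERS = {
    "demonstrate": "high",
    "show": "high",
    "suggest": "medium",
    "may": "low",
    "might": "low",
    "possibly": "low",
}


class Annotation(NamedTuple):
    gene: str                  # G
    function: str              # F
    pmid: str                  # identifier of the source article
    marker: Optional[str]      # first confidence marker found, if any
    confidence: Optional[str]  # level assigned to that marker


def find_marker(sentence: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the first lexicon marker in the sentence and its level.

    Matching is deliberately naive: exact word, or the word followed
    by one of the suffixes "s", "d", "ed".
    """
    for word in re.findall(r"[a-z]+", sentence.lower()):
        for marker, level in MARKERS.items():
            if word in (marker, marker + "s", marker + "d", marker + "ed"):
                return marker, level
    return None, None


def annotate(gene: str, function: str, pmid: str, sentence: str) -> Annotation:
    """Build a (G, F, PMID) triplet enriched with a confidence marker."""
    marker, level = find_marker(sentence)
    return Annotation(gene, function, pmid, marker, level)


if __name__ == "__main__":
    # Invented sentence and placeholder identifier, for illustration only.
    example = annotate(
        "APOE",
        "lipid transport",
        "PMID-PLACEHOLDER",
        "These results suggest that APOE may be involved in lipid transport.",
    )
    print(example)
```

Only the first marker in a sentence is kept here; a fuller treatment would presumably need to combine several markers and the external criteria mentioned above (journal impact factor, study type).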

Author information

Corresponding author

Correspondence to Ines Jilani.

Copyright information

© 2009 Springer-Verlag France

About this paper

Cite this paper

Jilani, I., Jaulent, MC. (2009). Enrichissement des bases de connaissances en biologie par extraction de marqueurs de confiance dans la littérature scientifique. In: Risques, Technologies de l’Information pour les Pratiques Médicales. Informatique et Santé, vol 17. Springer, Paris. https://doi.org/10.1007/978-2-287-99305-3_11

  • DOI: https://doi.org/10.1007/978-2-287-99305-3_11

  • Publisher Name: Springer, Paris

  • Print ISBN: 978-2-287-99304-6

  • Online ISBN: 978-2-287-99305-3
