Abstract
Characterizing biomedical knowledge according to the degree of confidence attached to it is an important issue in the biomedical domain. This confidence may be expressed in the text by the authors themselves or inferred from external indicators such as the impact factor of the journal or the study type. Authors of scientific texts use grammatical and lexical devices, deliberately or not, to qualify their assertions. We call these qualifying devices “confidence markers”. We present here the results of our efforts to collect confidence markers (often associated with epistemic modality) from full texts and abstracts, to classify them on a semantic basis, and to use them within a knowledge extraction system. In this study, we apply these confidence markers to the functional annotation of the human gene Apolipoprotein E (APOE), thought to be involved in Alzheimer’s disease. The extraction system yields triplets (G, F, PMID), in which G is the gene APOE, F is a function of the gene found in the texts, and PMID identifies the article from which this knowledge was extracted. In addition, we propose a multidimensional projection space for representing the extracted knowledge according to the confidence criteria associated with it.
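As a rough illustration of the (G, F, PMID) triplets described in the abstract, the following Python sketch attaches confidence markers to an annotation. Everything in it is an assumption made for illustration: the marker lexicon, the three confidence levels, the `Annotation` and `annotate` names, and the placeholder PMID are hypothetical and do not reproduce the authors' classification or extraction system. The function F is supplied by hand here, since extracting it from text is outside the scope of the sketch.

```python
import re
from dataclasses import dataclass

# Hypothetical marker lexicon: the words and the three levels are
# illustrative assumptions, not the classification built by the authors.
CONFIDENCE_MARKERS = {
    "demonstrate": "high",
    "demonstrates": "high",
    "show": "high",
    "shows": "high",
    "suggest": "medium",
    "suggests": "medium",
    "may": "low",
    "might": "low",
    "possibly": "low",
}

# Levels ordered from weakest to strongest.
LEVEL_ORDER = ["low", "medium", "high"]


@dataclass(frozen=True)
class Annotation:
    gene: str        # G in the triplet
    function: str    # F in the triplet
    pmid: str        # PMID of the source article
    markers: tuple   # confidence markers found in the sentence, in order
    confidence: str  # weakest level among the markers, or "unmarked"


def find_markers(sentence):
    """Return the lexicon entries that occur as whole words in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return tuple(t for t in tokens if t in CONFIDENCE_MARKERS)


def annotate(gene, function, pmid, sentence):
    """Build one triplet and attach the markers found in its source sentence."""
    markers = find_markers(sentence)
    if markers:
        levels = [CONFIDENCE_MARKERS[m] for m in markers]
        # Assumption: a sentence is only as certain as its weakest marker.
        confidence = min(levels, key=LEVEL_ORDER.index)
    else:
        confidence = "unmarked"
    return Annotation(gene, function, pmid, markers, confidence)


if __name__ == "__main__":
    example = annotate(
        "APOE",
        "lipid transport",
        "00000000",  # placeholder, not a real PMID
        "These results suggest that APOE may be involved in lipid transport.",
    )
    print(example)
```

Taking the weakest marker as the overall level is only one possible design choice; a multidimensional representation like the one the abstract mentions would instead keep each criterion (textual markers, journal impact factor, study type) as a separate coordinate.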
© 2009 Springer-Verlag France
About this paper
Cite this paper
Jilani, I., Jaulent, MC. (2009). Enrichissement des bases de connaissances en biologie par extraction de marqueurs de confiance dans la littérature scientifique. In: Risques, Technologies de l’Information pour les Pratiques Médicales. Informatique et Santé, vol 17. Springer, Paris. https://doi.org/10.1007/978-2-287-99305-3_11
DOI: https://doi.org/10.1007/978-2-287-99305-3_11
Publisher Name: Springer, Paris
Print ISBN: 978-2-287-99304-6
Online ISBN: 978-2-287-99305-3