CONAN: An Integrative System for Biomedical Literature Mining

Malik, Rainer; Siebes, Arno

doi:10.1007/11595014_25


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3808)

Included in the following conference series:

Portuguese Conference on Artificial Intelligence


Abstract

The amount of information about the genome, transcriptome and proteome poses a problem for the scientific community: how to find the right information in a reasonable amount of time. Most research aiming to solve this problem, however, concentrates on a certain organism or a very limited dataset. Complementary to those algorithms, we developed CONAN, a system that provides a full-scale approach tailored to experimentalists. It is designed to combine several information extraction methods and to connect their outcomes to gather novel information. Its methods include tagging of gene/protein names, finding interaction and mutation data, tagging of biological concepts, and linking to MeSH and Gene Ontology terms, all of which can be retrieved by querying the system. We present a full-scale approach that will ultimately cover all of PubMed/MEDLINE. We show that this universality has no effect on quality: our system performs as well as existing systems.



Author information

Authors and Affiliations

Institute for Information and Computing Sciences, Universiteit Utrecht, PO Box 80.089, 3508TB, Utrecht, The Netherlands
Rainer Malik & Arno Siebes


Editor information

Editors and Affiliations

Portugal Telecom Inovação (PTI), Centro de Informatica e Sistemas da Universidade de Coimbra (CISUC),
Carlos Bento
Department of Informatics Engineering, Coimbra University, Portugal
Amílcar Cardoso
Centre of Human Language Technology and Bioinformatics, University of Beira Interior, 6201-001, Covilhã, Portugal
Gaël Dias


About this paper

Cite this paper

Malik, R., Siebes, A. (2005). CONAN: An Integrative System for Biomedical Literature Mining. In: Bento, C., Cardoso, A., Dias, G. (eds) Progress in Artificial Intelligence. EPIA 2005. Lecture Notes in Computer Science, vol 3808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11595014_25


DOI: https://doi.org/10.1007/11595014_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30737-2
Online ISBN: 978-3-540-31646-6
eBook Packages: Computer Science, Computer Science (R0)
