Text Mining of Full Text Articles and Creation of a Knowledge Base for Analysis of Microarray Data

Bremer, Eric G.; Natarajan, Jeyakumar; Zhang, Yonghong; DeSesa, Catherine; Hack, Catherine J.; Dubitzky, Werner

doi:10.1007/978-3-540-30478-4_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3303)

Included in the following conference series:

International Symposium on Knowledge Exploration in Life Science Informatics

Abstract

Automated extraction of information from the biological literature promises to play an increasingly important role in text-based knowledge discovery. This is particularly true with regard to high-throughput approaches such as microarrays, and to combining data from different sources in a systems biology approach. We have developed an integrated system that combines protein/gene name dictionaries, synonymy dictionaries, natural language processing, and pattern matching rules to extract and organize gene relationships from full text articles. In the first phase, full text articles were collected from 20 peer-reviewed journals in the fields of molecular biology and biomedicine over a 5-year period (1999-2003). The extracted relationships were organized in a database that included the unique PubMed ID and a section ID (abstract, introduction, materials and methods, or results and discussion) to identify the source article and section from which concepts were extracted. This paper presents the system architecture, its uniqueness, and its advantages. It is hoped that the resulting knowledge base will assist in the understanding of gene lists generated from microarray experiments.
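
The pipeline outlined in the abstract (dictionary lookup, synonym normalization, pattern matching, and storage keyed by PubMed ID and section) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the synonym entries, the verb list, the section labels, the table layout, and the names `SYNONYMS`, `RELATION_VERBS`, `extract_relations`, and `store` are all assumptions made for illustration, and a single regular expression stands in for the natural language processing step.

```python
import re
import sqlite3
from dataclasses import dataclass

# Hypothetical synonym dictionary: lowercase surface form -> canonical symbol.
SYNONYMS = {
    "pax3": "PAX3",
    "paired box 3": "PAX3",
    "p53": "TP53",
    "tp53": "TP53",
    "mdm2": "MDM2",
}

# Hypothetical relation verbs; a real rule set would be far larger.
RELATION_VERBS = ["activates", "inhibits", "binds", "regulates"]

# Section labels assumed from the abstract's description.
SECTIONS = (
    "abstract",
    "introduction",
    "materials and methods",
    "results and discussion",
)


@dataclass(frozen=True)
class Relation:
    pmid: str
    section: str
    source: str
    verb: str
    target: str


# Longest names first so multi-word synonyms win over shorter ones.
_NAMES = "|".join(re.escape(n) for n in sorted(SYNONYMS, key=len, reverse=True))
_VERBS = "|".join(RELATION_VERBS)
_PATTERN = re.compile(
    r"\b(?P<a>" + _NAMES + r")\b\s+(?P<verb>" + _VERBS + r")\s+\b(?P<b>" + _NAMES + r")\b",
    re.IGNORECASE,
)


def extract_relations(text, pmid, section):
    """Return 'gene VERB gene' relations with names mapped to canonical symbols."""
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section}")
    return [
        Relation(
            pmid,
            section,
            SYNONYMS[m.group("a").lower()],
            m.group("verb").lower(),
            SYNONYMS[m.group("b").lower()],
        )
        for m in _PATTERN.finditer(text)
    ]


def store(conn, relations):
    """Persist relations together with their source article and section."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS relation "
        "(pmid TEXT, section TEXT, source TEXT, verb TEXT, target TEXT)"
    )
    conn.executemany(
        "INSERT INTO relation VALUES (?, ?, ?, ?, ?)",
        [(r.pmid, r.section, r.source, r.verb, r.target) for r in relations],
    )
    conn.commit()
```

Calling `extract_relations("MDM2 inhibits p53.", "00000000", "abstract")` with a placeholder PubMed ID yields one relation whose target is normalized to `TP53`; passing the result to `store(sqlite3.connect(":memory:"), ...)` writes it to an in-memory table. Keeping the section label on every row is what would let a later query weigh a relation found in a results section differently from one mentioned only in an introduction.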

Author information

Authors and Affiliations

Brain Tumor Research Program, Children’s Memorial Hospital, and Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Eric G. Bremer & Yonghong Zhang
Bioinformatics Research Group, University of Ulster, UK
Jeyakumar Natarajan, Catherine J. Hack & Werner Dubitzky
SPSS, Inc, Chicago, IL, USA
Catherine DeSesa

Editor information

Editors and Affiliations

Faculty of Sciences, School of Mathematics and Computing, University of Southern Queensland, 4350, Toowoomba, QLD, Australia
Jesús A. López
Istituto di Ricerche Farmacologiche “Mario Negri”, Via Eritrea 62, 20157, Milano, Italy
Emilio Benfenati
School of Biomedical Sciences, Bioinformatics Research Group, University of Ulster, Cromore Road, BT52 1SA, Coleraine, Northern Ireland, UK
Werner Dubitzky

Cite this paper

Bremer, E.G., Natarajan, J., Zhang, Y., DeSesa, C., Hack, C.J., Dubitzky, W. (2004). Text Mining of Full Text Articles and Creation of a Knowledge Base for Analysis of Microarray Data. In: López, J.A., Benfenati, E., Dubitzky, W. (eds) Knowledge Exploration in Life Science Informatics. KELSI 2004. Lecture Notes in Computer Science, vol 3303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30478-4_8

DOI: https://doi.org/10.1007/978-3-540-30478-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23927-7
Online ISBN: 978-3-540-30478-4
eBook Packages: Springer Book Archive