Mining Protein–Protein Interactions from Published Literature Using Linguamatics I2E

Bandy, Judith; Milward, David; McQuay, Sarah

doi:10.1007/978-1-60761-175-2_1

Part of the book series: Methods in Molecular Biology (MIMB, volume 563)

Abstract

Natural language processing (NLP) technology can be used to rapidly extract protein–protein interactions from large collections of published literature. In this chapter we will work through a case study using MEDLINE® biomedical abstracts (1) to find how a specific set of 50 genes interact with each other. We will show what steps are required to achieve this using the I2E software from Linguamatics (www.linguamatics.com (2)).

There are two typical strategies for extracting protein networks from the literature. The first is to find pairs of proteins that are mentioned together in the same context, for example, the same sentence, on the assumption that textual proximity implies biological association. The second is to use precise linguistic patterns based on NLP to find specific relationships between proteins. This can reveal the direction of a relationship and its nature, such as “phosphorylation” or “upregulation”. The I2E system uses a flexible text-mining approach that supports both of these strategies, as well as hybrid strategies that fall between the two. In this chapter we show how multiple strategies can be combined to obtain high-quality results.
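The two strategies, and one possible hybrid, can be illustrated with a short Python sketch. It does not use I2E, whose query interface is not described here. It only shows the logic of each approach under simplifying assumptions. The gene list, the relation keywords, the example sentences and all function names are hypothetical. The hybrid rule (two genes plus a relation keyword anywhere in the sentence) is one illustrative choice, not I2E's definition. Exact-token matching and a naive sentence splitter stand in for the gene-synonym handling and linguistic analysis that a real text-mining system needs.

```python
import itertools
import re
from collections import Counter

# Hypothetical inputs for illustration only: the case study uses 50 genes,
# these four are placeholders.
GENES = ["TP53", "MDM2", "AKT1", "GSK3B"]
GENE_SET = set(GENES)

# Hypothetical relation keywords mapped to the relation type they express.
RELATION_VERBS = {
    "phosphorylates": "phosphorylation",
    "upregulates": "upregulation",
    "inhibits": "inhibition",
}

GENE_ALT = "|".join(map(re.escape, GENES))
VERB_ALT = "|".join(map(re.escape, RELATION_VERBS))
# Strict pattern: <gene> <relation verb> <gene>, separated only by whitespace.
STRICT_PATTERN = re.compile(rf"\b({GENE_ALT})\s+({VERB_ALT})\s+({GENE_ALT})\b")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    return re.findall(r"\w+", sentence)


def cooccurrence_pairs(text):
    """Strategy 1: count unordered gene pairs mentioned in the same sentence."""
    pairs = Counter()
    for sentence in split_sentences(text):
        genes = sorted({t for t in tokens(sentence) if t in GENE_SET})
        pairs.update(itertools.combinations(genes, 2))
    return pairs


def pattern_relations(text):
    """Strategy 2: directed, typed relations (agent, type, target)."""
    relations = []
    for sentence in split_sentences(text):
        for agent, verb, target in STRICT_PATTERN.findall(sentence):
            relations.append((agent, RELATION_VERBS[verb], target))
    return relations


def hybrid_relations(text):
    """Hypothetical hybrid: two genes plus a relation keyword anywhere in the
    sentence. Looser than the strict pattern, but direction is not recovered."""
    relations = []
    for sentence in split_sentences(text):
        toks = tokens(sentence)
        genes = sorted({t for t in toks if t in GENE_SET})
        types = [RELATION_VERBS[t] for t in toks if t in RELATION_VERBS]
        for a, b in itertools.combinations(genes, 2):
            for rel in types:
                relations.append((a, b, rel))
    return relations


# Invented example sentences, not taken from MEDLINE.
TEXT = (
    "AKT1 phosphorylates GSK3B in insulin signalling. "
    "TP53 and AKT1 were measured in the same samples. "
    "MDM2 directly inhibits TP53 activity."
)

if __name__ == "__main__":
    print(dict(cooccurrence_pairs(TEXT)))
    print(pattern_relations(TEXT))
    print(hybrid_relations(TEXT))
```

On the invented text, co-occurrence reports three pairs. One of them is AKT1–TP53, whose members merely share a sentence. The strict pattern returns only the directed relation AKT1 (phosphorylation) GSK3B. It misses "MDM2 directly inhibits TP53" because an adverb intervenes. The hybrid recovers the MDM2–TP53 inhibition but cannot tell which protein acts on which. This trade-off between coverage and precision is why combining strategies can pay off.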

References

1. MEDLINE® (Medical Literature Analysis and Retrieval System Online) is the U.S. National Library of Medicine’s® (NLM) premier bibliographic database that contains over 17 million references to journal articles in life sciences with a concentration on biomedicine (www.nlm.nih.gov).
2. I2E is developed and marketed by Linguamatics Ltd. Further information can be obtained from www.linguamatics.com or by contacting the contributing authors.
3. Milward, D., Blaschke, C., Neefs, J.-M., Ott, M.-C., Verbeeck, R., and Stubbs, A. (2006) Flexible text mining strategies for drug discovery. Proc. Second International Symposium on Semantic Mining in BioMedicine (SMBM 2006), Jena, Germany, April 9–12, 2006, pp. 101–104.
4. Thomas, J., Milward, D., Ouzounis, C., Pulman, S., and Carroll, M. (2000) Automatic extraction of protein interactions from scientific abstracts. Pac. Symp. Biocomput., Waikiki, Hawaii, January 4–9, 2000, pp. 541–552.
5. Humphreys, K., Demetriou, G., and Gaizauskas, R. (2000) Two applications of information extraction to biological science journal articles: Enzyme interactions and protein structure. Pac. Symp. Biocomput., Waikiki, Hawaii, January 4–9, 2000, pp. 502–513.
6. Milward, D., Bjäreland, M., Hayes, W., Maxwell, M., Öberg, L., Tilford, N., Thomas, J., Hale, R., Knight, S., and Barnes, J. (2005) Ontology-based interactive information extraction from scientific abstracts. Comp. Funct. Genomics, 6, 67–71.
7. HUGO, The Human Gene Organization, www.hugo-international.org
8. Maglott, D., Ostell, J., Pruitt, K.D., and Tatusova, T. (2005) Entrez Gene: Gene-centered information at NCBI. Nucleic Acids Res., 33, D54–D58.
9. Hearst, M.A. (1999) Untangling text data mining. Proc. 37th Annual Meeting of the Association for Computational Linguistics, University of Maryland, College Park, June 20–26, 1999.
10. Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B., and Ideker, T. (2003) Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res., 13, 2498–2504. www.cytoscape.org
11. The InforSense Platform is developed and marketed by InforSense Ltd. Further information can be obtained from www.inforsense.com

Author information

Authors and Affiliations

Linguamatics Ltd, St John’s Innovation Centre, Cambridge, UK
Judith Bandy
Linguamatics Ltd, Cambridge, UK
David Milward & Sarah McQuay

Editor information

Editors and Affiliations

GeneGo Inc., Saxony Road 169, Encinitas, 92024, U.S.A.
Yuri Nikolsky
GeneGo Inc., Saxony Road 169, Encinitas, 92024, U.S.A.
Julie Bryant

About this protocol

Cite this protocol

Bandy, J., Milward, D., McQuay, S. (2009). Mining Protein–Protein Interactions from Published Literature Using Linguamatics I2E. In: Nikolsky, Y., Bryant, J. (eds) Protein Networks and Pathway Analysis. Methods in Molecular Biology, vol 563. Humana Press. https://doi.org/10.1007/978-1-60761-175-2_1

DOI: https://doi.org/10.1007/978-1-60761-175-2_1
Published: 29 May 2009
Publisher Name: Humana Press
Print ISBN: 978-1-60761-174-5
Online ISBN: 978-1-60761-175-2
eBook Packages: Springer Protocols
