Metabolic Pathway Mining

Czarnecki, Jan M.; Shepherd, Adrian J.

doi:10.1007/978-1-4939-6613-4_8

Protocol
First Online: 29 November 2016

Part of the book series: Methods in Molecular Biology (MIMB, volume 1526)

Abstract

The study of metabolic pathways is one of the most important fields in bioscience in the post-genomic era, but curating metabolic pathways requires considerable manual effort. As a result, databases contain few reliable, experimentally verified metabolic pathways and are forced to predict all but the most immediately useful ones.

Text mining has the potential to solve this problem. However, while sophisticated text-mining methods have been developed to assist the curation of many types of biomedical networks, such as protein–protein interaction networks, the mining of metabolic pathways from the literature has been largely neglected by the text-mining community. In this chapter we describe a pipeline for the extraction of metabolic pathways, built on freely available open-source components and a heuristic metabolic reaction extraction algorithm.

Abbreviations

NER: Named entity recognition
NLP: Natural language processing
PPI: Protein–protein interaction

Author information

Authors and Affiliations

School of Biosciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK
Jan M. Czarnecki
Department of Biological Sciences and Institute of Structural and Molecular Biology, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK
Adrian J. Shepherd

Corresponding author

Correspondence to Adrian J. Shepherd.

Editor information

Editors and Affiliations

Monash University, Melbourne, Victoria, Australia
Jonathan M. Keith

About this protocol

Cite this protocol

Czarnecki, J.M., Shepherd, A.J. (2017). Metabolic Pathway Mining. In: Keith, J. (eds) Bioinformatics. Methods in Molecular Biology, vol 1526. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6613-4_8

DOI: https://doi.org/10.1007/978-1-4939-6613-4_8
Published: 29 November 2016
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6611-0
Online ISBN: 978-1-4939-6613-4
eBook Packages: Springer Protocols
