Abstract
This chapter describes the basic principles of reconstructing genome-scale metabolic models with merlin. The tool covers the basic stages of this process and provides several features for assembling models, taking the sequenced genome as the starting point.
merlin has two main modules, which separate genome annotation (of enzymes, transporters, and compartments) from model assembly, although curated information from the former is integrated into the latter. merlin also provides several tools for curating the model, including tools that generate gene rules for reactions and placeholder entities for biomass precursors, such as proteins (e-protein) and nucleotides (e-DNA and e-RNA), among others.
This tutorial covers each feature of merlin in detail, including the use of experimental data to validate the model.
1 Electronic Supplementary Material
Fig. S1
NCBI assembly webpage. The genome can be accessed from the links on the right (GenBank: green arrow; RefSeq: dashed green arrow). Other relevant links are listed below them. The link inside the red ellipse retrieves the taxonomy identifier (blue circle) from the NCBI taxonomy database (PDF 723 kb)
Fig. S2
InterProScan report. Red circle: submenu for accessing the report. Genes with InterProScan reports are marked by buttons with a purple background (PDF 471 kb)
Fig. S3
Transporters annotation panel. Black circle: information types available in the information window; red ellipse: integrate similarity information with TRIAGE’s TAD; blue ellipse: create transport reactions; green ellipse: integrate to model or export information to tabular file. The information panel shows several ontology reactions, derived from the primary transporters’ annotations (PDF 418 kb)
Fig. S4
Compartments annotation panel. Secondary compartments may be annotated if their score is close to that of the main compartment (PDF 367 kb)
Fig. S5
Growth rate versus ATP flux. The slope represents the growth ATP requirements and the y-intercept value indicates the maintenance ATP flux (PDF 28 kb)
Fig. S6
merlin’s main interface. The main interface has three main components, namely the operation bar (blue square), the clipboard (green square), and the data visualizer (red square) (PDF 187 kb)
Fig. S7
RefSeq multispecies annotation (PDF 262 kb)
Fig. S8
Flowchart for the annotation of new transporters from TCDB (PDF 206 kb)
Fig. S9
Example of plots for determining the specific growth rate (a) and the specific consumption rate (b). In the former, only the first five data points should be selected for the linear regression, as the others do not belong to the exponential growth phase (PDF 94 kb)
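The regressions described in the captions of Figs. S5 and S9 can be sketched in a few lines of Python. This is a minimal illustration, not merlin's implementation. The function names, the data values, and the units in the comments are all hypothetical. It also assumes that the specific growth rate is taken as the slope of ln(biomass) against time over the exponential phase, which is a common convention that the captions do not state explicitly.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def specific_growth_rate(times_h, biomass, n_exponential):
    """Slope of ln(biomass) vs. time, using only the first
    n_exponential points (assumed to be the exponential phase)."""
    log_biomass = [math.log(b) for b in biomass[:n_exponential]]
    mu, _ = linear_fit(times_h[:n_exponential], log_biomass)
    return mu


def atp_requirements(growth_rates, atp_fluxes):
    """Fig. S5 reading: slope = growth ATP requirement,
    y-intercept = maintenance ATP flux."""
    return linear_fit(growth_rates, atp_fluxes)


# Hypothetical growth curve: five exponential points, then a plateau.
times_h = [0, 1, 2, 3, 4, 5, 6, 7]
biomass = [0.1 * math.exp(0.3 * t) for t in times_h[:5]] + [0.36, 0.37, 0.37]
mu = specific_growth_rate(times_h, biomass, n_exponential=5)

# Hypothetical ATP fluxes measured at four growth rates.
growth_rates = [0.1, 0.2, 0.3, 0.4]
atp_fluxes = [6.0, 10.0, 14.0, 18.0]
growth_atp, maintenance_atp = atp_requirements(growth_rates, atp_fluxes)

print(f"specific growth rate: {mu:.3f} 1/h")
print(f"growth ATP requirement (slope): {growth_atp:.2f}")
print(f"maintenance ATP flux (intercept): {maintenance_atp:.2f}")
```

Restricting the fit to the exponential-phase points matters: including the plateau points would bias the slope downwards and underestimate the specific growth rate.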
Copyright information
© 2018 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Dias, O., Rocha, M., Ferreira, E.C., Rocha, I. (2018). Reconstructing High-Quality Large-Scale Metabolic Models with merlin. In: Fondi, M. (eds) Metabolic Network Reconstruction and Modeling. Methods in Molecular Biology, vol 1716. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7528-0_1
DOI: https://doi.org/10.1007/978-1-4939-7528-0_1
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7527-3
Online ISBN: 978-1-4939-7528-0
eBook Packages: Springer Protocols