Reconstruction and analysis of a Kluyveromyces marxianus genome-scale metabolic model
Kluyveromyces marxianus is a thermotolerant yeast with multiple biotechnological potentials for industrial applications, which can metabolize a broad range of carbon sources, including less conventional sugars like lactose, xylose, arabinose and inulin. These phenotypic traits are sustained even up to 45 °C, what makes it a relevant candidate for industrial biotechnology applications, such as ethanol production. It is therefore of much interest to get more insight into the metabolism of this yeast. Recent studies suggested, that thermotolerance is achieved by reducing the number of growth-determining proteins or suppressing oxidative phosphorylation. Here we aimed to find related factors contributing to the thermotolerance of K. marxianus.
Here, we reported the first genome-scale metabolic model of Kluyveromyces marxianus, iSM996, using a publicly available Kluyveromyces lactis model as template. The model was manually curated and refined to include the missing species-specific metabolic capabilities. The iSM996 model includes 1913 reactions, associated with 996 genes and 1531 metabolites. It performed well to predict the carbon source utilization and growth rates under different growth conditions. Moreover, the model was coupled with transcriptomics data and used to perform simulations at various growth temperatures.
K. marxianus iSM996 represents a well-annotated metabolic model of thermotolerant yeast, which provides a new insight into theoretical metabolic profiles at different temperatures of K. marxianus. This could accelerate the integrative analysis of multi-omics data, leading to model-driven strain design and improvement.
KeywordsGenome-scale metabolic model Kluyveromyces marxianus Thermotolerant yeast Constraint-based flux analysis
Flux balance analysis
Genome-scale metabolic model
Yeast extract peptone glucose
Yeast extract peptone xylose
The ascomycetous yeast Kluyveromyces marxianus is a prospective microbial host for industrial biotechnology. It is known as a fast growing, Crabtree negative organism, capable of utilizing a broad selection of sugars and having a high secretion capacity for proteins [1, 2]. For these reasons, K. marxianus has been successfully applied in numerous studies for the endogenous enzymes production including inulinase , β-galactosidase , β-glucosidase  and β-xylosidase . Even though its sister species Kluyveromyces lactis contains 94% of K. marxianus genes , only K. marxianus is thermotolerant, growing up to 45–50 °C. Since it has been isolated mainly from dairy products for years, it has GRAS (Generally Regarded as Safe) and QPS (Qualified Presumption of Safety) status, therefore making it suitable for applications in the food and pharma industry.
Thermotolerant hosts may positively impact the high temperature industrial bioprocesses in several ways. Firstly, thermotolerance is favourable once the simultaneous saccharification and fermentation (SSF) is considered as many amylases and cellulases are more efficient at high temperatures . Secondly, the maintenance costs may be reduced due to the lower cost for bioreactor cooling and sterilization of cultivation media [8, 9]. The media sterility is less important at high temperatures due to the decreased competition between the species for the available medium resources. This is particularly beneficial for the complex media like whey and slurry. At high temperatures, thermotolerant yeasts are forced to reorganize their profiles at the transcriptional and proteomic levels . For instance, protein folding, phosphorylation and cell wall organisation processes were upregulated in YPD medium under high temperature, thereby suppressing carbohydrate, amino acid and lipid metabolism [11, 12]. The cell likely benefits from the medium richness, while depleting biosynthetic pathways for the several YPD compounds. However, not much is known about K. marxianus intracellular metabolite levels at high temperatures. Such knowledge may be useful for minimal media design and for identifying high temperature metabolic capabilities, in particular for evaluation of the production potential of various compounds.
Genome-scale metabolic models (GEMs) are valuable systems biology tools as they combine genome annotation, cultivation and other experimental data for a given organism into a whole-cell metabolic network. Complementary to the general omics analysis, GEMs give an ability to integrate omics data into metabolic networks and predict the condition specific metabolic capabilities. GEMs have been successfully used to design strains and evaluate the cell capabilities upon different conditions [13, 14].
To evaluate the metabolic capabilities of K. marxianus, we reconstructed iSM996, the first genome-scale metabolic model (GEM) for this species. This model was used to predict the growth on various carbon sources and we found a good consistency between the model predictions and the experimental data reported in the literature. We used transcriptomics data to obtain the context specific models at different chemostat conditions, i.e. 30 °C YPD shaking, 30 °C YPD non-shaking and 45 °C YPD shaking. The models revealed the main metabolic bottlenecks associated with growth at high temperatures. To our knowledge, this approach is one of the first attempts to use GEMs to evaluate metabolic profiles at different temperatures.
The genome-scale metabolic network reconstruction for K. marxianus DMKU3–1042 was based on the published genome annotation (NCBI accession PRJDA65233) and other databases including KEGG , MetaCyc , TransportDB  and BRENDA . Two draft models were generated using the RAVEN Toolbox . The first model contained homologous reactions from iOD907, the GEM for the sister species Kluyveromyces lactis . This was accomplished with the RAVEN function getModelFromHomology, which utilises BLASTP  for the bi-directional homology search. The following homology criteria were used: e-value 1E-30; identity 40%; alignment length of 200. An additional check to identify homologs was considered for proteins shorter than 250 amino acids, since for those it was less likely to satisfy e-value and alignment length criteria. Spontaneous iOD907 reactions without gene associations were also checked to be added to this draft model. The second draft model was generated from KEGG using the RAVEN function getKEGGModelForOrganism. This function uses HMMER (http://hmmer.org/) in homology search, where the query proteome is queried against KEGG Orthology (KO) specific hidden Markov model (HMM) sets. The default e-value threshold was used, i.e. 1E-50. It was then compartmentalised with the RAVEN function predictLocalization, which uses WoLF PSORT  protein scores as input. Manual curation was then performed for metabolite names and reaction reversibility, directionality, when using iOD907, Yeast 7.6  and HMR2  as reference. Thereafter, the unique reactions from the second draft model were then manually added to the first draft model. All the compartmentalisation discrepancies were fixed according to the latter model.
The biomass composition was adapted from iOD907 and adjusted with K. marxianus bibliographic data wherever possible. The cell dry weight composition (g/gDW) for protein, carbohydrate, lipid, RNA and DNA content was incorporated from cultivation study . The stoichiometric coefficients for biomass compounds were then scaled-up to constitute up to 1 g of biomass in biomass equation. The experimental data from the cell wall study  was used to estimate the composition of the cell wall carbohydrates, namely glucan, mannan and chitin. Knowing the total carbohydrates mass in 1 g of biomass, we calculated the remaining mass for other carbohydrates (trehalose and amylose). The molar ratio between trehalose and amylose was considered the same as in iOD907. Nucleotide, deoxynucleotide and amino acid composition was calculated from genome, transcriptome (including tRNA and rRNA) and proteome according to the suggested protocol . The detailed biomass composition is shown in Additional file 4. The energy parameters for the growth associated maintenance (GAM), the non-growth associated maintenance (NGAM) and P/O ratio were retained the same as in iOD907, since related information was lacking for K. marxianus.
The semi-automatic gap filling was performed to ensure that the draft model could produce all biomass components in Verduyn medium . This medium was also considered as the minimal medium in this study. The main gap-filling reaction sources were iOD907, Yeast 7.6 models and KEGG database. The semi-automatic gap filling was done for each biomass component separately, i.e. for all reactants in biomass, protein, carbohydrate, fatty acid, lipid, DNA and RNA pseudo reactions (r_1906-r_1912). The RAVEN function fillGaps was used to identify the candidate reactions for the gap filling, which were kept in the model upon no contradictions in the literature. Such pre-processed draft model was then greatly improved with K. marxianus literature knowledge. For instance, we added the missing uptake pathways for several carbon  sources like inulin, L-arabinose or D-mannitol. The draft model was then thoroughly curated for gene associations, EC numbers , metabolite names, reactions elemental/charge balance. The existing gene-protein-reaction (GPR) rules were checked for the substrate, co-factor usage and sub-cellular localization relevance. Upon any discrepancies for a particular gene, its association would be migrated to the newly created reaction with altered substrate, co-factor usage or different compartmentalization for involved metabolites. The reactions which after such curation no longer contained GPR rules, were considered for removal, unless they were found to be essential for simulations done in this study. We considered Yeast 7.6 , BRENDA  and MetaCyc , TransportDB  as reference. The additional information for every reaction is included in Additional file 2 (RXNS sheet, NOTE column).
The iSM996 model (Additional file 1) was checked for the overall consistency using memote, the genome-scale metabolic model test suite . The full report is included as Additional file 3. The model was validated using the constraint-based flux balance analysis (FBA)  upon minimal medium. An ability to predict growth upon various carbon and nitrogen sources was tested and compared with the literature knowledge. During the testing of the different nitrogen sources D-glucose was chosen as a carbon source. The iSM996 was also checked for the growth rate accuracy in minimal media including various uptake rates for carbon sources. In such simulations, the lower and upper bounds for relevant substrates were set to the corresponding experimental uptake rate values. No cultivation experiments were considered in this study, see the complete reference list for the cited experimental values in Additional file 4 (Sheet 8).
Transcriptomic data integration
The quantitative transcription start site sequencing (TSS-seq)  data was used to deactivate reactions (Additional file 5) and obtain condition specific models. TSS-seq data was used from Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) under the accession ID GSE66600. Three conditions were considered from the downloaded dataset: 30 °C YPD shaking (30D), 30 °C YPD non-shaking (30DS) and 45 °C YPD shaking (45D). TSS-seq dataset also contained the data for D-xylose medium (YPX) in 30 °C (30X). However, it was decided not to include this condition in the analysis, since KmXKS1 gene, responsible for D-xylulose phosphorylation, was inactive therefore preventing D-xylose assimilation. However, all four conditions were utilised to identify the genes which were turned off in at least one condition. Thereafter, three condition specific models were obtained, each having inactivated reactions once their associated genes were inactive. No threshold values for expression abundance were considered, the gene was called as inactive if it had zero expression values in all three replicates per condition.
Reduced cost analysis
The values for reduced cost were calculated using the optimizeCbModel function from The COBRA Toolbox for MATLAB .
Results and discussion
The model reconstruction of K. marxianus included four steps: i) obtain and curate the draft reconstruction based on template model iOD907; ii) fetch and curate the second draft model from KEGG; iii) merge both draft models; iv) perform the thorough manual curation. The first draft model after curation had 875 genes including 20 manually identified homologs and 29 non-gene-associated spontaneous reactions. Meanwhile, the compartmentalised and curated second draft model had 524 genes. These two models were merged into a new draft model with 987 genes, 1934 reactions and 1678 metabolites. We then added exchange, biomass and ATP maintenance reactions from iOD907 and performed the growth simulation in a minimal medium. As the model could predict growth, we used the draft model as a basis for further model modifications.
The manual curation started by fixing two inconsistencies inferred from the template GEM. Firstly, in iOD907 all the cytosolic metabolites could be transported to extracellular space and had exchange reactions. Since the cell ability to import or secrete all cytosolic metabolites seemed unspecific, we commenced the manual check by using Yeast 7.6 as reference. Overall, the number of transportable cytosolic metabolites was decreased from 313 to 182 in iSM996. Another iOD907 problem was that the cell could import or secrete several fatty acid-acyl carrier protein (ACP) complexes even though the cell membrane should be impermeable for such large complexes. These complexes were therefore replaced by the corresponding free fatty acids. Correspondingly, the biosynthetic pathways were added for the known products like 2-phenylethanol, phenethyl acetate or ethyl acetate.
Comparison between Kluyveromyces lactis GEM iOD907 and Kluyveromyces marxianus GEM iSM996
S. cerevisiae homologues
An additional evaluation was performed to find the in silico explanation through the iSM996, why K. marxianus cannot assimilate the non-growth carbon sources, listed in Fig. 2 a. The results suggested that D-gluconate and N − acetyl−D − glucosamine cannot not be assimilated due to the absence of correspondent transporters. Melibiose, trehalose, starch, and myo-inositol lack their respective glycoside hydrolases. Unlike L-arabinose, D-arabinose cannot not be assimilated since the cell lacks the redox-driven L-arabinose isomerisation capability to D-xylulose through L-arabinitol. Finally, the growth upon D-glucosamine as a sole carbon source is not possible, since K. marxianus, just like S. cerevisiae , does not have glucosamine-6-phosphate deamination capability.
The iSM996 was also checked for the growth rate prediction accuracy upon various carbon sources and their uptake rates. The results (Fig. 2 b) indicate the strong correlation between experimental and in silico growth rates. The difference between in vivo and in silico growth rates was smaller when growth rate did not exceed 0.3 h− 1. Whereas NGAM value has a minimal impact to predicted growth rates, one may hypothesize that in reality the cell “tweaks” the stoichiometric coefficients in the biomass equation including GAM. It is therefore not possible to re-use the same biomass composition in a wide range of the growth rates. While the biomass reactions in GEMs are usually most relevant for the growth rates lower than 0.4 h− 1, the more specific experimental data for the biomass composition, GAM, and NGAM are needed for the higher growth rates, if one aims to perform the simulations at high growth rates.
In silico evaluation of K. marxianus metabolic capacities in stress conditions
The iSM996 model represents the whole cell metabolic capabilities assuming that all genes are active. We next used iSM996 to predict the cell metabolic stress response upon the low oxygen availability and the high temperature in the rich medium. This was done by integrating TSS-seq data to GEM and obtaining the context-specific models in YPD medium for the following conditions: 30 °C shaking (30D), 30 °C static (30DS) and 45 °C shaking (45D). The analysis would greatly benefit from additionally considering the metabolomics data, however we reckon that the comparative analysis of condition-specific models based on only transcriptomics data is still relevant to identify the key differences for biomass precursor synthesis between the conditions.
The simulations indicate that across the conditions, K. marxianus retains the ability to allocate the amino acids necessary for roughly the same protein mass. Amino acid auxotrophies to L-arginine and L-histidine were identified in 30D and 45D. High temperature-specific auxotrophies included the previously reported L-lysine, L-isoleucine  and previously undiscovered auxotrophies to L-alanine, L-phenylalanine and L-tyrosine. Having the depleted biosynthetic pathways for seven amino acids at high temperatures allows the cells to conserve the precursor metabolites required for these pathways and likely contributes to an increased flux through Embden–Meyerhof–Parnas (EMP) pathway and the TCA cycle due to the increased carbon availability. However, the decreased theoretical production pool sizes in 45D for L-alanine, L-isoleucine, L-phenylalanine, L-tyrosine and L-arginine may reduce the availability for the biosynthesis of some proteins enriched by these amino acids.
The lipid pool size remains the same at all three conditions. However, this assumption is based on the arbitrary abundance value for myo-inositol (0.001 mmol/gDW/h), so one may find the different lipid pool sizes once the experimental values for myo-inositol are used. The excessive availability for the other lipid components is suppressed by an inactivation of the KmDGK1 gene, which is a known contributor to the increased heat sensitivity when overexpressed .
The significant decrease of flux found in deoxynucleotides was the shortage of dAMP. Due to the inactivation of the KmISN1 and KmSDT1 genes, dAMP shortage cannot be compensated by salvaging other purine nucleotides. This suggest that the cell aims to conserve these building blocks. This coincides with the upregulation of the genes responsible for DNA repair .
We hereby present iSM996, the first genome scale-metabolic model for the thermotolerant yeast K. marxianus. This model is capable of predicting the growth on the majority of the reported carbon and nitrogen sources. The growth simulations hereby indicate the close correspondence between in vivo an in silico growth rates for the numerous carbon sources. In addition, the model contains several unique biosynthetic pathways for aroma compounds like 2-phenylethanol, phenethyl acetate and ethyl acetate. The iSM996 was used to construct three condition specific models in YPD medium while considering the low oxygen and high temperature conditions. The results suggested that at the high temperature the cell turns off more genes thereby introducing new auxotrophies and hereby utilizing as many resources as possible from the medium. These findings may be used to the growth medium design upon the low oxygen availability and high temperature. Finally, the iSM996 model is a solid tool to evaluate K. marxianus metabolic features, allowing the experimental data integration and the model-driven strain design.
SM, BJ and JN conceived the study. SM and BJ performed model reconstruction and validation. SM performed condition-specific model reconstructions and comparative analysis. SM, BJ and JN drafted, reviewed and approved the manuscript.
This project was funded by the ERASysAPP project SysMilk.
The funding body did not play any roles in the design of study, collection, analysis, interpretation of results or the writing of this manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 10.Taymaz-Nikerel H, Cankorur-Cetinkaya A, Kirdar B. Genome-Wide Transcriptional Response of Saccharomyces cerevisiae to Stress-Induced Perturbations. Front Bioeng Biotechnol. 2016;4. https://doi.org/10.3389/fbioe.2016.00017.
- 12.Shui W, Xiong Y, Xiao W, Qi X, Zhang Y, Lin Y, et al. Understanding the mechanism of Thermotolerance distinct from heat shock response through proteomic analysis of industrial strains of Saccharomyces cerevisiae. Mol Cell Proteomics. 2015;14:1885–97. https://doi.org/10.1074/mcp. M114.045781.CrossRefPubMedPubMedCentralGoogle Scholar
- 19.Wang H, Marcišauskas S, Sánchez BJ, Domenzain I, Hermansson D, Agren R, et al. RAVEN 2.0: a versatile toolbox for metabolic network reconstruction and a case study on Streptomyces coelicolor. PLoS Comput Biol. 2018;14:e1006541.Google Scholar
- 28.Lachance MA. Kluyveromyces van der Walt (1971). Elsevier B.V; 2011. https://doi.org/10.1016/B978-0-444-52149-1.00035-5.CrossRefGoogle Scholar
- 35.Yarimizu T, Nonklang S, Nakamura J, Tokuda S, Nakagawa T, Lorreungsil S, et al. Identification of auxotrophic mutants of the yeast Kluyveromyces marxianus by non-homologous end joining-mediated integrative transformation with genes from Saccharomyces cerevisiae. Yeast. 2013;30:485–500. https://doi.org/10.1002/yea.2985.CrossRefPubMedGoogle Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.