Genome-wide association analysis in tetraploid potato reveals four QTLs for protein content
Valorisation of tuber protein is relevant for the potato starch industry to create added-value and reduce impact on the environment. Hence, protein content has emerged as a key quality trait for innovative potato breeders. In this study, we estimated trait heritability, explored the relationship between protein content and tuber under-water weight (UWW), inferred haplotypes underlying quantitative trait loci (QTLs) and pinpointed candidate genes. We used a panel of varieties (N = 277) that was genotyped using the SolSTW 20 K Infinium single-nucleotide polymorphism (SNP) marker array. Protein content data were collected from multiple environments and years. Our genome-wide association study (GWAS) identified QTLs on chromosomes 3, 5, 7 and 12. Alleles of StCDF1 (maturity) were associated with QTLs found on chromosome 5. The QTLs on chromosomes 7 and 12 are presented here for the first time, whereas those on chromosomes 3 and 5 co-localized with loci reported in earlier studies. The candidate genes underlying the QTLs proposed here are relevant for functional studies. This study provides resources for genomics-enabled breeding for protein content in potato.
Keywords: Protein content; Potato; Tetraploid; Haplotypes; Genome-wide association analysis (GWAS); Candidate genes
Global population growth, accompanied by increased consumer wealth, will change food consumption patterns worldwide (Tilman and Clark 2014). By the year 2050, the demand for protein from animal sources is projected to double relative to 2000 (Alexandratos 1999). This trend raises sustainability and food security concerns, because the intensive production of animal protein adds pressure on the environment: vast amounts of scarce (non-renewable) resources such as land, water and minerals are needed. In contrast, the production of plant protein is more sustainable because fewer resources are needed (Sabaté and Soret 2014).
Potato (Solanum tuberosum L.) is a well-known starch crop. However, few realise that the potato crop also serves as an abundant source of plant protein (Jørgensen et al. 2006). Although protein content in potato tubers is relatively low (0.32–1.63%) (Bárta et al. 2012; Klaassen et al. 2019; Ortiz-Medina 2006), protein yield per hectare (ha) is considerable owing to the high-yielding ability and high harvest index of the potato crop, which can yield up to 124 t ha−1 (Kunkel and Campbell 1987). The potato starch industry processes potatoes to produce starch and by-products. After starch is extracted from tubers, potato fruit juice (PFJ) is released as a major aqueous by-product that contains protein. After proteins are extracted from PFJ, functional (native) potato protein isolates may be utilized in high-end food and pharmaceutical applications that include foaming agents, anti-oxidants, emulsifiers (Creusot et al. 2011; Edens et al. 1999; Kudo et al. 2009), inhibitors of faecal proteolytic compounds that cause dermatitis (Ruseler-van Embden et al. 2004) and satiety agents (Hill et al. 1990). Therefore, valorisation of protein provides opportunities to create added value for the potato starch industry. Consequently, innovative firms in the industry are keen to use protein-rich potato varieties, and high protein content has therefore emerged as a key quality trait for breeders. However, breeding for protein content in potato is challenging due to the complex genetic basis underlying the trait (Klaassen et al. 2019). To facilitate breeding for protein content in potato, an improved understanding of the inheritance, the quantitative trait loci (QTLs) and the relationships with other agronomically relevant traits would be useful.
Knowledge on the inheritance of protein content in potato is limited. To the best of our knowledge, three genetic studies on protein content in bi-parental populations have been published (Acharjee et al. 2018; Klaassen et al. 2019; Werij 2011). These studies estimated moderate levels of trait heritability (40–74%) and identified minor-effect QTLs on chromosomes 1, 2, 3, 5 and 9 in both non-cultivated diploid and cultivated tetraploid potato germplasm. As for other crops that include soybean, maize and wheat (Balyan et al. 2013; Hwang et al. 2014; Karn et al. 2017), protein content has been described as a complex trait that is regulated by a plethora of interactions between genetic and environmental factors. Therefore, QTLs for protein content in heterozygous tetraploid (2n = 4x = 48) potato are likely to be affected by both epistasis and environmental factors.
Genome-wide association studies (GWAS) have been used to dissect the genetic architecture of complex traits in multiple species, including potato (Rosyara et al. 2016; Sharma et al. 2018). In contrast to genetic studies performed on bi-parental populations, GWAS offers the advantage of identifying QTLs within a panel of diverse individuals, and potentially of gaining a high mapping resolution for identifying candidate genes.
In this study, we carried out a GWAS to dissect the genetics of protein content in a panel of tetraploid potato. We report on the relationship between protein content and tuber under-water weight (a proxy for starch content), haplotypes underlying QTLs and putative candidate genes.
Materials and methods
The panel (N = 277) consisted of tetraploid (2n = 4x = 48) individuals. The panel was composed of 189 varieties (D’hoop et al. 2008) and 88 starch potato progenitors that originated from five potato breeding companies (Agrico, Averis Seeds, C. Meijer, HZPC and KWS) (Supplementary Table 1). These included both modern and old individuals from different market segments and geographic origins. Analysis of population structure in the panel displayed three sub-populations, as reported earlier (D’hoop et al. 2008; Vos 2016). These sub-populations, hereafter referred to as “Processing”, “Other” and “Starch”, were used for analyses.
Raw phenotypic data were collected over years and locations (multi-location, multi-year) from unbalanced field trials that were carried out in the Netherlands. These trials were carried out in years 2008–2010 in Bant, Emmeloord, Metslawier, Rilland and Valthermond. The accessions were replicated three times or more, except for nine accessions that were replicated twice or once. A replicate (experimental unit) consisted of a four-plant plot within a row in the field. Raw phenotypic data were used to compute the BLUEs for the accessions. The trials were carried out during the conventional potato growing seasons in the Netherlands as described by D’hoop et al. (2008). Uniform seed tubers were used as planting material and were propagated at a single location 1 year prior to the trials. The seed potatoes were planted at 75-cm spacing between the rows and 35 cm between the hills. Guard rows were used to separate the plots in the trial. Regular husbandry practices for potato production in the Netherlands were carried out during the field trials. After harvest, the tubers were stored under cool conditions prior to use.
Quantification of phenotypes
Soluble protein content in potato fruit juice (PFJ) was determined by using the bicinchoninic acid (BCA) assay (Smith et al. 1985). Bovine serum albumin (BSA) was used as a standard. Protein content was quantified as described by Klaassen et al. (2019). Tuber under-water weight (UWW), a proxy for starch content, was quantified as described in a previous study (Bradshaw et al. 2008).
Best linear unbiased estimates
Genotyping and genotype calling
The panel was genotyped using the SolSTW 20 K Infinium SNP marker array (Vos et al. 2015). Genotype calling (assignment of SNP allele dosages) was carried out by using fitTetra (Voorrips et al. 2011) and Illumina GenomeStudio software version 2010.3 (Illumina, San Diego, CA, USA), as described by Vos et al. (2015). The threshold for minor-allele frequency (MAF) was set at 1.5% (equivalent to 6% for tetraploid potato with four sets of homologous chromosomes). After filtering, 14,436 high-quality SNP markers were used for GWAS. The physical coordinates of the SNPs were based on the potato reference genome, i.e. pseudomolecules v4.03 (PGSC 2011).
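As an illustration of the MAF filter described above, the following Python sketch computes allele frequencies from tetraploid dosage scores (0 to 4 copies per individual) and retains SNPs at or above the 1.5% threshold. The function names and the toy data are hypothetical and are not taken from fitTetra or GenomeStudio. One reading of the stated equivalence is that an allele carried in simplex by 6% of the individuals has a frequency of 0.06 × 1/4 = 1.5%; this interpretation is an assumption.

```python
import numpy as np

PLOIDY = 4  # tetraploid potato: each dosage score is between 0 and 4


def minor_allele_frequency(dosages: np.ndarray) -> np.ndarray:
    """Per-SNP MAF from an (individuals x SNPs) float dosage matrix.

    Missing genotype calls are expected to be encoded as NaN.
    """
    called = ~np.isnan(dosages)
    allele_copies = np.nansum(dosages, axis=0)   # copies of the counted allele
    total_copies = PLOIDY * called.sum(axis=0)   # chromosomes with a call
    freq = allele_copies / total_copies
    return np.minimum(freq, 1.0 - freq)


def maf_filter(dosages: np.ndarray, threshold: float = 0.015) -> np.ndarray:
    """Boolean mask of SNPs whose MAF is at or above the threshold."""
    return minor_allele_frequency(dosages) >= threshold


# Hypothetical toy panel: 100 individuals, 3 SNPs.
toy = np.zeros((100, 3))
toy[:7, 0] = 1.0   # 7 simplex carriers  -> MAF = 7 / 400 = 0.0175 (kept)
toy[:5, 1] = 1.0   # 5 simplex carriers  -> MAF = 5 / 400 = 0.0125 (dropped)
toy[:, 2] = 4.0    # monomorphic SNP     -> MAF = 0            (dropped)
keep = maf_filter(toy)
```

The mask can then be used to subset the marker matrix before association testing, e.g. `toy[:, keep]`.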
Population structure analysis
The population structure of the panel was analysed by using STRUCTURE software package v2.3.4 (Pritchard et al. 2000). Ten runs were performed to estimate the K values using 2000 randomly selected SNPs. A Markov chain Monte Carlo (MCMC) burn-in period of 10,000 was used and the number of iterations was set at 10,000. The appropriate number of sub-populations was determined from delta K and optimal K values (Evanno et al. 2005) based on output data derived from STRUCTURE Harvester (Earl and vonHoldt 2012) (http://taylor0.biology.ucla.edu/structureHarvester/). Membership probability estimates from thirty runs were averaged and used to assign each individual to cluster groups (sub-populations). The sub-populations were denoted as “Processing”, “Other” and “Starch”, based on prior knowledge that these three sub-populations existed in the panel (D’hoop et al. 2008; Vos 2016).
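The delta K criterion mentioned above can be sketched as follows. This is a minimal Python illustration of the Evanno et al. (2005) statistic, in which the absolute second-order difference of the mean log-likelihood L(K) is divided by the standard deviation of L(K) across replicate runs. It is not the STRUCTURE Harvester implementation; the function name, the use of the sample standard deviation and the log-likelihood values are assumptions made for the example.

```python
import numpy as np


def evanno_delta_k(lnp: dict[int, list[float]]) -> dict[int, float]:
    """Delta K per K from replicate STRUCTURE log-likelihoods.

    lnp maps each tested K to the list of Ln P(D) values of its runs.
    Delta K is only defined where both K - 1 and K + 1 were also tested.
    """
    mean = {k: float(np.mean(v)) for k, v in lnp.items()}
    sd = {k: float(np.std(v, ddof=1)) for k, v in lnp.items()}  # sample SD (assumption)
    delta = {}
    for k in sorted(lnp):
        if (k - 1) in mean and (k + 1) in mean and sd[k] > 0:
            second_diff = abs(mean[k + 1] - 2.0 * mean[k] + mean[k - 1])
            delta[k] = second_diff / sd[k]
    return delta


# Hypothetical log-likelihoods for K = 1..5 (three runs each).
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-700.0, -701.0, -699.0],
    4: [-690.0, -695.0, -685.0],
    5: [-688.0, -680.0, -696.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
```

In this toy example the largest delta K occurs at K = 3, which is the pattern that would support three sub-populations.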
Genome-wide association study
In equations 3, 4 and 5, “Y” represents the BLUEs, “X” represents the SNP markers (fixed effect), “K” represents the random kinship (co-ancestry) matrix and “A” represents the SNP marker set as cofactor (fixed). The term “ε” represents the vector of random residual errors. The term “α” represents the estimated SNP effects, “β” represents the estimated effect of the SNP marker set as cofactor and “μ” represents the estimated kinship variance component. Analyses were performed in the R software package GWASpoly (Rosyara et al. 2016). The phenotypic variance explained (R2) by SNPs was calculated from the squared correlation coefficients between the BLUEs and SNP dosage scores (allele copy number).
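The model equations referred to in this paragraph are not reproduced in the present text. Based solely on the term definitions given above, and on the naive, kinship-corrected and conditional analyses mentioned elsewhere in this paper, they presumably take a form similar to the following. The assignment of equation numbers to the three models is an assumption.

```latex
\begin{align}
Y &= X\alpha + \varepsilon                     \tag{3} \\ % naive model
Y &= X\alpha + K\mu + \varepsilon              \tag{4} \\ % kinship-corrected model
Y &= X\alpha + A\beta + K\mu + \varepsilon     \tag{5}    % conditional model with SNP cofactor
\end{align}
```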
Significance threshold and QTL support interval
We used several significance thresholds to identify QTLs. To correct for multiple testing, we used the 5% Bonferroni threshold (−log10(P) = 5.3). The Bonferroni threshold is known to inflate the probability of Type II errors (false-negative findings) in the presence of high linkage disequilibrium between markers (Gao et al. 2008; Johnson et al. 2010). Therefore, the 5% Li and Ji threshold (−log10(P) = 3.9) was also computed by correlated multiple testing (Li and Ji 2005). Correlated multiple testing was conducted at α = 0.05 to adjust for the effective number of independent tests and compensate for Type II errors. For naive analyses, permutation testing was carried out with N = 1000 permutations at α = 0.05 to define the threshold value (−log10(P) = 5.0) (Churchill and Doerge 1994). The support intervals of QTLs were set at 1.5 Mbp for non-introgressed regions and 2.5 Mbp for introgressed regions, as described by Vos (2016).
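To make the difference between the two corrections concrete, the following Python sketch computes a Bonferroni threshold and a threshold based on the effective number of independent tests, following the eigenvalue approach of Li and Ji (2005) combined with a Šidák correction. It is a simplified illustration and not the GWASpoly implementation; the function names are hypothetical, and the number of tests used in the example (10,000) is illustrative because the exact count behind the reported thresholds is not stated in this section.

```python
import math

import numpy as np


def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """-log10 of the per-test P-value cut-off alpha / n_tests."""
    return -math.log10(alpha / n_tests)


def effective_tests(corr: np.ndarray) -> float:
    """Effective number of independent tests from a SNP correlation matrix.

    Each eigenvalue x contributes I(x >= 1) + (x - floor(x)).
    """
    # Rounding guards against floating-point noise around integer eigenvalues.
    eig = np.abs(np.round(np.linalg.eigvalsh(corr), 6))
    return float(np.sum((eig >= 1).astype(float) + (eig - np.floor(eig))))


def sidak_threshold(alpha: float, m_eff: float) -> float:
    """-log10 of the Sidak cut-off 1 - (1 - alpha)^(1 / m_eff)."""
    return -math.log10(1.0 - (1.0 - alpha) ** (1.0 / m_eff))


# Illustrative values only.
bonf = bonferroni_threshold(0.05, 10_000)       # about 5.30
m_independent = effective_tests(np.eye(4))      # 4 uncorrelated markers -> 4 tests
m_redundant = effective_tests(np.ones((3, 3)))  # 3 fully correlated markers -> 1 test
```

Because markers in high linkage disequilibrium contribute fewer than one effective test each, the resulting threshold is lower than the Bonferroni threshold, in line with the values of 3.9 and 5.3 reported above.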
Determination of haplotypes underlying QTLs was performed using a contemporary haplotype inference method developed in tetraploid potato (Willemsen 2018). This method estimated the linkage phase between pairs of SNP markers, followed by joining of linked SNPs into haplotypes. Only SNPs exceeding the Li and Ji threshold (−log10(P) = 3.9) were used for haplotype construction and to obtain the dosages of the haplotypes.
SNP allele frequency
Phenotypic variation of the traits (BLUEs)
Table: Variance components and heritability estimates for the traits (values not preserved in this text; recoverable labels: protein content in PFJ, G × L, G × Y, G × L × Y)
Identification of QTLs
Table: Results from kinship-corrected GWAS on the panel (N = 277) (values not preserved in this text; recoverable column headers: QTL peak position (bp), frequency of Alt SNP variant (%))
To pinpoint putative candidate genes from the genomic regions underlying the QTLs, linkage disequilibrium (LD)-based QTL support intervals were used as described by Vos et al. (2017). The genes underlying these intervals were retrieved from the potato reference genome (PGSC 2011). From the longlists of genes (Supplementary Table 4), putative candidates were selected based on their annotation (gene name). As a result, the QTL interval on chromosome 3 co-localized with a nitrate transporter (60.09 Mbp). The interval on chromosome 5 harboured StCDF1 (4.54 Mbp) and a cluster of nine nitrate transporters (6.00–7.52 Mbp). No obvious candidate genes could be proposed for the QTL on chromosome 7.
Haplotypes underlying QTLs
A contemporary approach by Willemsen (2018) was used to determine the haplotype-specificity of the SNP markers underlying the QTLs. Results showed that all SNPs underlying the QTL on chromosome 5 (that exceeded the Li and Ji threshold) were haplotype-specific (Table 2). These SNPs were haplotype-specific for a late maturity allele of StCDF1 (Supplementary Table 2), as proposed by Willemsen (2018). Moreover, these SNPs also tagged a unique introgression segment from wild potato (Solanum vernei Bitter & Wittm.) as described by van Eck et al. (2017). Over the years, this introgression segment has been used by potato breeders to introduce resistance against Globodera pallida nematodes (the so-called Gpa5 locus) in the genepool of cultivated potato (Rouppe van der Voort et al. 2000; Van Eck et al. 2017). Graphical genotypes of the panel, as performed by van Eck et al. (2017), illustrated that this introgression segment was mainly present in the starch varieties and starch progenitors. For these varieties and progenitors, the introgression segment was found to be present in either simplex (a single copy) or duplex (two copies) form (Supplementary Fig. 4). The SNPs underlying the QTLs on chromosome 3 and 7 were not found to be haplotype-specific.
Variance explained by multiple QTLs
By using multiple linear regression, we tested the cumulative effect of multiple significant SNP markers underlying QTLs together. The SNPs underlying the QTLs on chromosomes 3, 5 and 7 together explained 22% of the variance (Supplementary Table 3). When the SNP on chromosome 5 was excluded, the QTLs on chromosomes 3 and 7 together explained 21%. The combination of SNPs on chromosomes 5 and 7 jointly explained 20%. The QTLs on chromosomes 3 and 5 jointly explained less variance (13%).
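The variance explained by one or several SNPs can be illustrated with an ordinary least-squares fit of the BLUEs on SNP dosage scores. The Python sketch below is a minimal example with simulated data; the function name and the simulated effects are hypothetical, and it does not reproduce the authors' analysis. For a single SNP the resulting R2 equals the squared correlation coefficient between the BLUEs and the dosage scores.

```python
import numpy as np


def variance_explained(dosages: np.ndarray, blues: np.ndarray) -> float:
    """R^2 of an OLS regression of BLUEs on one or more SNP dosage columns."""
    design = np.column_stack([np.ones(len(blues)), dosages])  # intercept + SNPs
    coef, *_ = np.linalg.lstsq(design, blues, rcond=None)
    residuals = blues - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((blues - blues.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# Simulated example: 200 individuals, 3 SNPs with dosages 0..4.
rng = np.random.default_rng(0)
snps = rng.integers(0, 5, size=(200, 3)).astype(float)
trait = 0.5 * snps[:, 0] + 0.3 * snps[:, 1] + rng.normal(size=200)

r2_single = variance_explained(snps[:, 0], trait)  # one SNP
r2_joint = variance_explained(snps, trait)         # all three SNPs together
```

When the dosages of different SNPs are correlated, the joint R2 need not equal the sum of the single-SNP values, which is one reason why the combinations reported above are not simply additive.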
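Table: Results from kinship-corrected GWAS on sub-populations “Starch” and “Other” (values not preserved in this text; recoverable column headers: QTL peak position (bp), minor allele freq. (%); sub-population sizes listed as N = 106 and N = 136)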
To verify whether or not the QTL at the start of chromosome 5 was associated with plant maturity (StCDF1) (Kloosterman et al. 2013), we performed conditional kinship-corrected GWAS on the sub-population “Other” by using the SNP marker “PotVar0079081” as a cofactor that tags the early maturity allele (StCDF1.1), as described by Willemsen (2018). This approach reduced the significance of the original QTL at the start of chromosome 5 (from −log10(P) = 4.46 down to 3.19) (Fig. 3). This finding suggested that the maturity score of potato varieties, as largely controlled by StCDF1.1, indirectly influenced protein content in this sub-population. By performing the cofactor analysis, an otherwise masked QTL was uncovered at the end of chromosome 12 (Peak SNP: “PotVar0052807”; 59,294,858 bp; −log10(P) = 4.63). Naive GWAS on the sub-population “Other” showed inflated associations that probably caused false-positive QTLs on chromosomes 1, 2, 3, 4, 5, 7 and 10 (Supplementary Fig. 5).
GWAS as a tool to detect QTLs
We used GWAS to shed light on the complex genetic architecture of protein content in potato. We identified QTLs with minor effects on chromosomes 3, 5, 7 and 12 (Fig. 2; Fig. 3). The QTLs identified on chromosomes 3 and 5 coincided with loci reported in previous studies (Acharjee et al. 2018; Klaassen et al. 2019; Werij 2011). For chromosome 3, the QTL identified in the entire panel was also observed in the sub-population “Starch”. For chromosome 5, we uncovered an introgression segment from wild potato that was associated with protein content (Supplementary Fig. 4). This introgressed segment harboured a late maturity allele of StCDF1 (Supplementary Table 2), as well as the Gpa5 resistance allele against potato cyst nematodes (Globodera pallida). However, the SNPs tagging this introgression segment did not bring forth a QTL in the sub-population “Starch”, even though the allele frequency of these SNPs in this sub-population was considerable (9–10%). We also observed that the additive effect of this QTL was lower than expected when combined with the other two QTLs on chromosomes 3 and 7 (Supplementary Table 3). We showed that protein content was confounded with population structure in the panel. This result was likely caused by higher BLUEs for protein content in the sub-population “Starch” (Supplementary Fig. 1). Therefore, we propose that the QTL on chromosome 5 in the panel could be an artefact. Validation studies, for instance using bi-parental mapping populations, may confirm the relevance of SNPs underlying this QTL for use in breeding to improve protein content. If these SNPs are to be used for breeding, they will at least provide a source of resistance against cyst nematodes and contribute towards a later maturity index due to StCDF1.
In the sub-population “Other” we also identified a QTL at the start of chromosome 5. Conditional GWAS on this sub-population showed that this association was not caused by the introgression segment from wild potato. Instead, this QTL coincided with the early maturity allele of StCDF1 (StCDF1.1). Findings from GWAS on the panel as well as the sub-populations showed that different haplotypes at the start of chromosome 5 were associated with protein content.
To the best of our knowledge, the QTLs identified on chromosomes 7 and 12 have not been described before in the literature. Bi-parental populations that descend from crosses between protein-rich varieties can be used to test, validate and stack multiple copies of favourable variants (alleles) for multiple protein content QTLs simultaneously. For instance, the cross between the starch varieties Kartel × Seresta will allow the SNPs underlying all three QTLs identified in the panel here to segregate in nulliplex (null), simplex (one) and duplex (two) dosages in the F1 progeny. This cross will provide improved insight into the cumulative effects of the underlying haplotypes. Our results, as presented in Supplementary Table 3, suggest both additive and epistatic effects of the SNPs (alleles). We observed that the effects of genotype-by-environment (G × E) interactions were small to moderate for protein content (Table 1). On the other hand, a large proportion of variance was ascribed to the residuals (error). Hence, future genetic studies on protein content may be improved by reducing the residual error in these experiments.
Studies in soybean, wheat and maize describe protein content as a complex trait that is governed by multiple genes and environmental factors. We estimated a moderate trait heritability for protein content (H2 = 0.48). This value falls within the range of 40–74% reported in previous studies (Klaassen et al. 2019; Werij 2011). GWAS on the panel identified three QTLs that cumulatively explained 22% of the variance. Hence, we demonstrate a clear example of missing heritability. Several factors may have contributed to this finding, including the limited statistical power to detect loci with small effects, interactions between loci, effects of rare variants and the potential loss of true-positive QTLs due to kinship correction. Alternatively, the broad-sense heritability (H2) may have been overestimated. In any case, it should be noted that our H2 estimate is expected to be much larger than the narrow-sense (h2) estimate.
To optimize the detection of QTLs by GWAS, the design and methodology should be considered carefully. Using more individuals will likely increase statistical power, as shown in numerous human and crop genetic studies, e.g. for soybean (Bandillo et al. 2015). Optimization of GWAS will likely identify loci with minor effects or those caused by rare variants with a low allele frequency. Certainly the population structure, distribution of the phenotypic values, as well as the ascertainment bias of SNPs in marker arrays should be considered beforehand as proposed by Vos (2016).
Correlation between tuber protein content and under-water weight
For other crops, a negative correlation is often observed between protein content and other major (seed) storage compounds, e.g. oil content in soybean (Patil et al. 2017). Interestingly, while expecting a similar trade-off in potato, we found a moderate positive correlation (r = 0.64) between protein content and under-water weight (UWW: a proxy for starch content) (Fig. 1). Therefore, selection pressure for high UWW in the starch genepool, aimed at increasing starch content, may have coincided with unconscious selection for high protein content (Supplementary Fig. 1). Kinship-corrected GWAS on UWW in the panel did not identify potential associations between UWW and maturity alleles of StCDF1 at the start of chromosome 5 (Supplementary Fig. 6). The statistical power provided by the 277 individuals here may have been insufficient to uncover significant signals due to the complex (polygenic) genetic architecture of starch content in potato. A positive correlation between protein content and UWW suggests that these traits may be (partly) interrelated due to shared biological mechanisms. It is well established that photosynthesis-derived carbon and nitrogen assimilation pathways are connected and tightly controlled in plants. Molecular studies have shown that intracellular glucose is used by plants to synthesize both protein and starch (Bihmidine et al. 2013). In barley mutants, reduced levels of ADP-glucose (i.e. the glucosyl donor for starch synthesis) caused by inactivated ADP-glucose pyrophosphorylase (AGPase) were accompanied by the downregulation of genes related to amino acid and storage protein biosynthesis (Faix et al. 2012). Therefore, the genes that regulate protein content in potato may also affect starch content. Unravelling the basis of the positive correlation between protein and starch content in potato remains to be addressed in future studies.
Putative candidate genes for protein content
To pinpoint putative candidate genes, we used LD-bound QTL support intervals to narrow down the genomic regions. This approach identified several candidates that included StCDF1 (maturity) and nitrate transporters (Supplementary Table 4). Conditional GWAS on the sub-population “Other” showed that a late maturity allele of StCDF1 was positively associated with protein content. Nitrate transporters are known to function in the uptake and allocation of inorganic nitrate (NO3−) in plants (Hsu and Tsay 2013; Léran et al. 2014). Nitrate is the predominant nitrogen-containing macronutrient in aerobic soils under temperate climatic conditions. Hence, allelic variants of nitrate transporters may differ in nitrate uptake and in their interaction with nitrogen-responsive genes that ultimately affect protein content, as proposed for rice (Hu et al. 2015). Future molecular studies on the above-mentioned candidate genes, including gene expression, overexpression and knock-out studies, are relevant to study their biological functions and effects on protein content in potato.
The authors thank the potato-breeding companies from the Centre for Biosystems and Genomics (CBSG) consortium for providing the raw data.
MTK performed the GWAS, graphical genotype analysis and wrote the manuscript. JHW carried out the haplotype analysis. PGV performed the genotype calling. HJvE collected the nematode data. LMT and HJvE coordinated the project. LMT, CM and HJvE conceived the study and helped to draft the manuscript. All authors read and approved the final manuscript.
MTK was funded by Aeres University of Applied Sciences, Centre for Biobased Economy (CBBE), AVEBE and Averis Seeds. JHW was funded by the Dutch National Organisation for Scientific Research (NWO), under project no. 831.14.002. PGV was funded by Centre for Biosystems and Genomics (CBSG) and the breeding companies Agrico, Averis Seeds, HZPC, KWS and Meijer. These funds are gratefully acknowledged.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
The research described in this paper complies with the current laws of the country in which it was performed.
- Balyan HS, Gupta PK, Kumar S, Dhariwal R, Jaiswal V, Tyagi S, Agarwal P, Gahlaut V, Kumari S (2013) Genetic improvement of grain protein content and other health-related constituents of wheat grain. Plant Breed 132(5):446–457
- Bihmidine S, Hunter C, Johns C, Koch K, Braun D (2013) Regulation of assimilate import into sink organs: update on molecular drivers of sink strength. Front Plant Sci 4(177). https://doi.org/10.3389/fpls.2013.00177
- Edens L, Plijter JJ, Van DLJAB (1999) Novel food compositions. Google Patents
- Faix B, Radchuk V, Nerlich A, Hümmer C, Radchuk R, Emery RN, Keller H, Götz KP, Weschke W, Geigenberger P (2012) Barley grains, deficient in cytosolic small subunit of ADP-glucose pyrophosphorylase, reveal coordinate adjustment of C:N metabolism mediated by an overlapping metabolic-hormonal control. Plant J 69(6):1077–1093
- Kloosterman B, Abelenda JA, Gomez MDMC, Oortwijn M, de Boer JM, Kowitwanich K, Horvath BM, van Eck HJ, Smaczniak C, Prat S, Visser RGF, Bachem CWB (2013) Naturally occurring allele diversity allows potato cultivation in northern latitudes. Nature 495(7440):246–250
- Ortiz-Medina E (2006) Potato tuber protein and its manipulation by chimeral disassembly using specific tissue explantation for somatic embryogenesis. PhD dissertation. McGill University. Department of Plant Science. Montreal, Quebec, Canada
- Patil G, Mian R, Vuong T, Pantalone V, Song Q, Chen P, Shannon GJ, Carter TC, Nguyen HT (2017) Molecular mapping and genomics of soybean seed protein: a review and perspective for the future. Theor Appl Genet 130(10):1975–1991. https://doi.org/10.1007/s00122-017-2955-8
- Rouppe van der Voort J, van der Vossen E, Bakker E, Overmars H, van Zandvoort P, Hutten R, Klein Lankhorst R, Bakker J (2000) Two additive QTLs conferring broad-spectrum resistance in potato to Globodera pallida are localized on resistance gene clusters. Theor Appl Genet 101(7):1122–1130. https://doi.org/10.1007/s001220051588
- Vos P (2016) Development and application of a 20K SNP array in potato. PhD dissertation. Chair group: Plant breeding. Wageningen University, Wageningen, the Netherlands. Retrieved from http://edepot.wur.nl/392278. Accessed 10 Jan 2017
- Werij JS (2011) Genetic analysis of potato tuber quality traits. PhD dissertation. Wageningen University, Wageningen, The Netherlands. Retrieved from http://edepot.wur.nl/183746. Accessed 13 Oct 2016
- Willemsen JH (2018) The identification of allelic variation in potato. PhD dissertation. Chair group: Plant breeding. Wageningen University, Wageningen, the Netherlands
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.