Genetic Resources and Crop Evolution, Volume 66, Issue 7, pp 1469–1482

Genomic characterization of the Native Seeds/SEARCH common bean (Phaseolus vulgaris L.) collection and its seed coat patterns

  • Di Wu
  • Joy Hought
  • Matheus Baseggio
  • John P. Hart
  • Michael A. Gore
  • Daniel C. Ilut
Research Article


Abstract

Common bean (Phaseolus vulgaris L.) is one of the most important legume crops for human consumption. The Native Seeds/SEARCH common bean collection consists of locally adapted accessions collected from the southwestern US and northwestern Mexico. In this study, a representative panel of nearly 300 accessions from this collection was genotyped with more than 10,000 high-quality SNP markers and phenotyped for seed coat patterns. The collection consists primarily of accessions from the Mesoamerican gene pool, and they separate into three distinct subpopulations, with strong population differentiation (FST > 0.4) observed between them. Through a genome-wide association study with the Mesoamerican accessions, we identified several SNPs on chromosome 8 that are associated with seed coat pattern traits and reside proximal to the putative location of the C locus, a locus previously shown to control the pattern of the seed coat. Five myb transcription factors linked to these SNPs were identified as candidate causal genes for seed coat patterns controlled by the C locus. Furthermore, we identified a potentially novel locus on chromosome 10 that appears to control the Anasazi seed coat phenotype. Our work is the first to characterize the genetic diversity of the Native Seeds/SEARCH common bean collection, providing valuable genetic information for germplasm conservation efforts.
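As a rough illustration of the population differentiation statistic reported above, the following Python sketch computes Wright's FST from allele frequencies in two subpopulations. It is a minimal sketch, assuming equal subpopulation weights and known (rather than estimated) allele frequencies; the function name `wright_fst` and the example frequencies are hypothetical and are not taken from the study, which may have used a different estimator.

```python
def wright_fst(freqs_pop1, freqs_pop2):
    """Wright's FST = (H_T - H_S) / H_T for two equally weighted populations.

    Each argument is a list of reference-allele frequencies, one per SNP.
    Heterozygosities are summed over loci before taking the ratio.
    """
    if len(freqs_pop1) != len(freqs_pop2):
        raise ValueError("both populations need one frequency per SNP")

    total_ht = 0.0  # expected heterozygosity of the pooled population
    total_hs = 0.0  # mean expected heterozygosity within populations
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        p_bar = (p1 + p2) / 2
        total_ht += 2 * p_bar * (1 - p_bar)
        total_hs += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2

    if total_ht == 0:
        return 0.0  # no variation at any locus, so no differentiation
    return (total_ht - total_hs) / total_ht


# Hypothetical example: one SNP with strongly diverged frequencies.
fst_example = wright_fst([0.9], [0.1])
```

With the hypothetical frequencies 0.9 and 0.1, the sketch gives an FST of about 0.64, well above the 0.4 level described in the abstract as strong differentiation; identical frequencies in both populations give 0.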


Keywords

Common bean; Genome-wide association study; Population genetics; Seed coat pattern



Abbreviations

Bayesian information criterion
False discovery rate
Index of panmixia
Fixation index
Genome-wide association study
Linkage disequilibrium
Multi-locus mixed model
National plant germplasm system
Native Seeds/SEARCH
Principal component analysis
Site frequency spectrum
Single-nucleotide polymorphism



Acknowledgements

This research was supported by Cornell University startup funds (M.A.G.). We thank the Department of Energy Joint Genome Institute and collaborators (Scott Jackson, Phil McClean, and Jeremy Schmutz) for pre-publication access to release v2.1 of the Phaseolus vulgaris (common bean) genome and annotation. We thank Dr. Phil McClean for seeds of G19833. We are grateful to Nicholas Kaczmar for help with planting and harvesting, and to the staff of Cornell’s Guterman Bioclimatic Laboratory for greenhouse management.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

10722_2019_823_MOESM1_ESM.pdf (922 kb)
ESM_1. Supplemental Fig. S1 Visual examples of the seed coat pattern classification used in this study. (PDF 922 kb)
10722_2019_823_MOESM2_ESM.pdf (678 kb)
ESM_2. Supplemental Fig. S2 Visual examples of the seed coat pattern for all six accessions classified as “Anasazi.” (PDF 677 kb)
10722_2019_823_MOESM3_ESM.pdf (89 kb)
ESM_3. Supplemental Fig. S3 Schematic overview of the SNP filtering pipeline. SNP Set 0 was used to identify unintended sample duplicates. SNP Set I was used to separate accessions into Andean and Mesoamerican groups. SNP Set II was used for GWAS. SNP Set III was used for population genetics analysis. (PDF 89 kb)
10722_2019_823_MOESM4_ESM.pdf (8 kb)
ESM_4. Supplemental Fig. S4 Principal component analysis of SNP Set I. Mesoamerican NPGS and Andean NPGS classifications are based on a priori annotation of NPGS accessions. Mesoamerican NSS and Andean NSS classifications are based on K-means clustering (K = 2) along PC1 with the respective group of NPGS accessions. (PDF 7 kb)
10722_2019_823_MOESM5_ESM.pdf (7 kb)
ESM_5. Supplemental Fig. S5 Principal component analysis of SNP Set II. Subpopulation labels are based on fastSTRUCTURE analysis of SNP Set III, with a subpopulation assignment criterion of Q ≥ 0.8. (PDF 6 kb)
10722_2019_823_MOESM6_ESM.pdf (43 kb)
ESM_6. Supplemental Fig. S6 Neighbour joining phylogenetic tree of the NS/S (SP1, SP2, SP3, SP mixed) and NPGS (NPGS Durango, NPGS Jalisco, NPGS Mesoamerica) Mesoamerican accessions, rooted using the reference genome accession (G19833, 8 samples, Andean genotype). NPGS accession PI 615391, used in both our study and Kwak and Gepts (2009), is indicated by the dotted line and label. Subpopulation labels are based on fastSTRUCTURE analysis of SNP Set III, with a subpopulation assignment criterion of Q ≥ 0.8. (PDF 42 kb)
10722_2019_823_MOESM7_ESM.pdf (111 kb)
ESM_7. Supplemental Fig. S7 GWAS results of three-class and binary coding of seed coat pattern traits. SNPs selected by the MLMM as significantly associated with the trait are shown in red, while the remaining SNPs are shown in black. (PDF 110 kb)
10722_2019_823_MOESM8_ESM.pdf (5 kb)
ESM_8. Supplemental Fig. S8 Linkage disequilibrium (LD) estimates between SNPs in the NS/S Mesoamerican population of 281 accessions. The distribution of SNPs at different percentile cutoffs is indicated by the labeled lines. Median LD is indicated by the solid line labeled 50%, which decays to background levels (r² < 0.1) at physical distances beyond 128 Kb. (PDF 4 kb)
10722_2019_823_MOESM9_ESM.pdf (12 kb)
ESM_9. Supplemental Fig. S9 Histograms of latitude distributions for accessions used in this study and three previous studies. The red color indicates accessions collected from Mexico. (PDF 12 kb)
10722_2019_823_MOESM10_ESM.pdf (9 kb)
ESM_10. Supplemental Fig. S10 Subpopulation composition of the six accessions with Anasazi phenotypes using fastSTRUCTURE results. Three accessions (15DW320, 15DW324, 15DW306) are assigned to SP1 at Q ≥ 0.5, but classified as mixed at Q ≥ 0.8. (PDF 9 kb)
10722_2019_823_MOESM11_ESM.csv (30 kb)
ESM_11. Supplemental Table S1 The complete list of 375 accessions included in this study and relevant metadata. The “Group ID” column indicates whether or not an accession was part of an unintended sample duplicate group, and the “Choice” column indicates whether or not an accession was selected as a representative for its unintended sample duplicate group. The NS/S-assigned accession number, catalog number, common name, and phenotype description are reported for accessions sourced from NS/S. For NPGS sourced accessions, the data refers to annotation from the GRIN database. The race of each NPGS accession was inferred using the documented seed type and morphology description with criteria from Singh et al. (1991). (CSV 29 kb)
10722_2019_823_MOESM12_ESM.csv (13 kb)
ESM_12. Supplemental Table S2 Accession geographical provenance and seed coat pattern trait encoding for the NS/S accessions of Mesoamerican origin. The last column indicates the corresponding seed coat pattern and color descriptors according to IBPGR (1982) and Kornerup (1967). (CSV 12 kb)
10722_2019_823_MOESM13_ESM.csv (0 kb)
ESM_13. Supplemental Table S3 Summary of the four SNP sets used in this study. (CSV 0 kb)
10722_2019_823_MOESM14_ESM.csv (0 kb)
ESM_14. Supplemental Table S4 FST results of all pairwise comparisons between the three subpopulations within the NS/S Mesoamerican population. (CSV 0 kb)
10722_2019_823_MOESM15_ESM.csv (1 kb)
ESM_15. Supplemental Table S5 BLAST results for STS marker sequences for a priori seed coat color and pattern genes. Putative genomic locations of seven out of the 11 STS markers described in Table 2 of McClean et al. (2002) are presented. (CSV 1 kb)
10722_2019_823_MOESM16_ESM.csv (2 kb)
ESM_16. Supplemental Table S6 Seed coat pattern trait GWAS results and genomic location information for MLMM selected SNPs. (CSV 1 kb)
10722_2019_823_MOESM17_ESM.csv (45 kb)
ESM_17. Supplemental Table S7 All annotated genes within 128 Kb of the MLMM selected SNPs reported in Supplemental Table S6. (CSV 45 kb)
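
The subpopulation assignment criterion mentioned in the captions above (Q ≥ 0.8, with Q ≥ 0.5 as a looser alternative in Supplemental Fig. S10) can be sketched as a threshold rule on per-accession membership coefficients. This is a minimal sketch, assuming the coefficients are supplied as a plain list ordered SP1, SP2, SP3; the function name `assign_subpopulation` and the example coefficients are hypothetical and are not values from the study.

```python
def assign_subpopulation(q_values, threshold=0.8):
    """Label an accession by its largest membership coefficient.

    Returns "SP<k>" (1-based) when the largest Q meets the threshold,
    otherwise "mixed".
    """
    if not q_values:
        raise ValueError("at least one membership coefficient is required")

    # Index of the subpopulation with the highest membership coefficient.
    best_index = max(range(len(q_values)), key=lambda i: q_values[i])
    if q_values[best_index] >= threshold:
        return f"SP{best_index + 1}"
    return "mixed"


# Hypothetical coefficients for one accession (SP1, SP2, SP3).
example_q = [0.62, 0.30, 0.08]
label_loose = assign_subpopulation(example_q, threshold=0.5)
label_strict = assign_subpopulation(example_q, threshold=0.8)
```

With these hypothetical coefficients, the accession is labeled SP1 at the 0.5 threshold but "mixed" at the 0.8 threshold, which mirrors the situation described for three of the Anasazi accessions.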


References

  1. Acampora A, Ciaffi M, De Pace C et al (2007) Pattern of variation for seed size traits and molecular markers in Italian germplasm of Phaseolus coccineus L. Euphytica 157:69–82
  2. Aharoni A, De Vos CH, Wein M et al (2001) The strawberry FaMYB1 transcription factor suppresses anthocyanin and flavonol accumulation in transgenic tobacco. Plant J 28:319–332
  3. Allard RW (1953) Inheritance of some seed-coat colors and patterns in lima beans. Hilgardia 22:167–177
  4. Ariani A, Berny Mier y Teran JC, Gepts P (2018) Spatial and temporal scales of range expansion in wild Phaseolus vulgaris. Mol Biol Evol 35:119–131
  5. Bassett MJ (2007) Genetics of seed coat color and pattern in common bean. Plant Breed Rev 28:239–315
  6. Bassett MJ, Hartel K, McClean P (2000) Inheritance of the Anasazi pattern of partly colored seedcoats in common bean. J Am Soc Hortic Sci 125:340–343
  7. Bassett MJ, Lee R, Otto C, McClean PE (2002a) Classical and molecular genetic studies of the strong greenish yellow seedcoat color in ‘Wagenaar’ and ‘Enola’ common bean. J Am Soc Hortic Sci 127:50–55
  8. Bassett MJ, Lee R, Symanietz T, McClean PE (2002b) Inheritance of reverse margo seedcoat pattern and allelism between the genes J for seedcoat color and L for partly colored seedcoat pattern in common bean. J Am Soc Hortic Sci 127:56–61
  9. Bemis WP (1957) Inheritance of a base seed-coat color factor in lima beans. J Hered 48:124–127
  10. Bitocchi E, Bellucci E, Giardini A et al (2012a) Molecular analysis of the parallel domestication of the common bean (Phaseolus vulgaris) in Mesoamerica and the Andes. New Phytol 197:300–313
  11. Bitocchi E, Nanni L, Bellucci E et al (2012b) Mesoamerican origin of the common bean is revealed by sequence data. Proc Natl Acad Sci USA 109:E788–E796
  12. Blair MW, Giraldo MC, Buendía HF et al (2006) Microsatellite marker diversity in common bean (Phaseolus vulgaris L.). Theor Appl Genet 113:100–109
  13. Boodley JW, Sheldrake R (1972) Cornell peat-lite mixes for commercial growing. Inf Bull 43:1–8
  14. Bradbury PJ, Zhang Z, Kroon DE et al (2007) TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633–2635
  15. Broughton WJ, Hernández G, Blair M et al (2003) Beans (Phaseolus spp.)—model food legumes. Plant Soil 252:55–128
  16. Burgess MA (1994) Cultural responsibility in the preservation of local economic plant resources. Biodivers Conserv 3:126–136
  17. Caldas GV, Blair MW (2009) Inheritance of seed condensed tannins and their relationship with seed-coat color and pattern genes in common bean (Phaseolus vulgaris L.). Theor Appl Genet 119:131–142
  18. Camacho C, Coulouris G, Avagyan V et al (2009) BLAST+: architecture and applications. BMC Bioinform 10:421
  19. Chen J, Chen Z (2008) Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95:759–771
  20. Danecek P, Auton A, Abecasis G et al (2011) The variant call format and VCFtools. Bioinformatics 27:2156–2158
  21. Eitzinger A, Läderach P, Rodriguez B et al (2017) Assessing high-impact spots of climate change: spatial yield simulations with Decision Support System for Agrotechnology Transfer (DSSAT) model. Mitig Adapt Strateg Glob Change 22:743–760
  22. Elshire RJ, Glaubitz JC, Sun Q et al (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE 6:e19379
  23. Endelman JB, Jannink J-L (2012) Shrinkage estimation of the realized relationship matrix. G3 2:1405–1413
  24. FAO (2018) FAOSTAT. In: FAOSTAT. Accessed 4 Nov 2018
  25. Feenstra WJ (1960) Biochemical aspects of seedcoat colour inheritance in Phaseolus vulgaris L. Veenman
  26. Glaubitz JC, Casstevens TM, Lu F et al (2014) TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS ONE 9:e90346
  27. Hill WG, Weir BS (1988) Variances and covariances of squared linkage disequilibria in finite populations. Theor Popul Biol 33:54–78
  28. Hong M, Hu K, Tian T et al (2017) Transcriptomic analysis of seed coats in yellow-seeded Brassica napus reveals novel genes that influence proanthocyanidin biosynthesis. Front Plant Sci.
  29. Hudson RR, Slatkin M, Maddison WP (1992) Estimation of levels of gene flow from DNA sequence data. Genetics 132:583–589
  30. Jones AL (1999) Phaseolus bean post-harvest operations
  31. Kalavacharla V, Liu Z, Meyers BC et al (2011) Identification and analysis of common bean (Phaseolus vulgaris L.) transcriptomes by massively parallel pyrosequencing. BMC Plant Biol 11:135
  32. Kornerup A (1967) Methuen handbook of colour. Hastings House Pub, New York
  33. Kour A, Boone AM, Vodkin LO (2014) RNA-Seq profiling of a defective seed coat mutation in Glycine max reveals differential expression of proline-rich and other cell wall protein transcripts. PLoS ONE 9:e96342
  34. Kwak M, Gepts P (2009) Structure of genetic diversity in the two major gene pools of common bean (Phaseolus vulgaris L., Fabaceae). Theor Appl Genet 118:979–992
  35. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359
  36. Lipka AE, Tian F, Wang Q et al (2012) GAPIT: genome association and prediction integrated tool. Bioinformatics 28:2397–2399
  37. Liu C, Jun JH, Dixon RA (2014) MYB5 and MYB14 play pivotal roles in seed coat polymer biosynthesis in Medicago truncatula. Plant Physiol 165:1424–1439
  38. Mamidi S, Rossi M, Moghaddam SM et al (2013) Demographic factors shaped diversity in the two gene pools of wild common bean Phaseolus vulgaris L. Heredity 110:267–276
  39. McClean PE, Lee RK, Otto C et al (2002) Molecular and phenotypic mapping of genes controlling seed coat pattern and color in common bean (Phaseolus vulgaris L.). J Hered 93:148–152
  40. Nesi N, Jond C, Debeaujon I et al (2001) The Arabidopsis TT2 gene encodes an R2R3 MYB domain protein that acts as a key determinant for proanthocyanidin accumulation in developing seed. Plant Cell 13:2099–2114
  41. Prakken R (1974) Inheritance of colours in Phaseolus vulgaris L. IV Recombination within the “Complex Locus C”. Meded Landbouwhogesch Wageningen 24:1–36
  42. Price AL, Patterson NJ, Plenge RM et al (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38:904–909
  43. R Core Team (2019) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  44. Raj A, Stephens M, Pritchard JK (2014) fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics 197:573–589
  45. Romay MC, Millard MJ, Glaubitz JC et al (2013) Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biol 14:R55
  46. Schmutz J, McClean PE, Mamidi S et al (2014) A reference genome for common bean and genome-wide analysis of dual domestications. Nat Genet 46:707–713
  47. Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
  48. Segura V, Vilhjálmsson BJ, Platt A et al (2012) An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat Genet 44:825–830
  49. Singh SP, Gepts P, Debouck DG (1991) Races of common bean (Phaseolus vulgaris, Fabaceae). Econ Bot 45:379–396
  50. Stacklies W, Redestig H, Scholz M et al (2007) pcaMethods—a bioconductor package providing PCA methods for incomplete data. Bioinformatics 23:1164–1167
  51. Swarts K, Li HH, Navarro JAR et al (2014) Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants. Plant Genome.
  52. Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595
  53. van Schoonhoven A (1991) Common beans: research for crop improvement. CAB International, Centro Internacional de Agricultura Tropical (CIAT), Cali
  54. VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423
  55. Wortmann CS (1998) Atlas of common bean (Phaseolus vulgaris L.) production in Africa. Centro Internacional de Agricultura Tropical (CIAT), Cali (CIAT publication no. 297)
  56. Yang K, Jeong N, Moon J-K et al (2010) Genetic analysis of genes controlling natural variation of seed coat and flower colors in soybean. J Hered 101:757–768
  57. Zabala G, Vodkin LO (2014) Methylation affects transposition and splicing of a large CACTA transposon from a MYB transcription factor regulating anthocyanin synthase genes in soybean seed coats. PLoS ONE 9:e111959

Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, USA
  2. Native Seeds/SEARCH, Tucson, USA
  3. USDA-ARS Tropical Agriculture Research Station, Mayagüez, USA
