Genetic structure analysis and selection of a core collection for carob tree germplasm conservation and management

  • M. Di Guardo
  • F. Scollo
  • A. Ninot
  • M. Rovira
  • J. F. Hermoso
  • G. Distefano
  • S. La Malfa (corresponding author)
  • I. Batlle
Original Article
Part of the following topical collections:
  1. Population structure


Carob (Ceratonia siliqua L.) is an important evergreen tree of the Mediterranean landscape. Its economic interest is increasing, partly because its seeds contain locust bean gum (LBG), a galactomannan widely used by the food industry as a stabilizer and thickening agent. Its economic and ecological value makes understanding carob genetic diversity of great interest for both breeding and conservation. The world’s largest carob germplasm collection was genotyped using eight carob-specific nuclear simple sequence repeat (nSSR) markers and by sequencing a chloroplast locus. The collection comprises 215 accessions introduced from 12 countries of origin, spanning traditional and novel areas of cultivation. To assess the genetic diversity of the collection, several approaches were combined: structure analysis, principal component analysis (PCA), and graphical clustering based on both dissimilarity and coancestry data. Structure analysis suggested the presence of two distinct genetic pools: one characterizing northeastern Spain and the other spread across southern Spain and the other countries. The PCA and discriminant analysis of principal components (DAPC) complemented the structure results, allowing a better understanding of the genetic differences between countries, while the network-joining analysis provided additional insight into the similarity between individuals. Simple sequence repeat (SSR) data coupled with phenotypic data (floral sex and status) were also used to define the first core collection of carob. Multi-approach analysis of genetic diversity, together with the definition of a core collection, provides useful tools for setting up genetically guided interventions for both conservation and breeding purposes.
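To make the idea of a core collection concrete, the following is a minimal sketch of one simple selection strategy: greedily picking the accessions that add the most not-yet-represented SSR alleles. It is not the procedure used in this study and not the Core Hunter algorithm cited in the references; the accession names, locus names, allele sizes, and the functions `allele_set`, `greedy_core`, and `coverage` are all hypothetical and serve only as illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: accession -> {locus: (allele_1, allele_2)}.
# Allele sizes are invented and do not come from the study.
Genotype = Dict[str, Tuple[int, int]]

toy: Dict[str, Genotype] = {
    "acc_A": {"ssr1": (100, 102), "ssr2": (200, 200)},
    "acc_B": {"ssr1": (100, 102), "ssr2": (200, 204)},
    "acc_C": {"ssr1": (104, 106), "ssr2": (208, 210)},
    "acc_D": {"ssr1": (100, 100), "ssr2": (200, 200)},
}


def allele_set(genotype: Genotype) -> Set[Tuple[str, int]]:
    """Return the distinct (locus, allele) pairs carried by one accession."""
    return {(locus, allele) for locus, pair in genotype.items() for allele in pair}


def greedy_core(accessions: Dict[str, Genotype], size: int) -> List[str]:
    """Pick up to `size` accessions, each time taking the one that adds
    the most alleles not yet covered (ties broken alphabetically)."""
    covered: Set[Tuple[str, int]] = set()
    core: List[str] = []
    remaining = dict(accessions)
    while remaining and len(core) < size:
        best = max(
            sorted(remaining),
            key=lambda name: len(allele_set(remaining[name]) - covered),
        )
        core.append(best)
        covered |= allele_set(remaining.pop(best))
    return core


def coverage(accessions: Dict[str, Genotype], core: List[str]) -> float:
    """Fraction of all alleles in the collection that the core retains."""
    all_alleles = set().union(*(allele_set(g) for g in accessions.values()))
    core_alleles = set().union(*(allele_set(accessions[n]) for n in core))
    return len(core_alleles) / len(all_alleles)


core = greedy_core(toy, size=2)
print(core, coverage(toy, core))
```

Allele coverage is only one possible criterion. Dedicated tools also weigh genetic distances between the selected entries, and phenotypic descriptors such as floral sex can be added as further constraints.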


Ceratonia siliqua L.; Simple sequence repeats; Chloroplast; Genetic structure; Principal component analysis



The authors gratefully acknowledge Alex Baumel, Aix-Marseille Université, Marseille, France, for providing the matK4LF sequence. A.N., M.R., J.F.H., and I.B. are grateful to the CERCA Institution of the Generalitat of Catalonia for its support.

This study has been partially supported by the international project DYNAMIC “Deciphering sYmbiotic Networks in cArob-based MedIterranean agro-eCosystems”, which is supported by the French National Research Agency (ANR-14-CE02-0016).

Data archiving statement

Raw genotypic data are presented in Supplementary Table 2.

Supplementary material

Supplementary Table 1 (DOCX 82 kb)
Supplementary Table 2 (DOCX 48 kb)
Supplementary Table 3 (DOCX 13 kb)
Supplementary Table 4 (DOCX 12 kb)
Supplementary Figure 1

Electropherograms of the sequencing of the matK4LF chloroplast locus. The alignment of the sequencing data allowed the detection of a T/G polymorphism at approximately position 622 bp (PNG 144 kb)

Supplementary Figure 2

Log likelihood curve (ΔK) calculated according to Evanno et al. (2005) plotted against increasing K value. A: plot of the whole carob collection, B: plot of the subset of individuals deemed as Subpop 1, C: plot of the subset of individuals deemed as Subpop 2 (PNG 741 kb)

Supplementary Figure 3

Absolute frequency of the sex of the accessions according to the country/region of origin (PNG 164 kb)

Supplementary Figure 4

Relative frequency of the distribution of the two haplotypes detected through the sequencing of the matK4LF chloroplast locus (PNG 76 kb)

Supplementary Figure 5

Discriminant Analysis of Principal Components (DAPC) based on the genetic data. Countries or regions of collection were used as grouping factor (PNG 753 kb)


  1. Aranzana MJ, Carb J, Ar P (2003) Microsatellite variability in peach [Prunus persica (L.) Batsch]: cultivar identification, marker mutation, pedigree inferences and population structure. 1341–1352
  2. Arista M, Talavera S (1990) Números cromosómicos para la flora española. Lagascalia 16:323–328
  3. Batlle I, Tous J (1994) Carob tree germplasm in Andalusia (Spain). NUCIS Newsl
  4. Batlle I, Tous J (1997) Carob tree. Ceratonia siliqua L. Promoting the conservation and use of underutilized and neglected crops
  5. Belaj A, Dominguez-García M d C, Atienza SG et al (2012) Developing a core collection of olive (Olea europaea L.) based on molecular markers (DArTs, SSRs, SNPs) and agronomic traits. Tree Genet Genomes 8:365–378
  6. Bink MCAM, Jansen J, Madduri M, Voorrips RE, Durel CE, Kouassi AB, Laurens F, Mathis F, Gessler C, Gobbin D, Rezzonico F, Patocchi A, Kellerhals M, Boudichevskaia A, Dunemann F, Peil A, Nowicka A, Lata B, Stankiewicz-Kosyl M, Jeziorek K, Pitera E, Soska A, Tomala K, Evans KM, Fernández-Fernández F, Guerra W, Korbin M, Keller S, Lewandowski M, Plocharski W, Rutkowski K, Zurawicz E, Costa F, Sansavini S, Tartarini S, Komjanc M, Mott D, Antofie A, Lateur M, Rondia A, Gianfranceschi L, van de Weg WE (2014) Bayesian QTL analyses using pedigreed families of an outcrossing species, with application to fruit firmness in apple. Theor Appl Genet 127:1073–1090
  7. Caruso M, Distefano G, Ye X, la Malfa S, Gentile A, Tribulato E, Roose ML (2008a) Generation of expressed sequence tags from carob (Ceratonia siliqua L.) flowers for gene identification and marker development. Tree Genet Genomes 4:869–879
  8. Caruso M, La Malfa S, Pavlíček T et al (2008b) Characterisation and assessment of genetic diversity in cultivated and wild carob (Ceratonia siliqua L.) genotypes using AFLP markers. J Hortic Sci Biotechnol 83:177–182
  9. Cavalli-Sforza LL, Edwards AWF (1967) Phylogenetic analysis models and estimation procedures. Am J Hum Genet 19:233–257
  10. Cipriani G, Spadotto A, Jurman I, di Gaspero G, Crespan M, Meneghetti S, Frare E, Vignani R, Cresti M, Morgante M, Pezzotti M, Pe E, Policriti A, Testolin R (2010) The SSR-based molecular profile of 1005 grapevine (Vitis vinifera L.) accessions uncovers new synonymy and parentages, and reveals a large admixture amongst varieties of different geographic origin. Theor Appl Genet 121:1569–1585
  11. Coit JE (1951) Carob or St. John’s bread. Econ Bot 5:82–96
  12. Csardi G, Nepusz T (2006) The igraph software package for complex network research. InterJournal, Complex Systems 1695
  13. De Beukelaer H, Davenport GF, Fack V (2018) Core Hunter 3: flexible core subset selection. BMC Bioinformatics 19:203
  14. Di Guardo M, Bink MCAM, Guerra W et al (2017) Deciphering the genetic control of fruit texture in apple by multiple family-based analysis and genome-wide association. J Exp Bot 68:1451–1466
  15. Diez CM, Trujillo I, Martinez-Urdiroz N, Barranco D, Rallo L, Marfil P, Gaut BS (2015) Olive domestication and diversification in the Mediterranean Basin. New Phytol 206:436–447
  16. Earl DA, vonHoldt BM (2012) STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv Genet Resour 4:359–361
  17. Escribano P, Viruel MA, Hormaza JI (2008) Comparison of different methods to construct a core germplasm collection in woody perennial species with simple sequence repeat markers. A case study in cherimoya (Annona cherimola, Annonaceae), an underutilised subtropical fruit tree species. Ann Appl Biol 153:25–32
  18. Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14:2611–2620
  19. Fresnedo-Ramírez J, Frett TJ, Sandefur PJ, Salgado-Rojas A, Clark JR, Gasic K, Peace CP, Anderson N, Hartmann TP, Byrne DH, Bink MCAM, van de Weg E, Crisosto CH, Gradziel TM (2016) QTL mapping and breeding value estimation through pedigree-based analysis of fruit size and weight in four diverse peach breeding programs. Tree Genet Genomes 12:25
  20. Garcia-Ochoa F, Casas JA (1992) Viscosity of locust bean (Ceratonia siliqua) gum solutions. J Sci Food Agric 59:97–100
  21. Goldblatt P (1981) Chromosome numbers in legumes II. Ann Mo Bot Gard 68:551–557
  22. Howard NP, Van De Weg E, Bedford DS et al (2017) Elucidation of the ‘Honeycrisp’ pedigree through haplotype analysis with a multi-family integrated SNP linkage map and a large apple (Malus × domestica) pedigree-connected SNP data set. Horticulture Research 4:1–7
  23. Jakobsson M, Rosenberg NA (2007) CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23:1801–1806
  24. Jombart T (2008) Adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 24:1403–1405
  25. Kalinowski ST, Taper ML, Marshall TC (2007) Revising how the computer program cervus accommodates genotyping error increases success in paternity assignment. Mol Ecol 16:1099–1106
  26. Kamvar Z, Tabima J, Grünwald N (2014) Poppr: an R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction. PeerJ 2:e281
  27. Kassambara A, Mundt F (2016) Factoextra: extract and visualize the results of multivariate data analyses. R Packag version 1:
  28. Konate I, Filali-Maltouf A, Berraho EB (2007) Diversity analysis of Moroccan carob (Ceratonia siliqua L.) accessions using phenotypic traits and RAPD markers. Acta Botanica Malacitana 32:79–90
  29. La Malfa S, Currò S, Bugeja Douglas A et al (2014) Genetic diversity revealed by EST-SSR markers in carob tree (Ceratonia siliqua L.). Biochem Syst Ecol 55:205–211
  30. Meyer RS, DuVal AE, Jensen HR (2012) Patterns and processes in crop domestication: an historical review and quantitative analysis of 203 global food crops. New Phytologist 196(1):29–48
  31. Miranda C, Urrestarazu J, Santesteban LG, Royo JB, Urbina V (2010) Genetic diversity and structure in a collection of ancient Spanish pear cultivars assessed by microsatellite markers. Journal of the American Society of Horticultural Science 135:428–437
  32. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959
  33. R Core Team (2016) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
  34. Raymond M, Rousset F (1995) GENEPOP (version 1.2): population genetics software for exact tests and ecumenicism. Journal of Heredity 86:248–249
  35. Sahle M, Coleou J, Haas C (1992) Carob pod (Ceratonia siliqua) meal in geese diets. Br Poult Sci 33:531–541
  36. Schweinfurth G (1894) Sammlung arabisch-aethiopischer Pflanzen, Ergebnisse von Reisen in den Jahren 1881, 1888–89, 1891–92. Bulletin de l'Herbier Boissier 2:1–114
  37. Shannon CE (1948) A mathematical theory of communication. Bell System Technical Journal 27:379–423
  38. Sidina MM, El Hansali M, Wahid N et al (2009) Fruit and seed diversity of domesticated carob (Ceratonia siliqua L.) in Morocco. Sci Hortic (Amsterdam) 123:110–116
  39. Talhouk SN, Van BP, Zurayk R et al (2005) Status and prospects for the conservation of remnant semi-natural carob Ceratonia siliqua L. populations in Lebanon. For Ecol Manag 206:49–59
  40. Thachuk C, Crossa J, Franco J, Dreisigacker S, Warburton M, Davenport GF (2009) Core hunter: an algorithm for sampling genetic resources based on multiple genetic measures. BMC Bioinformatics 10:1–13
  41. Tous J, Olarte C, Truco M, Arus P (1992) Isozyme polymorphisms in carob cultivars. HortScience 27:257–258
  42. Tous J, Rovira M, Romero A et al (2006) Carob tree germplasm in Tunisia. NUCIS Newsl 13:55–59
  43. Tous J, Romero A, Hermoso JF et al (2008) Fruiting and kernel production characteristics of ten Mediterranean carob cultivars grown in northeastern Spain. J Am Pomol Soc 62(4):144
  44. Tous J, Romero A, Batlle I (2013) The carob tree: botany, horticulture, and genetic resources. In: Horticultural reviews, vol 41. Wiley-Blackwell, New York, pp 385–456
  45. Tucker SC (1992) The developmental basis for sexual expression in Ceratonia siliqua (Leguminosae: Caesalpinioideae: Cassieae). Am J Bot 79:318–327
  46. van Hintum T (1999) The general methodology for creating a core collection. In: Johnson RC HT (ed) Core collections for today and tomorrow. International Plant Genetic Resources Institute (IPGRI), Rome (Italy)
  47. van Hintum TJL, Brown AHD, Spillane C, Hodgkin T (2000) Core collections of plant genetic resources. In: IPGRI Technical Bulletin N. 3
  48. Vavilov NI (1951) The origin, variation, immunity, and breeding of cultivated plants. Ronald Press Co, New York
  49. Viruel J, Médail F, Marianick J et al (2016) Mediterranean carob populations, native or naturalized? A continuing riddle. OPTIMA XV Montpellier, Fr
  50. Viruel J, Haguenauer A, Juin M, Mirleau F, Bouteiller D, Boudagher-Kharrat M, Ouahmane L, la Malfa S, Médail F, Sanguin H, Nieto Feliner G, Baumel A (2018) Advances in genotyping microsatellite markers through sequencing and consequences of scoring methods for Ceratonia siliqua (Leguminosae). Applications in Plant Sciences 6:e01201
  51. Wang J (2011) COANCESTRY: a program for simulating, estimating and analysing relatedness and inbreeding coefficients. Mol Ecol Resour 11:141–145
  52. Winer N (1980) The potential of the carob (Ceratonia siliqua). Int Tree Crop Journal 1:15–26
  53. Wright S (1969) Evolution and the genetics of populations, vol 2: the theory of gene frequencies
  54. Zohary M (1973) Geobotanical foundations of the Middle East. Gustav Fischer Verlag, Stuttgart
  55. Zohary D (2002) Domestication of the carob (Ceratonia siliqua L.). Israel Journal of Plant Sciences 50:141–145

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Dipartimento di Agricoltura, Alimentazione ed Ambiente, Università di Catania, Catania, Italy
  2. Institut de Recerca i Tecnologia Agroalimentàries, IRTA Fruit Production, Mas Bové, Tarragona, Spain
