Keywords

A key aim of developmental biology is to describe the molecular mechanisms underlying pattern formation, i.e., how gene expression patterns are established and how cell differentiation is orchestrated over time. Since the discovery of embryonic induction, which revealed that secreted molecules are capable of instructing and organizing cells in surrounding tissues (Waddington 1940; Spemann and Mangold 2001), cell-cell signaling has become a sine qua non mechanism of pattern formation in many (if not most) developmental systems (Meyerowitz 2002; Rogers and Schier 2011; Urdy 2012; Kicheva and Briscoe 2015). Experimental manipulations of extracellular signals can impact tissue patterning at a distance (Salazar-Ciudad 2006; Nahmad Bensusan 2011; Perrimon et al. 2012; Urdy et al. 2016). It follows that to understand how spatial information is deployed in differentiating tissues, it is critical to characterize the signals that mediate intercellular communication. A handful of genes coding extracellular proteins that act as signaling molecules between neighboring cells have been identified in animals (Nichols et al. 2006; Rokas 2008a; Perrimon et al. 2012): Wnt, TGF-beta, Hedgehog, Notch, EGF, RTK ligands, and TNFs, among other families. These signaling ligands are widely conserved and show highly regulated expression patterns (Salvador-Martínez and Salazar-Ciudad 2015).

In the 2000s it was proposed that the construction of multicellular organisms relies on a small set of conserved genes, referred to as the developmental genetic toolkit (DGT), which comprises a few hundred genes from a few dozen gene families involved in two major processes: cell differentiation and cell-cell communication (Carroll et al. 2005; Floyd and Bowman 2007; Rokas 2008b; Erwin 2009). On the other side, genes that are not part of the DGT were attached to vital routine functions such as metabolism, protein synthesis, or cell division. According to the DGT view, spatial information emerges from an interplay between genetic factors involved in signal transduction and transcriptional control. An inevitable consequence is that morphological evolution should be based, to a large extent, on reusing these toolkit components, and it follows that mutations in the DGT genes themselves should cause evolution of form (Carroll et al. 2005; Carroll 2008). Such proposition was formulated at the beginning of the twenty-first century, while few genes underlying morphological evolution had been identified – less than 50 cases in 2001 (Martin and Orgogozo 2013). As of today, the hypothesis that animal morphological evolution is mainly caused by mutations in DGT genes can now be tested further based on micro-evo-devo studies (Nunes et al. 2013) and the analyses of genotype-phenotype variation in nature (Orgogozo et al. 2015; Stern 2011). Here we investigate one aspect of the DGT view, the importance of genes encoding secreted signaling proteins in driving morphological evolution. We examine whether ligand-coding genes are preferential targets for the generation of morphological evolution. In addition, we confront existing data to predictions that the corresponding allelic variation should be (1) potentially adaptive (Barrett and Hoekstra 2011; Pardo-Diaz et al. 2015), (2) replicated over various phylogenetic levels (Gompel and Prud’homme 2009; Kopp 2009; Martin and Orgogozo 2013), and (3) cis-regulatory rather than coding (Prud’homme et al. 2007; Carroll 2008; Stern and Orgogozo 2008; Liao et al. 2010).

1 Gephebase: The Database of Genotype-Phenotype Variations

Experimental studies based on the manipulation of gene function in the laboratory – for instance, based on reverse genetics or on a mutant screen followed by forward genetics mapping – describe the overall architecture of the genotype-phenotype map in a given organism. However, the genetic causes of evolutionary change in nature do not necessarily equate to the mutations studied in the laboratory: evolutionary-relevant mutations may represent a particular subset of all possible mutations. To identify the genetic causes of natural differences between individuals, populations, and species, one can perform forward genetics studies that compare two naturally occurring phenotypic states – in general, using linkage mapping of quantitative trait loci or Mendelian genes or association mapping (Stern 2000). The so-called “loci of evolution” or “quantitative trait gene (QTG)” studies identify pairs of alleles linked to a specific phenotypic difference (Orgogozo et al. 2015), for instance between an ancestral and a derived state. These loci are typically genomic targets of selection when the variation is of adaptive or domesticating potential. Due to experimental limitations, the dataset is biased toward large-effect loci and thus misses a large fraction of what constitutes the total genetic template of evolution (Rockman 2012). Nevertheless, we think that it is crucial to gather the findings of this research program under the banner of a resource that would integrate, for comparative and meta-analytical purposes, our growing knowledge of genotype-phenotype relationships. To facilitate the curation and analysis of the relevant literature [see (Stern and Orgogozo 2008; Streisfeld and Rausher 2011; Martin and Orgogozo 2013) for previous examples], we have created Gephebase (http://www.gephebase.org), a database of genotype-phenotype relationships underlying natural and domesticated variation across Eukaryotes. Here, we use Gephebase to reflect on the importance of signaling ligand genes for morphological evolution in animals.

2 Method: Construction of Gephebase and Identification of Signaling Genes

Gephebase is a quality-controlled, manually curated database of published associations between genes and phenotypes in Eukaryotes – containing a total of 1400 entries as of December 31, 2016. For now, genes responsible for human disease and for aberrant mutant phenotypes in laboratory model organisms are excluded and can be found in other databases (OMIM, OMIA, FlyBase, etc.). QTL mapping studies whose resolution did not reach the level of the nucleotide or of the transcriptional unit are also excluded. In Gephebase, each genotype-phenotype association is attributed to only one type of experimental evidence among three possibilities: “association mapping,” “linkage mapping,” or “candidate gene.” This choice is made by Gephebase curators based on the best evidence available for a given genotype-phenotype relationship. Gene-to-phenotype associations identified by linkage mapping with resolutions below 500 kb have priority in the dataset (see Supplementary Materials in Martin and Orgogozo 2013). Association mapping studies are included based on individual judgment, with a strong bias toward SNP-to-phenotype associations that have been confirmed in reverse genetic studies. In other words, Gephebase intends to be more stringent than a compilation of statistically significant SNPs, and attempts to select studies where a given genotype-phenotype association is relatively well supported or understood.

Gephebase presents itself as a collection of entries, where each entry corresponds to an allelic difference at a given gene, either between two closely related species or between two individuals, its associated phenotypic change, and the relevant publications. As of today, the database contains a total of 391 entries related to animal morphological changes: 174 for domesticated or artificially selected traits, 172 for intraspecific trait variations, and 45 for interspecific changes (Table S1, available at http://virginiecourtier.wordpress.com/publications/. We identify 80 cases of natural morphological evolution and domestication in animals (out of 391) that involve 21 different ligand genes (Table 4.1; Table S2, available at http://virginiecourtier.wordpress.com/publications/).

Table 4.1 Twenty-one ligand genes with known gephe variations identified in Gephebase – 19 of which related to cases of morphological evolution (accessed December 31, 2016)

To estimate the proportion of genes encoding signaling ligands in genomes (Table 4.2), we used the BioMart portal from Ensembl (Smedley et al. 2015). All the genes, which have both the following Gene Ontology (GO) annotations, “receptor binding” (Molecular Function, GO:0005102) and “extracellular region” (Cellular Component, GO:0005576), were considered as ligand genes. To count the number of genes with two GO annotations, we used BioMart to extract text files containing Ensembl Gene ID for each GO and each species. We then counted the number of genes having both GO in each species with the following Linux command: comm -1 -2 <(sort human-GO0005102.txt) <(sort human-GO0005576.txt) | head -n -1 | wc -l (note that the title line had to be excluded from the count).

Table 4.2 Proportion of signaling ligand-encoding genes in the genome of several model species

Box 4.1: Definitions

Admixture Mapping: a method capitalizing on the current gene flow between two or more previously isolated populations to associate genetic loci to phenotypic traits. Admixture mapping is a form of association mapping.

Association Mapping: a forward genetics method for gephe identification based on a genome-wide statistical association between genetic variants and phenotypic traits, generally in a large cohort of unrelated individuals.

Candidate Gene Approach: a reverse genetics method that tests if a locus defined a priori, based on our current biological knowledge, underlies variation in a phenotype of interest. Example: opsin photoreceptor genes are typical candidate genes for differences in color vision.

Forward genetics: set of methods used to identify the genetic cause(s) of a given phenotypic trait (“from the phenotype to genes”).

Genetic hotspot: a group of orthologous loci that have been associated multiple times to phenotypic variation due to independent mutational events in each lineage (Martin and Orgogozo 2013).

Gephe (neologism for genotype-phenotype relationship; pronounced jay-fee): an abstract entity composed of three elements: a variation at a genetic locus (two alleles), its associated phenotypic change (two distinct phenotypic states, e.g., an ancestral and a derived state), and their relationship (Orgogozo et al. 2015). A gephe is usually defined for a given genetic background and environment.

Haplotype: a set of closely linked alleles found on the same chromosome, which is inherited as a single piece.

Heterotopy: change that occurred during evolution in the location of a particular molecular event within the developing organism.

Linkage Mapping: a forward genetics method for gephe identification based on chromosome shuffling and crossing-overs, using the progeny of a hybrid cross. This includes the mapping of quantitative trait loci (QTL) and Mendelian loci.

Mendelian Gene: a segregating genetic unit which is detected through phenotypic differences associated with different alleles at the same locus (Orgogozo et al. 2016).

Morphospace: an abstract representation of all possible morphologies and shapes of an organism.

Orthologous Loci: pieces of DNA that share ancestry because of a speciation event and that are thus found in different species.

Parallel Evolution: here defined as independent repeated sequence variation at a same locus, underlying variation in a similar phenotypic trait (Stern 2013). For other definitions, see (Scotland 2011).

Phenologue: a similar phenotype caused by a conserved genetic mechanism in distant lineages (McGary et al. 2010; Lehner 2013). Used here as the phenotypic counterpart of a gephe involving several cases of parallel evolution.

Quantitative Trait Locus: a portion of DNA (the locus) that is associated with variation in a quantitative phenotypic trait.

Reverse Genetics: set of methods used to alter a given gene in order to characterize its function (“from genes to phenotypes”).

3 A Few Select Genes for Body-Wide Switches in Melanin Production in Tetrapods

Among 294 Gephebase morphology entries for tetrapods (Gephebase search term “Tetrapoda,” including mammals and reptiles sensu largo), 206 genotype-phenotype relationships relate to pigment variation, including 193 entries identifying components of the melanocyte differentiation pathway. Both sampling and ascertainment biases explain this unusual enrichment. First, pigmentation shows a bulk of variation accessible to breeders and natural selection altogether (Protas and Patel 2008; Linderholm and Larson 2013). In combination with the fact that coloration variation often involves few genes, these features have made pigmentation a favorite target for exploring genotype-phenotype relationships (sampling bias). Second, there is predictability in the genetic basis of melanin pigment variation, as illustrated by the fact that the melanocortin 1 receptor (MC1R), a major regulator of melanocyte activation, is the most represented gene in Gephebase with 84 entries (6% of all 1400 entries). Interestingly, 80% of MC1R gephes (67/84) were identified by a candidate gene approach. This pattern illustrates well a latent ascertainment bias in the study of vertebrate pigment variation: when interested in the genetic basis of a color variation involving shifts in melanin types (mammalian coat, bird plumage, etc.), it has become a knee-jerk reflex for biologists to look for amino acid changes in MC1R, in particular in domains that had been functionally characterized. As a matter of fact, all of the 67 MC1R gephes based on a candidate gene approach involve mutations affecting the gene-coding region. Thus, both the phenotypic diversity of vertebrate pigmentation traits and their simple genetic basis explain the overrepresentation of MC1R to a large extent. This said, the fact that the remaining 20% of MC1R entries were identified by linkage or association mapping validates the idea that MC1R is a bona fide driver of color variation in vertebrates. As an explanation for this trend, it is likely that the MC1R protein hosts tuning sites that can modulate pigmentation without affecting other traits and that its mutations can show a dominant effect prone to a rapid adaptive spread (Mundy 2005; Kopp 2009; Kronforst et al. 2012; Reissmann and Ludwig 2013; Wolf Horrell et al. 2016). Other components of the melanocyte activation cascade also form gephes involved in natural and artificial selection of coloration traits (Fig. 4.1). This includes downstream targets of MC1R signal transduction such as the transcription factor gene MITF and the melanogenic genes TYR, TYRP1, and Pmel17, all involved in the biogenesis of eumelanosomes.

Fig. 4.1
figure 1

Alleles of secreted ligands associated to pigment variation in vertebrates. (a) The MC1R and cKIT signaling pathways each activate a signal transduction regulatory cascade converging on the MITF transcription factor that modulates the expression of melanogenic genes and ultimately activates the maturation and transport of dark eumelanin in melanosomes. Agouti and β-defensin3 are secreted extracellular modulators of MC1R, and KITLG is the agonist ligand of cKIT. Allelic variation at these three genes is associated to pigment variation in vertebrates. (b) Black panthers are leopards that carry a null mutation in Agouti. (c) Adaptive pigment variation in deer mice (Peromyscus spp.) has repeatedly involved sequence modifications at the Agouti locus. For instance, distinct populations of P. polionotus adapted to dark (mainland Florida; top panel) and light (coastal Florida; bottom panel) color backgrounds via cis-regulatory variants that modulate Agouti skin expression. (d) Black wolves can be seen at increasing frequencies in packs of the Yellowstone National Park (USA). The melanic allele corresponds to a single amino acid deletion, which was originally selected in domestic dogs and later introgressed in wild in North American wolves and coyotes by hybridization. Photo credits – (b) Emmanuel Keller (License CC BY-ND 2.0), (c) Roger Barbour (License CC BY-ND 2.0), (d) Doug Smith (Public Domain)

Upstream of MC1R, two signaling molecules that interact with receptor function are known as allelic sources of color variation in vertebrates. In particular, the antagonist ligand Agouti/ASIP is a genetic hotspot for pigment variation with a total of 28 entries in Gephebase. This includes numerous cases where this gene was identified by linkage or association mapping, both in natural and domesticated contexts (Fig. 4.1a–c), making Agouti one of the most commonly mapped genes in our dataset. Coding alleles of Agouti are recessive loss-of-function mutations resulting in melanic phenotypes. This contrasts with the melanic gain-of-function coding alleles of MC1R which are dominant, a difference in allelic effects that is used to infer the genetic basis of melanism (Eizirik et al. 2003). The Agouti ligand inhibits the basal activation of the MC1R pathway. In an Agouti-null context, MC1R is hyper-activated by its active ligand, the pituitary melanocortin hormone α-MSH, which triggers a melanocyte regulatory cascade that culminates with eumelanin production. It has been proposed that wild-type Agouti can become an agonist of MC1R melanic variants (McRobie et al. 2014), suggesting that certain gain-of-function MC1R alleles reverse the responsiveness of the receptor to the Agouti ligand itself. In addition to Agouti, the β-defensin 3/CBD103 peptide is secreted by skin epithelia, strongly binds to MC1R, and was shown to be responsible for melanism in dogs (Candille et al. 2007). In certain melanic dog breeds, one amino acid deletion in β-defensin 3/CBD103 results in dominant melanism, possibly by blocking the inhibitory activity of Agouti or by losing its blocking of α-MSH stimulatory binding (Nix et al. 2013). Of note, the CBD103ΔG23 melanic allele is revealing a complex history that blurs the boundary between wild and domesticated. First, based on ancient DNA studies, it probably originated through domestication from a possible wolflike gene pool as early as 10,000 years ago (Ollivier et al. 2013), introgressing into modern dog breeds. Second, it propagated back in the wild, resulting in relatively recent segregation of melanic phenotypes in North American gray wolves, North American coyotes, and Italian gray wolves (Anderson et al. 2009). The melanic allele shows signatures of positive selection, but it remains unclear if this is due to a fitness effect of the melanic coat or, alternatively, to the antimicrobial properties of β-defensin 3. A few other cases of organism-wide color changes have been found to be positively selected (Vignieri et al. 2010; Barrett and Hoekstra 2011; Laurent et al. 2016).

In conclusion, mutations in MC1R and Agouti account for 54% (112/206) of the gephes dealing with tetrapod pigmentation variation in our current dataset. Such an overrepresentation cannot be explained by experimental bias alone and suggests that MC1R and Agouti are preferential targets for pigmentation evolution in tetrapods.

4 cis-Regulatory Evolution Drives Regional Specific Color Shifts

While ligand- and receptor-coding changes likely modulate the strength of signaling, and, thus, pigment synthesis in melanocytes, such changes are likely to affect all the body regions where these genes are expressed. In contrast, region-specific changes in coat, skin, or plumage coloration are more likely to involve cis-regulatory mutations. In a previous meta-analysis of the gephe literature, it was established empirically that localized morphological changes almost always involve cis-regulatory rather than coding variation (Stern and Orgogozo 2008). Agouti is a hotspot of cis-regulatory evolution for pigment pattern modification and provides one of the most spectacular examples of QTL fractionation. Deer mice display extensive pigment variation matching the color of their environment (Manceau et al. 2010). Fine mapping of this variation revealed that not only the Agouti locus is the major driver of pigment variation (Manceau et al. 2011) but also this genetic region decomposes itself into multiple noncoding sub-loci, each tightly associated with parts of the total phenotype (Linnen et al. 2013). Various regulatory elements are involved in directing the expression of three alternative isoforms into different body regions (Mallarino et al. 2016). Each adaptive allele is a complex haplotype that is inherited as a package that underwent multiple local changes. This is of major importance to understand how small leaps in the morphospace occur, as it illustrates the principle that genetic hotspots, in addition to providing a somewhat predictable basis for phenotypic evolution between species, can also accumulate mutations that collectively result in large-effect variation within a single lineage (Stam and Laurie 1996; McGregor et al. 2007; Rebeiz et al. 2011; Martin and Orgogozo 2013; Linnen et al. 2013; Noon et al. 2016).

Thus, the studies of vertebrate pigment variation suggest that a receptor (MC1R) and its inverse agonist (Agouti/ASIP) are key regulators of melanocyte differentiation, driving adaptive variation in natural contexts as well as novel color features available to farmers and breeders. Coding evolution in either component results in body-wide color shifts, while cis-regulatory evolution of Agouti, by tuning the spatial deployment of an MC1R switch-off, permits subtle changes in morphology. The Agouti/MC1R axis is not a typical developmental pathway and plays little role during ontogenesis (e.g., see Gene Ontology annotations in Gephebase). In contrast, the endothelin-3 ligand/endothelin-receptor B (EDN3/EDNRB) signaling axis has pleiotropic roles in the differentiation and migration of neural crest cells, and mutations in both EDN3 and EDNRB have been found to cause pigmentation changes in domesticated chicken, cattle, and horse (Santschi et al. 1998; Dorshorst et al. 2011; Qanbari et al. 2014). So far, only domesticated alleles of EDN3/EDNRB that may be under unrealistic selective regimes have been mapped. Thus, while it represents perhaps a genuine DGT component, it remains ambiguous if endothelin pathway genes can be a mutational target of evolution in a natural context. To truly assess the role of signaling ligand genes in morphological evolution, it is useful to focus on radiating lineages that allow a trait-by-trait dissection by forward genetics (i.e., taking advantage of natural variation between closely related lineages – populations and sister species) and, sometimes, natural experiments of replicated evolution (Kopp 2009; Powell and Mariscal 2015). In the next sections, we focus primarily on stickleback fishes and Heliconius butterflies, for which numerous linkage mapping efforts have been uncovering the genetic basis of several morphological adaptations.

5 Recent Stickleback Fish Adaptations Repeatedly Recruited Ligand Alleles

Three-spined sticklebacks (Gasterosteus aculeatus) are a species of marine fishes that repeatedly colonized freshwater environments following the retreat of the Pleistocene glaciers. Adapting to these novel niches involved numerous morphological, physiological, and behavioral modifications all available to genetic dissection by QTL mapping and population scans. Among the 14 gephes that have been mapped in sticklebacks (Pitx1, TSHBeta2, KCNH4, KITLG, EDA, GDF6, BMP6, PRKCD, SOD3, KCNH4, ATP6V0A1, ATP1A1, Mucin, IGK), 4 involve a secreted ligand gene. Analysis of well-annotated genomes indicates that secreted ligand genes represent less than 5% of the total number of genes within an animal genome (Table 4.2). The proportion of ligand gephes in sticklebacks (28%) is thus higher than expected with the null hypothesis that mutations responsible for phenotypic evolution occur randomly at any gene within a genome (chi2 test: chi2 > 20; p < 10−5).

A single large-effect locus was identified as driving melanin pigment reduction in freshwater populations (Fig. 4.2a). Contrary to expectations, this trait mapped neither to the MC1R pathway nor at its downstream targets, but at the Kit-ligand (KITLG) locus (Miller et al. 2007; Jones et al. 2012), which encode the secreted signaling component of a parallel pathway (Fig. 4.1). KITLG is the ligand of the KIT receptor, which triggers a MAPK tyrosine kinase transduction cascade that modulates the differentiation and activity of melanocytes (Wehrle-Haller 2003). While the KIT receptor has been identified in a total of 17 color-related gephes, it is only linked to domesticated alleles in the cattle, pig, horse, donkey, domesticated fox, and domestic cat (see Advanced Search “Gene name and synonyms” = “KIT” at www.gephebase.org for a complete list). In contrast, cis-regulatory alleles of KITLG have been shown to underlie natural pigment variation not only in stickleback fishes but also in humans (Miller et al. 2007; Guenther et al. 2014). An Ala193Asp mutation in KITLG has also been shown to cause piebald coat color phenotypes in cattle breeds (Seitz et al. 1999; Qanbari et al. 2014). Of note, cis-regulatory KITLG variation may provide tissue-specific effects that limit its potential deleterious pleiotropic effects on cancer risks, as observed in other variant forms of this locus in humans (Karyadi et al. 2013; Litchfield et al. 2016).

Fig. 4.2
figure 2

Secreted ligand loci involved in marine-to-lake adaptations in sticklebacks. (a) A KITLG cis-regulatory variant causes reduced melanization in lake populations (bottom) compared to marine alleles (top). (b) MicroCT images of the tooth plates of a marine vs. a lake-adapted ecotype. The freshwater cis-regulatory BMP6-derived allele causes increased tooth area and density. (c–d) Armor plates are lateral bony structures, here stained by Alizarin Red (c) and false-colored in MicroCT rendering (d, pink), which were repeatedly reduced or lost in freshwater populations. cis-Regulatory alleles of EDA and GDF6 cause distinct effects on plate distribution, number, and size (Photo credits – (a) Frank Chan and David Kingsley, (b) Craig Miller and David Kingsley, (c) Nicholas Ellis and Craig Miller, (d) Catherine Guenther, Vahan Indjeian, and David Kingsley)

Another locus, encoding the bone morphogenetic protein 6 (BMP6) ligand, was found to cause tooth gain in freshwater stickleback population (Cleves et al. 2014; Erickson et al. 2015) (Fig. 4.2b). The causal change is cis-regulatory and downregulates BMP6 expression, late during oral development (see Cleves et al. 2014 correction). Surprisingly, genetic mapping of a second freshwater population revealed that another genomic locus has driven a similar phenotypic output (Ellis et al. 2015). BMP ligands belong to the TGF-β family, are shared by all bilaterian animals, and play important roles for the regulation of development (De Robertis 2008). Compilation of current data suggests that mutations in TGF-β family genes are often involved in the tinkering of reproductive and skeletal traits during evolution and domestication. Several BMP alleles have been associated to increased fertility in domestic sheeps (BMP15 and its paralog GDF9) (Monestier et al. 2014) and to fecundity and bone allocation in chicken (BMP2) (Johnsson et al. 2012). Genetic studies of craniofacial diversity mapped a QTL interval containing the BMP4 gene in cichlid fishes (Albertson et al. 2005) and found a strong association between a single amino acid change in BMP3 and brachycephalic (short-skulled) dog breeds (Schoenebeck et al. 2012).

Body armor loss, via the reduction of lateral bony plates, has been a recurring adaptation to freshwater in sticklebacks. Two major loci have been characterized. The tumor necrosis factor superfamily gene Ectodysplasin A (EDA) harbors cis-regulatory variation existing at low frequency in the marine population that has been repeatedly recruited in continental populations to drive plate number reduction (Colosimo et al. 2005; Jones et al. 2012; O’Brown et al. 2015). The same locus also triggers a change in schooling behavior, as fishes from lake habitats have lost the ability to precisely align their body axis when swimming in a group, an effect that is reversed by transgenic overexpression of EDA (Greenwood et al. 2016). In addition, a combination of QTL mapping and genome scan has identified a freshwater-specific allele at the growth/differentiation factor 6 (GDF6) locus, which results in a gain of expression of that gene in the developing epithelium and, ultimately, in a reduction of lateral plate size (Indjeian et al. 2016). Like for KITLG, this case also opened a window into human evolution as it was found that a GDF6 hindlimb-specific enhancer was lost in the human lineage, with skeletal modifications obtained in mice that suggest a potential role in the evolution of bipedalism (Indjeian et al. 2016).

Forward genetics efforts in sticklebacks thus show that ligand genes belonging to classical developmental pathways are an important source of morphological variation of adaptive relevance. Noticeably, all the stickleback gephes described here are cis-regulatory, in accordance with the prediction that tinkering of developmental genes is more likely to involve cis-regulatory changes than coding mutations (Carroll 2008; Stern and Orgogozo 2008). Next, we focus on how accumulated changes in signaling ligand loci have enlarged the landscape of possible morphologies in insect wings.

6 The Wnt Beneath My Wings

There are few case studies that characterize adaptive variation for a same set of traits both within and between species. Butterflies of the Heliconius genus provide a rich phylogenetic template for such micro-evo-devo studies (Papa et al. 2008; Supple et al. 2014; Kronforst and Papa 2015; Merrill et al. 2015). They display a range of highly variable wing color pattern phenotypes involved in Müllerian mimicry (the collaborative display of similar morphologies to predators from multiple unpalatable species) and sexual selection that are amenable to hybrid crosses followed by linkage mapping. In addition, their natural hybrid zones form a system of choice for high-resolution admixture mapping, looking for SNP-phenotype associations and the smoking guns of selection that are the handful of Mendelian loci that keep adjacent populations phenotypically distinct in the face of constant gene flow and recombination. The Wnt-family signaling ligand WntA has emerged as a key genetic driver of wing pattern evolution in butterflies. Originally discovered as a Mendelian locus responsible for discrete shifts in pattern shapes in the Heliconius erato mimetic radiation, this gene shows striking expression differences in larval wing disks that correlate tightly with the position of presumptive color elements and defines the black contours of forewing color patterns (Martin et al. 2012). Both linkage and admixture mapping approaches have revealed that a versatile pool of WntA alleles underlie marked phenotypic differences in at least six geographic races of H. erato (Fig. 4.3a, b). Following this discovery, additional mapping efforts discovered that WntA variants control pattern variation in four other Heliconius species, as well as in Limenitis arthemis, a species which diverged from the Heliconius genus 65 million years ago (Fig. 4.3a) (Gallant et al. 2014; Huber et al. 2015). All the mapped WntA alleles not only underlie phenotypic divergence within species but also convergence between sympatric morphs that evolved in distinct species, thus providing clear examples of adaptive tinkering and repeated evolution of similar patterns. As expected, the causative changes are not found in the WntA coding exons, which show little variation in amino acid sequence, but in the adjoining regulatory loci that control WntA expression during wing development. The role of WntA cis-regulatory mutations may very well extend to much broader phylogenetic levels, as WntA expression, which shows spectacular shifts in expression in all the butterfly species assessed so far, always correlates with color pattern features (Martin and Reed 2014). With a total of 18 alleles in 7 species, all associated with wing color pattern variation, WntA can be seen as a genuine genetic hotspot of adaptation (Martin and Orgogozo 2013) and a case model for linking regulatory sequence variation, pattern formation, and morphological evolution at multiple time scales.

Fig. 4.3
figure 3

Mapped cis-regulatory alleles of WntA, a genetic hotspot of wing pattern shape variation. (a) A total of 18 WntA cis-regulatory variants have been identified by linkage mapping (orange dots) and admixture mapping in natural hybrid zones (green dots). Each allele is associated with spatial shifts in WntA expression that drive pattern shape variations, in particular, in the median region of butterfly forewings. Each half-butterfly corresponds to a natural morph. WntA-independent color patterns were manually masked and shaded in gray to better highlight the wing pattern areas influenced by WntA. (b) Fractionation of the H. erato WntA locus at several haplotypic blocks, each perfectly associated with pattern shape variation across three natural hybrid zones (Van Belleghem et al. 2017). (c) Three novel cis-regulatory regions underlie the evolution of novel pigmentation traits in D. guttifera. (d) Fine QTL mapping of wing size variation in male Nasonia wasps identifies three intervals responsible for the differential spatiotemporal recruitment of the upd-like growth factor (Photo credit (use with permission) – (c) Nicolas Gompel and Shigeyuki Koshikawa and (d) David Loehlin)

7 Ligand Gene Modularity Allows Interspecific Differences

The current data suggest that the WntA locus contains multiple control regions and haplotypes, each being able to reconfigurate part of WntA expression and the overall organization of wing patterns. Association mapping reveals at least three adjacent haplotype regions with distinct patterning effects in H. erato (Fig. 4.3b) and a single 1.8 kb indel perfectly associated to a polymorphic variant in a sympatric H. cydno alithea population (Gallant et al. 2014; Van Belleghem et al. 2017). This said, the functional dissection of these genetic elements is reaching a technical limitation at this moment due to the inability to test for the function of each cis-regulatory region in butterflies, and we must gain insight into the evolution of ligand gene expression in analog models to explore the logic of cis-regulatory control. Interestingly, detailed analyses of the cis-regulatory region of another Wnt locus, this time encompassing wingless (syn. Wnt1; wg) and its tandem paralogs Wnt6 and Wnt10 (Fig. 4.3c), show that three novel, tissue-specific cis-regulatory elements drive wingless expression and underlie novel color patterns on the wings and thorax of Drosophila guttifera fruit flies (Werner et al. 2010; Koshikawa et al. 2015). While these studies lack the phylogenetic resolution and replication observed in butterflies, they provide one of the most detailed mechanistic accounts of truly novel traits, where the deployment of Wnt expression in three different body regions is driven by independent cis-regulatory changes. Of note, wg is also associated to color patterns and wing contours in both flies and butterflies (Macdonald et al. 2010; Martin and Reed 2010; Koshikawa et al. 2015), and a redeployment of this gene to new body regions is likely to drive the evolution of new patterns as well, as it seemed to have occurred during the evolution of larval cuticle patterns in Lepidoptera (Yamaguchi et al. 2013). We note that while Koshikawa et al. did not detect any pattern-related Wnt6 and Wnt10 expression in D. guttifera developing wings (Koshikawa et al. 2015; S. Koshikawa, personal communication), these two paralogs are co-deployed with wg in butterflies where they may underlie a more complex architecture, with partially redundant ligand activities (Martin and Reed 2014). Beyond their obvious parallels (wing pigmentation traits; Wnt loci), the butterfly and D. guttifera data collectively depict a modular landscape of pattern evolution where acquisitions and modifications of cis-regulatory elements allow a fine-tuning of color patterns (Koshikawa 2015).

Another case study provides further support for linking gene regulatory region modularity at a ligand locus and interspecific variation (Loehlin and Werren 2012). Using two Nasonia wasp sister species, Loehlin and Werren mapped a male wing size variation QTL to the JAK/STAT pathway ligand gene unpaired-like (upd-like) and, by a genetic tour de force, were able to genetically break down this locus into three regulatory intervals, each with complementary effects on wing size. In fact, each mapped interval affects various complementary spatiotemporal expression patterns of upd-like, ultimately affecting wing growth. Thus, whether the phenotypic output is a growth trait (the upd-like case) or a color pattern (the WntA and wg cases), we have empirical evidence that morphological evolvability depends in these cases on the capacity to modify an expression pattern. In a nutshell, the different case studies linking insect wing variation and ligand genes highlight the importance of modular cis-regulatory architecture in the tinkering of anatomy.

8 How, When, and Why Ligand Genes Are Likely Drivers of Pattern Variation, or Not

Our cumulative knowledge of evolutionary genetics foreshadows a relative predictability in the genetic mechanisms that drive phenotypic change (Stern and Orgogozo 2009; Martin and Orgogozo 2013; Orgogozo 2015): by laying out what seems to be common mechanisms or trends in the generation of novelty, we can formulate post hoc expectations that can be generalized over broad taxonomic ranges. The cases of Wnt-based color pattern variation discussed above, WntA in nymphalid butterflies and wg in D. guttifera, both provide a useful model framework for understanding the molecular logic of pattern evolution due to their relative simplicity, as they take place in the two-dimensional canvas of the insect wing epithelium. To the best of our knowledge, these patterning systems are uncoupled from tissue growth, which prevents the complex dynamics found in many other morphological contexts (Salazar-Ciudad 2006; Salazar-Ciudad 2009; Urdy et al. 2016). As simplified spatial output of cellular differentiation, color patterns can be used as a proxy for more complex morphologies, providing fundamental insights that can be applied across all animals. A simple ascertainment emerges from the fly and butterfly data: cis-regulatory evolution of pattern-inducing signaling genes has repeatedly driven the evolution of new patterns and derived pattern shapes. We can elaborate upon a simple gradient model of positional information (Wolpert 1969) generating threshold-dependent pattern boundaries (Fig. 4.4a), to derive five types of ligand gene signaling that can produce morphological outcomes (Fig. 4.4b–f). Since cis-regulatory variation modulates gene expression in time and space, it can affect tissue patterning in multiple ways, and its effect on a ligand gene can be sufficient to induce a new pattern (Fig. 4.4b) or simply change its shape (Fig. 4.4c). In addition, cis-regulatory acquisition of localized repressors can dislocate a pattern and thus affect both pattern number and shape (Fig. 4.4d). Pattern size can also be affected by quantitative or temporal changes in the expression of a secreted factor, without requiring a change in the number of source cells, or, alternatively, by trans-interactions upstream of the ligand that would affect its secretion and transport (Fig. 4.4e). Finally, modification in the tissue responsiveness to the signal or its concentration or time-dependent interpretation may modulate the pattern thresholds (e.g., color composition) without affecting the overall size and shape of the pattern (Fig. 4.4f).

Fig. 4.4
figure 4

Distinct aspects of pattern variation may rely on different modes of ligand gene modification. (a) A three-step model of pattern formation. Ligand-expressing cells (red hexagons) deploy a signal that is interpreted by neighboring cells in a concentration-dependent manner, resulting in a three-state output (yellow, low signal; black, intermediate; orange, high). (b) Discrete gain of a novel ligand gene expression domain can generate novel pattern elements. (c) Continuous spatial modulation of ligand expression can generate new pattern shapes. (d) Local loss of ligand expression can result in pattern dislocation. (e) Upregulation of a ligand gene can generate enlarged patterns. (f) Pattern composition may vary based on modifications of the signal interpretation process, downstream of the ligand gene itself (without affecting its expression or protein). (g) Sulfated polysaccharide injections in the variable checkerspot butterfly, performed within 24 h after pupation, affect the size of Wnt-positive patterns. Dextran sulfate results in Wnt pattern contractions, while heparin results in Wnt gain-of-function effects that expand the same patterns. Both compounds illustrate how genetic modulations of the extracellular environment can modulate pattern size. (g’) Temperature shocks during early pupal life result in pattern distortions (similar to G panel), indicating a sensitivity of the signaling step to physiological conditions. (h) The variable checkerspot is named after its color pattern polymorphism, involved in adaptive mimicry (Bowers et al. 1985; Long et al. 2014b). Differences in red patterns may be due to changes in genes modulating Wnt signal, rather than at a Wnt gene locus itself (see f)

These distinct dimensions of pattern variation can be used to generate hypotheses on the molecular targets underlying a given phenotypic state. Below we illustrate this principle, building upon a set of observations made on the variable checkerspot (Euphydryas chalcedona). E. chalcedona checkerspots display a set of orange patterns outlined by black scales that are each expressing WntA or wg/Wnt6/Wnt10 (Martin and Reed 2014). Each of these patterns can be contracted or expanded by an injection of dextran sulfate or heparin, respectively (Fig. 4.4g). These two sulfated polysaccharide compounds possess a high molecular weight, which restrict them to the extracellular space, and injections are only effective when performed within 24 h after pupation, revealing a short time window for pattern formation (Serfas and Carroll 2005; Martin and Reed 2014). Finally, both heparin and endogenous, heparin-like heparan sulfate proteoglycans (HSPGs) are known to bind Wnt ligands in the extracellular space, where they are of critical importance for signal secretion, stability, and transport (Lin 2004). These observations provide a simple alternative mechanism for modifying pattern size: rather than affecting signal strength directly, variation at genes involved in HSPG synthesis could also modulate the spread of Wnt ligands. Similarly, temperature shocks experienced during early pupal life create analogous pattern aberrations (Fig. 4.4g’), suggesting that specific physiological conditions are critical for normal patterning and that, here again, a broad range of molecular mechanisms taking place during cell-cell signaling (e.g., signal secretion, transport, reception, and degradation) could affect pattern size. The variable checkerspot takes its name from the extensive color pattern variations (Bowers et al. 1985; Long et al. 2014b) that can be observed between populations (Fig. 4.4h). Can we predict whether a ligand locus is involved in driving the difference between these Wnt-positive black vs. red/black patterns? Based on the framework developed above, we believe this is in fact an unlikely scenario. Indeed, the variation involves little differences in pattern shape or number and instead consists in color composition differences. A difference in signal sensitivity rather than signal strength between the two forms is more likely to explain the phenomenon, resulting in a threshold trait variation (see Allen et al. 2008 for a discussion of pattern size vs. color composition). We thus predict that this polymorphism could map to a Wnt-pathway gene or to a gene that can modify the output of the Wnt signaling pathway and that this gene should be active during the extracellular signaling phase or shortly thereafter. Alternatively, the threshold traits could also depend on signal temporal dynamics (Sorre et al. 2014). To be formally tested, these competing hypotheses will require linkage or association mapping between natural morphs and illustrate how our current knowledge can guide a different set of predictions, based on the type of observed trait variation.

9 Synthesis: Variations of Morphological Relevance in Ligand-Coding Genes Are cis-Regulatory, Complex, and Multiallelic

We have seen in this review that cis-regulatory alleles in ligand genes can drive morphological evolution in nature. Four cases stand out by the level of scrutiny at which they have been examined, as their experimental dissection shines by exceptional levels of phylogenetic replication or genetic resolution: Agouti (Peromyscus maniculatus – Nebraska Sandhills: light and dark alleles), WntA (Heliconius spp. and Limenitis arthemis butterflies: wing pattern shape variation), wg (Drosophila guttifera: acquisition of novel pigmented patterns), and upd-like (Nasonia spp.: wing size differences). Based on the data at hand, we propose a set of hypotheses that can now be confronted to future experimentation:

  1. 1.

    Ligand cis-regulatory variation underlies heterotopies. The four loci above provide clear illustrations of the principle that a local modification of morphology (heterotopy) is likely to be based on cis-regulatory variation. Due to their direct role in cell fate induction, ligand genes can be expressed in new places to influence developmental patterning and eventually anatomical phenotypes.

  2. 2.

    Gene expression shifts require the accumulation of multiple changes clustered into complex alleles. Fine mapping of the Agouti and upd-like loci reveals multiple sub-genic regions which independently contribute to the total phenotype (Loehlin and Werren 2012; Linnen et al. 2013). The same is true for WntA in a recent hybrid zone study (Fig. 4.3b), where three noncoding regions were each associated to pattern variation in distinct subareas of the butterfly wing, with their combination constituting the complete phenotype (Van Belleghem et al. 2017). Finally, the wg study reveals the modular evolution of three tissue-specific enhancers that collectively explain the pigmentation features of D. guttifera (Koshikawa et al. 2015; Koshikawa 2015). These four cases are conceptually similar and show that cis-regulatory evolution relies on the accumulation of multiple changes to generate large effects on ligand expression and final morphology.

  3. 3.

    Parallel evolution is pervasive, even across distant lineages. The repeated finding of the same orthologous gene causing similar visible trait changes across distinct lineages may be expected under the candidate gene approach, as a result of ascertainment bias. The replicated identification of coding alleles of MC1R and Agouti is of that order. However, when independent experiments happen to pinpoint the same locus by taking a linkage or association mapping approach, then we can firmly infer that gene reuse underlies a phenomenon of evolutionary repetition (Martin and Orgogozo 2013; Orgogozo et al. 2015). We have seen that cis-regulatory alleles of Agouti have been repeatedly mapped in several populations and species of Peromyscus deer mice as well as in humans. The stickleback KITLG cis-regulatory changes were mirrored by other cis-regulatory variants driving both skin and hair color variation in human populations (Miller et al. 2007; Guenther et al. 2014). Finally, the WntA locus was mapped as a hotspot of wing pattern evolution in five Heliconius species as well as in a clade distant by about 65MY (Gallant et al. 2014). This implies that for a given phenotypic trait, the genetic basis of phenotypic variation may be relatively predictable in a post hoc fashion.

  4. 4.

    Multiallelism could precede the aggregation of complex alleles. The identification of multiallelism (syn. polyallelism, genetic heterogeneity) by forward genetic approaches is difficult in spite of their suspected importance in human disease (McClellan and King 2010). Indeed, detecting multiallelism requires a multiple-parent QTL scheme, and this has only been recently implemented in a handful of model organisms (Huang et al. 2011; Long et al. 2014a). Furthermore, GWAS studies typically underestimate the contributions of mixed alleles (Thornton et al. 2013). Several studies have nonetheless found that the pool of cis-regulatory variation influencing gene expression levels is multiallelic, to an overwhelming extent (Gruber and Long 2009; Zhang et al. 2011; King et al. 2014). Does this observation hold up for the spatial shift alleles considered here? As it turns out, replicated mapping within the H. erato and H. cydno radiations has identified six and four noncoding WntA alleles underlying ten distinct wing color shapes in these two species groups, respectively (Martin et al. 2012; Papa et al. 2013; Gallant et al. 2014). WntA thus exemplifies how repeated cis-regulatory modification of a ligand gene can replicate both within and between species, spanning a phylogenetic spectrum ranging from recently evolved populations (Van Belleghem et al. 2017) to distant lineages (Gallant et al. 2014). Importantly, this multiallelism probably acts as a prerequisite for the formation of complex alleles, as it is likely that adjacent regulatory regions evolve by recombination between blocks that exist as standing variation, rather than solely by cumulative de novo mutations on the same DNA molecule (Rebeiz et al. 2011; Martin and Orgogozo 2013). A chimeric, polyallelic origin can explain the cis-regulatory evolution of optix (Wallbank et al. 2015) (see also chapter by CD Jiggins in this volume), a transcription factor locus that, like WntA, shows extensive parallelism and multiallelism in the Heliconius genus (Reed et al. 2011; Papa et al. 2013; Martin et al. 2014; Kronforst and Papa 2015; Zhang et al. 2016). We expect that further examples of phenotypic radiations will uncover a multiallelic basis, as recently proposed in cichlid fishes (Roberts et al. 2016). The fact that we observe genetic heterogeneity shows that multiple variants can swarm in a gene pool and may thus provide the bricks of change to build novel cis-regulatory activities. We suggest that the large Agouti, upd-like, WntA, and wg haplotypes were agglomerated by recombination between multiple alleles segregating in ancestral populations (Martin and Orgogozo 2013).

10 Conclusion

In less than a decade, the DGT hypothesis has found validation in the forward genetics literature, where investigations that focused on a morphological difference (without a strong initial bias on the underlying genetics) eventually identified genetic toolkit loci. This is particularly true for signaling genes: four out of seven morphological gephes in sticklebacks involve secreted signaling ligands, and 18 WntA alleles have been associated to wing pattern variation in butterflies. We hope that the continuous compilation of the genetic basis of phenotypic evolution into Gephebase will facilitate similarly minded questions of broad interest and perhaps yield to broader insights and meta-analytical thinking in evolutionary genetics.