Sequence Analyses to Study the Evolutionary History and Cis-Regulatory Elements of Hedgehog Genes

Müller, Ferenc; Borycki, Anne-Gaelle

doi:10.1007/978-1-59745-516-9_16


Protocol


Part of the book series: Methods in Molecular Biology™ (MIMB, volume 397)

Abstract

Sequence analysis and comparative genomics are powerful tools to gain knowledge on multiple aspects of gene and protein regulation and function. These have been widely used to understand the evolutionary history and the biochemistry of Hedgehog (Hh) proteins, and the molecular control of Hedgehog gene expression. Here, we report on some of the methods available to retrieve protein and genomic sequences. We describe how protein sequence comparison can produce information on the evolutionary history of Hh proteins. Moreover, we describe the use of genomic sequence analysis including phylogenetic footprinting and transcription factor-binding site search tools, techniques that allow for the characterization of cis-regulatory elements of developmental genes such as the Hedgehog genes.


1 Introduction

The use of bioinformatics to analyze protein and genomic sequences is based on the principle that functional regions in proteins and genomes are less likely to accumulate random mutational changes; hence, conserved sequences are candidates for important structural or cis-regulatory function (1–7). The application of this principle to Hedgehog (hh) genes and proteins is particularly relevant. Not only are hh genes often highly conserved in their protein-coding sequence, but they also have highly conserved expression patterns among distantly related phylogenetic groups (8–12). This implies that homologs can be searched for in different taxa based on the conservation of protein domains. A history of the evolution of this protein family can then be deduced from analysis of the number of homologs in each taxon, the rate of amino acid substitutions, and the evolutionary distance between orthologs. In this chapter, we focus on the use of sequence analysis and comparative genomics for the identification of Hedgehog (Hh) family members in different taxa and the analysis of their evolutionary history.

Cis-regulatory modules (CRMs) play a crucial role in the correct spatial and temporal expression of genes. Mutations in CRMs can cause gene misexpression and disease, or expose individuals to a higher risk of multifactorial diseases. For example, mutations mapping in the vicinity of sonic hedgehog regulatory elements have been suggested to cause preaxial polydactyly (13,14). Therefore, identification of CRMs is an important step in understanding the genetic basis of human diseases. We describe here the current methods for the identification of cis-acting regulatory elements of genes. Although no hh gene-specific protocols can be established for cis-regulatory sequence analysis, this chapter provides examples related to hh genes from the published literature. Rather than providing detailed protocols, we aim to give the reader general considerations and advice on how best to apply these biocomputing tools to the study of Hh proteins and genes.

2 Materials

All software and algorithms cited in this chapter can be downloaded from the internet. Some of these are commercial packages, but most are free. Their web sites are listed in Table 1. A selection of useful websites with more software for phylogenetic analyses and tools for the analysis of CRMs is listed in Table 2.

Table 1 Web Sites for Sequence, Phylogenetic and Cis-Regulatory Element Analyses


Table 2 Websites with Bioinformatic Tools Mentioned in this Article


3 Methods

3.1 Evolutionary Analysis of Hedgehog Proteins

The phylogenetic relationships and evolution of Hh proteins have been analyzed in considerable detail (15,16). Recently, further members of the Hh gene family have been reported in teleosts, with the description of a second indian hedgehog homolog and a desert hedgehog homolog (17).

3.1.1 Retrieving Protein Sequences for Phylogenetic Analyses

Protein sequences of conserved genes used to be predicted from a cDNA sequence, isolated either by degenerate polymerase chain reaction or by screening of cDNA libraries. Although these methods are still used in the case of nonmodel organisms, protein sequences are now mostly retrieved in silico. The sequences of interest can be found by searching protein databases or genomic databases. Searches can be performed with keywords (e.g., Hh or Shh) and/or using BLAST searches. NCBI and EBI have search tools to scan GenBank and Swiss-Prot.

Alternatively, animal model genomes can be searched using Ensembl, which at the time of writing (v.37) contains several genomes, not all of which are complete and annotated (see Table 3).

Table 3 Genomes Available in Ensembl


3.1.2 Protein Sequence Alignment

Protein sequences must then be aligned. For our purpose, a global alignment method that performs progressive pairwise alignments should be used. ClustalW (18) and Clustal X (19) have been widely used. However, with the recent growth of sequence databases, it has been necessary to develop other algorithms that can align large protein families with speed and accuracy. Newer programs for multiple sequence alignment include T-Coffee (20), which is slower than Clustal but tends to produce better alignments; MAFFT (21), which performs very well with sequences of different lengths (see Note 1) and appears to be faster than Clustal; and MUSCLE (22), which is advertised as faster than T-Coffee or Clustal.
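To make the idea of a global pairwise alignment concrete, the following is a minimal sketch of the Needleman-Wunsch dynamic programming scheme on which global aligners build. It is not the algorithm implemented by ClustalW, T-Coffee, MAFFT, or MUSCLE: the function name `global_align`, the flat match/mismatch scores, and the linear gap penalty are assumptions chosen for brevity, whereas real programs use substitution matrices and position-specific gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences (Needleman-Wunsch, linear gap penalty).

    Returns the two gapped strings and the alignment score.
    The scoring values are illustrative assumptions only.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    # Toy strings, not real Hh sequences.
    top, bottom, total = global_align("ACGT", "AGT")
    print(top)
    print(bottom)
    print(total)
```

Progressive multiple alignment programs repeat this kind of pairwise step along a guide tree, which is why errors made in early pairwise alignments propagate and why the resulting alignment should be inspected by eye.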

Before proceeding with the inference of a phylogenetic tree, sequence alignments should be checked and edited to realign sequences and eliminate gaps. Jalview, which is provided with Clustal, MUSCLE, and T-Coffee, allows you to edit your sequence alignment, whereas the PHYLIP package contains its own sequence editing program. Once the alignment has been performed, the tree file should be saved in the appropriate format (see Note 2).

3.1.3 Building a Phylogenetic Tree

Three methods, which fall into two main classes, are commonly used to infer a phylogenetic tree: character-based methods, which include maximum parsimony (MP) and maximum likelihood (ML) (23), and distance-based methods, which include the neighbor-joining (NJ) method (24). The former rely on character states, such as the amino acid present at a specific position, whereas with the latter evolutionary distances are calculated as the number of amino acid replacements between two proteins. None of these methods is entirely satisfactory (i.e., guaranteed to infer the true tree) because they rely on several assumptions, for instance, a constant rate of divergence of taxa from an ancestor. NJPlot will build a tree based on the NJ method, whereas PHYLIP and PAUP allow for the inference of an evolutionary tree using NJ, MP, or ML methods. Because distance-based methods are more amenable to molecular data (such as protein sequences) and several methods, including bootstrap analyses, have been designed to establish the reliability of an evolutionary tree, NJ methods tend to be more widely used and have been the preferred method for analyses of Hh proteins (15,25). If using NJPlot, open the tree file (.nj) previously saved. If using PHYLIP, a tree can be drawn using DRAWGRAM. Both will draw rooted trees, which allow for evolutionary analyses, in contrast to unrooted trees, which only display the degree of relationship with no indication of the most recent common ancestor. TREEVIEW is another software package to draw trees. It supports tree files in most common formats and will display bootstrap values.
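As an illustration of the input that distance-based methods work from, the sketch below computes a pairwise distance matrix from an existing alignment, using the proportion of differing residues (the p-distance) as the distance. The function names and the three toy sequences are hypothetical, and the uncorrected p-distance is a simplifying assumption: programs such as PHYLIP normally apply a substitution model to correct for multiple replacements at the same site.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


def distance_matrix(alignment):
    """Return {(name_a, name_b): distance} for every ordered pair of sequences."""
    names = list(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a in names for b in names}


# Hypothetical toy alignment; these are not real Hh protein sequences.
TOY_ALIGNMENT = {
    "seqA": "MLLLARCLLV",
    "seqB": "MLLLARCFLV",
    "seqC": "MLL-TRCFLV",
}

if __name__ == "__main__":
    for pair, dist in sorted(distance_matrix(TOY_ALIGNMENT).items()):
        print(pair, round(dist, 3))
```

A matrix of this kind is what the NJ algorithm takes as input; the clustering step itself is left to the dedicated programs named above.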

If the assumption of rate constancy among taxa does not account for the actual rate of divergence, the inferred tree may be erroneous (i.e., misplace a species or a group of species). These errors can be remedied by choosing an outgroup as a reference (i.e., a species for which we have previous knowledge that it diverged from a common ancestor prior to the other species listed). A new tree is then built based on a new distance matrix established from the reference (Fig. 1A).

3.1.4 Phylogenetic Tree Analyses

Tree reliability: One of the advantages of using the NJ method is that it allows for bootstrap analysis, a computational method to assess a tree topology statistically (26). This technique calculates the level of confidence for each clade of an inferred tree. This is done through a resampling technique in which a series of pseudosamples is generated (usually between 500 and 1000, see Note 3) and the trees deduced from them are compared with the inferred one. A bootstrap value, expressed as the percentage of trees having the same topology as the inferred tree, is then calculated. It is generally accepted that a bootstrap value of >95% corresponds to a high level of confidence in the clade, whereas values <70% indicate a low level of confidence. Bootstrap analysis can be run from PHYLIP (using Seqboot) or from Clustal.
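The resampling step can be sketched as follows. Alignment columns are drawn with replacement to build each pseudosample, and the support for a grouping is the percentage of pseudosamples in which that grouping is recovered. As a deliberate simplification, the sketch does not build full trees: it uses "the two most similar sequences" as a stand-in for a clade. The function names, the toy alignment, and the fixed random seed are all assumptions made for illustration, and this is not how Seqboot or Clustal are implemented internally.

```python
import random


def p_distance(seq_a, seq_b):
    """Fraction of alignment columns at which two sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def closest_pair(alignment):
    """Return the pair of sequence names with the smallest distance."""
    names = sorted(alignment)
    best_pair, best_dist = None, None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist = p_distance(alignment[a], alignment[b])
            if best_dist is None or dist < best_dist:
                best_pair, best_dist = frozenset((a, b)), dist
    return best_pair


def bootstrap_support(alignment, n_replicates=500, seed=0):
    """Percentage of pseudosamples that recover the closest pair of the full data."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    names = sorted(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must have the same aligned length")
    reference = closest_pair(alignment)
    recovered = 0
    for _ in range(n_replicates):
        # Resample alignment columns with replacement.
        columns = [rng.randrange(length) for _ in range(length)]
        pseudo = {name: "".join(alignment[name][c] for c in columns)
                  for name in names}
        if closest_pair(pseudo) == reference:
            recovered += 1
    return reference, 100.0 * recovered / n_replicates


# Hypothetical toy alignment; these are not real Hh protein sequences.
TOY_ALIGNMENT = {
    "seqA": "MLLLARCLLVLLVSSLLMCS",
    "seqB": "MLLLARCFLVLLVSSLLMCS",
    "seqC": "MRLLTRVALVALLFSALVSA",
    "seqD": "MRLMTRVALIALLFSALVSG",
}

if __name__ == "__main__":
    pair, support = bootstrap_support(TOY_ALIGNMENT, n_replicates=500, seed=0)
    print(sorted(pair), support)
```

In a real analysis each pseudosample is used to infer a complete tree, and a support value is reported for every clade of the original tree.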

Estimating divergence time: An estimation of the evolutionary divergence time can be calculated from a distance-based tree (Fig. 1A). This calculation is based on the hypothesis that the rate of amino acid substitutions is constant during evolution. First, the rate of divergence per site per million years, r, is calculated for two species for which the divergence time, T₁, is known from other data (paleontological records, molecular data). Usually, vertebrates are a better choice because there are many records available providing the best approximate divergence time (see Note 4).

r = d/(2T₁), where d is the average distance between the two species chosen and the distance is directly proportional to the rate of amino acid substitution. Once r is determined, it can be applied to the equation T₂ = d_avg/(2r), where T₂ is the unknown divergence time between the two species/events we are interested in and d_avg is the average distance between these two species/events.
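The two equations can be worked through numerically as in the sketch below. The calibration time of 310 my for mammals/birds is taken from Note 4; the two distance values (0.31 and 0.45) are invented purely for illustration and are not measured Hh distances, and the function names are hypothetical.

```python
def substitution_rate(d_calibration, t_calibration_my):
    """r = d / (2 * T1): rate of divergence per site per million years."""
    if t_calibration_my <= 0:
        raise ValueError("calibration time must be positive")
    return d_calibration / (2.0 * t_calibration_my)


def divergence_time(d_avg, rate):
    """T2 = d_avg / (2 * r): estimated divergence time in million years."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return d_avg / (2.0 * rate)


if __name__ == "__main__":
    # Calibration pair: mammals/birds, 310 my (Note 4).
    # The distance 0.31 is an assumed value for illustration only.
    r = substitution_rate(d_calibration=0.31, t_calibration_my=310.0)

    # Assumed average distance between the two lineages of interest.
    t2 = divergence_time(d_avg=0.45, rate=r)

    print("rate per site per my:", r)
    print("estimated divergence time (my):", t2)
```

With these assumed inputs the rate is about 0.0005 substitutions per site per million years and the estimated divergence time is about 450 my. The factor of 2 reflects the fact that both lineages accumulate substitutions independently after they split.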

Using similar calculations, it was found that the divergence times between Shh and Ihh, and between Shh and Dhh, were 563 and 662 my, respectively (15), which suggests that the first duplication of the Hh gene, which gave rise to the Dhh family, occurred prior to the emergence of chordates (550 my) (27,28). This is not consistent with the fact that prior to the emergence of vertebrates, a single Hh gene is found in all three clades, Deuterostomia, Ecdysozoa, and Lophotrochozoa (Fig. 1A,B). In particular, the presence of a single Hh gene in the cephalochordate amphioxus Branchiostoma floridae (12) suggests that the duplication event that gave rise to Hh1 and Hh2 in the urochordate Ciona intestinalis occurred independently from the duplication events leading to the Dhh, Ihh, and Shh families (Fig. 1A–C) (29). An interesting exception to the existence of a single Hh is that of the nematode Caenorhabditis elegans, for which no true Hh ortholog was found. Instead, closer sequence comparisons with subdomains of the Hh protein revealed that several C. elegans proteins are homologous to the C-terminal region of Hh and form a family of proteins, the inteins, with endonuclease activity (30). Because earlier taxa such as the mollusc Patella vulgata and the annelid P. capitella do have a single Hh gene, this would suggest that nematodes once had Hh proteins but lost them during evolution. Alternatively, there is the possibility that nematodes do not belong to Ecdysozoa and form an earlier taxon (31). There are data consistent with a grouping of arthropods and vertebrates (a protostome and a deuterostome lineage, respectively) into a clade called Coelomata, which leaves out the nematodes; these would form an earlier-branching group, the Pseudocoelomata (32). If this were the case, Hh would have evolved after the emergence of nematodes and before the Coelomata group.

3.2 Detection of Cis-Regulatory Elements of Hedgehog Genes by Sequence Analysis

Unlike coding exons, CRMs are not subject to stringent directional, positional, and compositional constraints, which makes their automated detection with bioinformatics tools more difficult. One technique often used is phylogenetic footprinting (33), which is based on the principle that alignment of noncoding sequences from different species reveals evolutionarily conserved segments that are candidates for cis-regulatory function (1,3,5,7,34). Bioinformatic tools that use phylogenetic footprinting to detect such regions have been reviewed recently (35–38). Phylogenetic footprinting has been used extensively to identify putative CRMs of sonic hedgehog orthologs (36,39–42).

3.2.1 Choice of Sequence Alignment and Visualization Tools

Two main strategies can be followed in sequence alignment. Local alignment tools (e.g., BLASTZ [43]) search for short stretches of similarity between the sequences, which are then extended, whereas global alignment tools (e.g., LAGAN [44]) search for the best alignment over the entire length of the sequences, using local similarities as anchors (see Note 5). A recent addition to LAGAN also allows for the detection of inversions between the two compared sequences (shuffle-LAGAN [44]). Global alignment tools have a higher sensitivity, whereas local tools provide better specificity in the detection of shorter conserved blocks (45). Results of sequence alignments are usually displayed through web-based graphical tools, such as PipMaker (46), ECR browser (47), and VISTA (48,49), which indicate conservation above set threshold levels. Because of their distinct designs, global and local alignment algorithms differ in their performance in detecting conservation. Notably, the DiAlign tool (50,51) offers both local and global alignment output modes.
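The way visualization tools flag "conservation above a threshold" can be approximated by sliding a fixed-size window along a pairwise alignment and reporting the windows whose percent identity reaches a cutoff. The sketch below is a minimal illustration of that idea; the function name is hypothetical, the window size and cutoff are assumed example values rather than the settings of any particular tool, and the actual scoring used by PipMaker, the ECR browser, or VISTA may differ.

```python
def conserved_windows(aln_a, aln_b, window=100, min_identity=0.7):
    """Return (start, end, identity) for windows at or above the identity cutoff.

    aln_a and aln_b are two rows of the same pairwise alignment ('-' = gap).
    Positions are alignment columns, 0-based, with end exclusive.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("both rows must have the same aligned length")
    if window <= 0:
        raise ValueError("window must be positive")
    # 1 where the two rows carry the same residue, 0 for mismatches and gaps.
    matches = [1 if x == y and x != "-" else 0 for x, y in zip(aln_a, aln_b)]
    hits = []
    if len(matches) < window:
        return hits
    running = sum(matches[:window])
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide the window by one column: add the new column, drop the old.
            running += matches[start + window - 1] - matches[start - 1]
        identity = running / window
        if identity >= min_identity:
            hits.append((start, start + window, identity))
    return hits


if __name__ == "__main__":
    # Toy alignment rows, not real shh sequence.
    row_a = "ACGTACGTACGTACGTACGT"
    row_b = "ACGTACGTAC" + "N" * 10
    for hit in conserved_windows(row_a, row_b, window=10, min_identity=0.9):
        print(hit)
```

Overlapping windows that pass the cutoff are usually merged into a single conserved block before display; that step is omitted here for brevity.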

3.2.2 Choice of Genomes for Cross Species Comparison

Comparisons of multiple species (“phylogenetic shadowing”) (38), using a set of closely related species (e.g., Refs. [50,52]), may be applied for the identification of conserved elements. However, the efficiency of finding conserved CRMs by phylogenetic footprinting (in terms of both number and level of conservation) depends on the evolutionary distance between the species compared (38,53). Comparisons between mouse and human (approx. 90 million years, Fig. 1C) involve a short evolutionary distance and show a high degree of conservation among functionally relevant binding sites located in conserved blocks (54–58). However, the slow rate of neutral divergence among vertebrates may result in the retention of conserved sequences with no regulatory role between species separated by a short evolutionary distance (59). Several vertebrate genomes representing most major classes have recently been sequenced (see Table 3), providing the raw material for comparative analyses of species with greater evolutionary distances than those among mammals. A note of caution is warranted, though: the greater the evolutionary distance, the more likely it is that regulatory elements will have diverged. Thus, fewer regulatory elements will have retained conserved transcriptional activities, reducing the likelihood of identifying conserved CRMs (60). However, it is generally observed that developmentally regulated genes (including hh genes) and transcription factors tend to be more conserved in their CRMs than other genes (40,61). This was particularly striking in the CRMs of fish and mammalian sonic hedgehog orthologs, which are separated by 450 my and still show remarkable conservation (36).

3.2.3 Variable Divergence of CRMs Within a Locus

CRMs within one gene locus may have different rates of change, as is the case for the shh locus itself. For example, four enhancers, named ar-A to ar-D, are involved in shh activation in the zebrafish midline tissues. These four CRMs show varying degrees of conservation between pufferfish and mouse (36,62–64). Interestingly, ar-A and ar-C are conserved between fish and mouse, whereas ar-B shows significant sequence similarity when zebrafish is compared with pufferfish (Tetraodon nigroviridis), indicating that the phylogenetic footprinting approach can result in the detection of additional functional regulatory elements when the evolutionary distance between the species used in the analysis matches the rate of change in regulatory sequences. The enhancer ar-C is significantly conserved in mouse, although less so than ar-A, and is active in the midline in zebrafish and mouse. Strikingly, no function has been assigned to the well-conserved ar-A in mouse. This may indicate conservation due to functional constraints other than CRM activity (reviewed in Ref. [65]). Significant sequence similarity in the 3′ UTR region of shh genes has also been observed between fish and mouse. However, no published data are available on a putative function of these conserved sequences.

3.2.4 Identification of Long Distance Regulatory Elements

It is not always trivial to assign a predicted conserved regulatory element to its cognate gene. The maximum distance at which a regulatory element can act on its target gene has not been established, and looping of chromatin over 40 Mb to sites of transcriptional activity has been demonstrated (66). Bacterial or phage artificial chromosome vectors provide a technology for the analysis of regulatory elements over large distances (42). This approach allowed for the detection of shh regulatory elements that lie several hundred kilobases away from the coding sequence in the mouse. Several of the elements identified in the mouse (SBE 2, 3, and 4) are well conserved among human, chicken, and frog, but not in teleost fish sequences (42). Interestingly, the function of these long distance elements is to drive shh expression in the ventral diencephalon, an activity covered by the intronic ar-C enhancer in the fish. This functional divergence of enhancers may explain the lack of conservation of SBE2-4 and suggests that subfunctionalization mechanisms may be involved in the evolution of shh CRMs (67).

A large number of genes are likely to contain CRMs at very long distances from the gene locus (68). An extreme example is the sonic hedgehog limb enhancer, which lies 1 Mb away from the shh coding sequence in an intron of the lmbr1 gene (69). This enhancer is highly conserved among vertebrates both in terms of its sequence and its position within the lmbr1 gene (70). This example suggests that further regulatory elements placed at a large distance may function in the regulation of shh. Indeed, several conserved noncoding elements were found at long distances from shh (up to 50 kb in fugu) and, when tested in zebrafish embryos, showed enhancer activity (41). It may be possible to identify these elements by limiting the search to chromosomal regions that remain unchanged during evolution. The interdigitation of coding genes with embedded regulatory elements of other neighboring genes also implies an evolutionary constraint on chromosomal rearrangements to avoid breakpoints in such regions. Conserved chromosomal synteny has been suggested to aid in predicting the limits of the regulatory regions of a gene (71,72). Thus, by analyzing the breakpoints of syntenic fragments, comparisons between multiple species should establish how far the most distant CRMs are located from the promoter. To assist researchers in these analyses, the Ensembl genome server database provides mammalian and chick chromosomal synteny, whereas an independent web server provides fugu and human synteny analysis (73).

3.2.5 Identification of the Transcriptional Start Site and the Core Promoter

Core or basal promoters are positionally defined regulatory regions, which are located about 50–100 base pairs (bp) up- and/or downstream of the transcriptional start site (TSS), and are required for the formation of preinitiation complexes for subsequent transcription initiation (74) (see Note 6). The absence of experimental approaches to characterize TSSs and the diversity of promoter types have made it relatively difficult to predict core promoter regions accurately using sequence analysis, despite the large number of programs available on the internet (see Tables 1 and 2 for a selection of tools). Prediction of core promoters has recently improved substantially, owing to the accumulation of large-scale data on TSSs (75,76). Promoter predictors based on searching for motifs such as the TATA box (reviewed in Ref. [74]) failed, as it is now known that only a subset of human genes whose transcription is initiated by RNA polymerase II contain a TATA box (77). The characterization of motifs involved in transcription initiation of the remaining genes is still in progress (77,78). A TATA box is, however, present in vertebrate shh genes (79,80). Interestingly, transcription factors and brain-specific genes were found to have shorter conserved blocks than other genes (81). The core promoters of vertebrate shh genes have been characterized in fish and human (79,80) and were shown to contain two TSSs and to be regulated by retinoic acid and Foxa2 (HNF3β).

3.2.6 Transcription Factor-Binding Site Analysis

Information on transcription factor-binding sites is available in either commercial (e.g., TRANSFAC [82], Genomatix) or open access (JASPAR [83]) databases. Binding-site clustering is a feature of CRMs (84), which is exploited by several algorithms (85–91). The predictive value of such clustering approaches is enhanced by incorporating sequence conservation criteria (see Ref. [92] for an example). Ahab also detects clusters of weak sites (93,94), and this can be further improved with Stubb, which includes comparative information and allows for the prediction of regulatory modules (95,96). To search entire genomes for coexpressed genes, a software package (CisOrtho [97]) was developed that evaluates the co-occurrence of motifs in orthologous regions. CRMs of coregulated genes show “signatures”, i.e., transcription factor-binding site combinations with distinct spacing and orientation requirements (90,98), which seem to be retained between species even when the overall sequence similarity is low (90). On the basis of this finding, TraFaC identifies conserved TF-binding sites by scanning regions of conserved sequence similarity to detect co-occurrence of binding sites (99), whereas rVista (100,101) and ConSite (57) score aligned binding sites in conserved regions. CONREAL (102) applies a similar approach and uses binding-site predictions as anchors for sequence alignment, and performs better than other sequence alignment programs when aligning sequences from distant species. As more algorithms for motif detection that take into account phylogenetic conservation (e.g., PhyloCon [103], CompareProspector [104], Footprinter [105]) become available, functional binding sites in hedgehog genes and other developmentally regulated genes will be identified.
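Most of the binding-site tools above rest on scoring a sequence against a position weight matrix (PWM). The following minimal sketch converts a matrix of base counts into log-odds scores and scans one strand of a sequence for windows that score above a threshold. The count matrix, the pseudocount, the uniform background, the threshold, and the function names are all hypothetical values chosen for illustration; the matrix does not represent any real transcription factor and is not taken from TRANSFAC or JASPAR.

```python
import math

# Hypothetical base counts from 10 aligned sites (one list entry per motif
# position). Invented for illustration; consensus of this toy motif: TGTTTA.
COUNTS = {
    "A": [0, 1, 0, 0, 1, 9],
    "C": [1, 0, 0, 1, 0, 0],
    "G": [0, 9, 1, 0, 0, 1],
    "T": [9, 0, 9, 9, 9, 0],
}


def build_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn base counts into log2-odds scores against a uniform background."""
    length = len(counts["A"])
    pwm = []
    for i in range(length):
        total = sum(counts[base][i] for base in "ACGT") + 4 * pseudocount
        pwm.append({
            base: math.log2(((counts[base][i] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, site, score) for forward-strand windows scoring >= threshold."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        if any(base not in "ACGT" for base in site):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(site))
        if score >= threshold:
            hits.append((start, site, score))
    return hits


if __name__ == "__main__":
    pwm = build_pwm(COUNTS)
    for start, site, score in scan("CCATGTTTAGGC", pwm, threshold=8.0):
        print(start, site, round(score, 2))
```

A practical search would also scan the reverse complement and, as described above, restrict or weight the hits by sequence conservation and by clustering with other predicted sites, since isolated matches to short motifs occur frequently by chance.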

4 Notes

1. It has been reported that variations in sequence length affect the accuracy of sequence alignments. ClustalW seems to be more sensitive to this issue than MAFFT. Thus, it is recommended to include sequences covering regions of similar length, although a sufficiently large portion of the protein sequence should be included to make the analysis meaningful. Comparing fragments of an Hh protein with other full-length Hh proteins, for instance, can only lead to meaningless data.
2. Take care to save the tree file corresponding to the sequence alignment in the correct format (.nj if you are to use NJPlot to draw the tree or .ph if you are to use PHYLIP).
3. It is common in the literature to see bootstrap samples of 100 or 200. It is recommended to use 500–1000, especially if many species are involved.
4. Listed here are some commonly used evolutionary divergence times (see Fig. 1C). Rat/mouse, 41 my; mammals/fishes, 450 my; mammals/amphibians, 360 my; mammals/birds, 310 my.
5. A consideration when choosing a particular program is that many algorithms have been optimized for comparisons between specific species (e.g., BlastZ for human-mouse, WABA (106) for C. elegans-C. briggsae) and may not perform well with other species.
6. A recent larger-scale analysis of mouse and human promoters identified conserved blocks within 500 bp from the start site, thereby defining the likely 5′ limit of proximal promoter regions (58).

References

Bejerano, G., Pheasant, M., Makunin, I., et al. (2004) Ultraconserved elements in the human genome. Science 304, 1321–1325.
Dermitzakis, E. T. and Clark, A. G. (2002) Evolution of transcription factor binding sites in Mammalian gene regulatory regions: conservation and turnover. Mol. Biol. Evol. 19, 1114–1121.
Frazer, K. A., Sheehan, J. B., Stokowski, R. P., et al. (2001) Evolutionarily conserved sequences on human chromosome 21. Genome Res. 11, 1651–1659.
Hillier, L. W., Miller, W., Birney, E., et al. (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716.
Mural, R. J., Adams, M. D., Myers, E. W., et al. (2002) A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science 296, 1661–1671.
Rubin, G. M., Yandell, M. D., Wortman, J. R., et al. (2000) Comparative genomics of the eukaryotes. Science 287, 2204–2215.
Waterston, R. H., Lindblad-Toh, K., Birney, E., Rogers, J., et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562.
Echelard, Y., Epstein, D. J., St-Jacques, B., et al. (1993) Sonic Hedgehog, a member of a family of putative signaling molecules, is implicated in the regulation of CNS polarity. Cell 75, 1417–1430.
Johnson, R. L., Laufer, E., Riddle, R. D., and Tabin, C. (1994) Ectopic expression of Sonic Hedgehog alters dorsal-ventral patterning of somites. Cell 79, 1165–1173.
Krauss, S., Concordet, J. P., and Ingham, P. W. (1993) A functionally conserved homolog of the Drosophila segment polarity gene hh is expressed in tissues with polarizing activity in zebrafish embryos. Cell 75, 1431–1444.
Ruiz I Altaba, A., Jessell, T. M., and Roelink, H. (1995) Restrictions to floor plate induction by Hedgehog and winged-helix genes in the neural tube of frog embryos. Mol. Cell. Neurosci. 6, 106–121.
Shimeld, S. M. (1999) The evolution of the Hedgehog gene family in chordates: insights from amphioxus hedgehog. Dev. Genes Evol. 209, 40–47.
Tsukurov, O., Boehmer, A., Flynn, J., et al. (1994) A complex bilateral polysyndactyly disease locus maps to chromosome 7q36. Nat. Genet. 6, 282–286.
Lettice, L. A., Horikoshi, T., Heaney, S. J., et al. (2002). Disruption of a long-range cis-acting regulator for Shh causes preaxial polydactyly. Proc. Natl Acad. Sci. USA 99, 7548–7553.
Kumar, S., Balczarek, K. A., and Lai, Z. C. (1996) Evolution of the Hedgehog gene family. Genetics 142, 965–972.
Zardoya, R., Abouheif, E., and Meyer, A. (1996) Evolutionary analyses of Hedgehog and Hoxd-10 genes in fish species closely related to the zebrafish. Proc. Natl Acad. Sci. USA 93, 13,036–13,041.
Avaron, F., Hoffman, L., Guay, D., and Akimenko, M. A. (2006) Characterization of two new zebrafish members of the Hedgehog family: atypical expression of a zebrafish indian hedgehog gene in skeletal elements of both endochondral and dermal origins. Dev. Dyn. 235, 478–489.
Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
Notredame, C., Higgins, D. G., and Heringa, J. (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217.
Katoh, K., Misawa, K., Kuma, K., and Miyata, T. (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066.
Edgar, R. C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113.
Felsenstein, J. (1988) Phylogenies from molecular sequences: inference and reliability. Annu. Rev. Genet 22, 521–565.
Saitou, N. and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.
Zardoya, R., Abouheif, E., and Meyer, A. (1996) Evolution and orthology of Hedgehog genes. Trends Genet. 12, 496–497.
Felsenstein, J. (1985) Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39, 783–791.
Conway Morris, S. (1993) The fossil record and the early evolution of the Metazoa. Nature 361, 219–225.
Dehal, P., Satou, Y., Campbell, R.K., et al. (2002) The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 298, 2157–2167.
Takatori, N., Satou, Y., and Satoh, N. (2002) Expression of Hedgehog genes in Ciona intestinalis embryos. Mech. Dev. 116, 235–238.
Aspock, G., Kagoshima, H., Niklaus, G., and Burglin, T. R. (1999) Caenorhabditis elegans has scores of Hedgehog-related genes: sequence and expression analysis. Genome Res. 9, 909–923.
Hedges, S. B. (2002). The origin and evolution of model organisms. Nat. Rev. Genet. 3, 838–849.
Blair, J. E., Ikeo, K., Gojobori, T., and Hedges, S. B. (2002) The evolutionary position of nematodes. BMC Evol. Biol. 2, 7.
Tagle, D. A., Koop, B. F., Goodman, M., Slightom, J. L., Hess, D. L., and Jones, R. T. (1988) Embryonic epsilon and gamma globin genes of a prosimian primate (Galago crassicaudatus). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol. 203, 439–455.
Dermitzakis, E. T., Reymond, A., Lyle, R., Scamuffa, N., et al. (2002) Numerous potentially functional but non-genic conserved sequences on human chromosome 21. Nature 420, 578–582.
Wasserman, W. W. and Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5, 276–287.
Muller, F., Blader, P., and Strahle, U. (2002). Search for enhancers: teleost models in comparative genomic and transgenic analysis of cis regulatory elements. Bioessays 24, 564–572.
Nardone, J., Lee, D. U., Ansel, K. M., and Rao, A. (2004) Bioinformatics for the ‘bench biologist’: how to find regulatory regions in genomic DNA. Nat. Immunol. 5, 768–774.
Boffelli, D., Nobrega, M. A., and Rubin, E. M. (2004). Comparative genomics at the vertebrate extremes. Nat. Rev. Genet. 5, 456–465.
Lemos, B., Yunes, J. A., Vargas, F. R., Moreira, M. A., Cardoso, A. A., and Seuanez, H. N. (2004) Phylogenetic footprinting reveals extensive conservation of Sonic Hedgehog (SHH) regulatory elements. Genomics 84, 511–523.
Woolfe, A., Goodson, M., Goode, D. K., et al. (2004) Highly conserved noncoding sequences are associated with vertebrate development. PLoS Biol. 3, e7.
Goode, D. K., Snell, P., Smith, S. F., Cooke, J. E., and Elgar, G. (2005) Highly conserved regulatory elements around the SHH gene may contribute to the maintenance of conserved synteny across human chromosome 7q36.3. Genomics 86, 172–181.
Jeong, Y. and Epstein, D. J. (2003) Distinct regulators of Shh transcription in the floor plate and notochord indicate separate origins for these tissues in the mouse node. Development 130, 3891–3902.
Schwartz, S., Kent, W. J., Smit, A., et al. (2003) Human-mouse alignments with BLASTZ. Genome Res. 13, 103–107.
Brudno, M., Do, C. B., Cooper, G. M., et al. (2003) LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13, 721–731.
Pollard, D. A., Bergman, C. M., Stoye, J., Celniker, S. E., and Eisen, M. B. (2004) Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics 5, 6.
Schwartz, S., Zhang, Z., Frazer, K. A., et al. (2000) PipMaker—a web server for aligning two genomic DNA sequences. Genome Res. 10, 577–586.
Ovcharenko, I., Nobrega, M. A., Loots, G. G., and Stubbs, L. (2004) ECR Browser: a tool for visualizing and accessing data from comparisons of multiple vertebrate genomes. Nucleic Acids Res. 32(Web Server issue), W280–W286.
Mayor, C., Brudno, M., Schwartz, J. R., et al. (2000) VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 16, 1046–1047.
Frazer, K. A., Pachter, L., Poliakov, A., Rubin, E. M., and Dubchak, I. (2004) VISTA: computational tools for comparative genomics. Nucleic Acids Res. 32(Web Server issue), W273–W279.
Brudno, M., Steinkamp, R., and Morgenstern, B. (2004) The CHAOS/DIALIGN WWW server for multiple alignment of genomic sequences. Nucleic Acids Res. 32(Web Server issue), W41–W44.
Morgenstern, B. (1999) DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211–218.
Ovcharenko, I., Boffelli, D., and Loots, G. G. (2004) eShadow: a tool for comparing closely related sequences. Genome Res. 14, 1191–1198.
Cooper, G. M. and Sidow, A. (2003) Genomic regulatory regions: insights from comparative sequence analysis. Curr. Opin. Genet. Dev. 13, 604–610.
Hardison, R. C. (2000) Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet. 16, 369–372.
Oeltjen, J. C., Malley, T. M., Muzny, D. M., Miller, W., Gibbs, R. A., and Belmont, J. W. (1997) Large-scale comparative sequence analysis of the human and murine Bruton’s tyrosine kinase loci reveals conserved regulatory domains. Genome Res. 7, 315–329.
Brickner, A. G., Koop, B. F., Aronow, B. J., and Wiginton, D. A. (1999) Genomic sequence comparison of the human and mouse adenosine deaminase gene regions. Mamm. Genome 10, 95–101.
Lenhard, B., Sandelin, A., Mendoza, L., Engstrom, P., Jareborg, N., and Wasserman, W. W. (2003) Identification of conserved regulatory elements by comparative genome analysis. J. Biol. 2, 13.
Suzuki, Y., Yamashita, R., Shirota, M., et al. (2004) Sequence comparison of human and mouse genes reveals a homologous block structure in the promoter regions. Genome Res. 14, 1711–1718.
Tautz, D. (2000) Evolution of transcriptional regulation. Curr. Opin. Genet. Dev. 10, 575–579.
Thomas, J. W., Touchman, J. W., Blakesley, R. W., et al. (2003) Comparative analyses of multi-species sequences from targeted genomic regions. Nature 424, 788–793.
Plessy, C., Dickmeis, T., Chalmel, F., and Strähle, U. (2005) Enhancer sequence conservation between vertebrates is favoured in developmental regulator genes. Trends Genet. 21, 207–210.
Müller, F., Chang, B., Albert, S., Fischer, N., Tora, L., and Strähle, U. (1999) Intronic enhancers control expression of zebrafish sonic Hedgehog in floor plate and notochord. Development 126, 2103–2116.
Epstein, D. J., McMahon, A. P., and Joyner, A. L. (1999) Regionalization of Sonic Hedgehog transcription along the anteroposterior axis of the mouse central nervous system is regulated by Hnf3-dependent and -independent mechanisms. Development 126, 281–292.
Goode, D. K., Snell, P. K., and Elgar, G. K. (2003) Comparative analysis of vertebrate Shh genes identifies novel conserved non-coding sequence. Mamm. Genome 14, 192–201.
Adams, M. D. (2005) Conserved sequences and the evolution of gene regulatory signals. Curr. Opin. Genet. Dev. 15, 628–633.
Osborne, C. S., Chakalova, L., Brown, K. E., et al. (2004) Active genes dynamically colocalize to shared sites of ongoing transcription. Nat. Genet.
Force, A., Lynch, M., Pickett, F. B., Amores, A., Yan, Y. L., and Postlethwait, J. (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151, 1531–1545.
Vavouri, T., McEwen, G. K., Woolfe, A., Gilks, W. R., and Elgar, G. (2006) Defining a genomic radius for long-range enhancer action: duplicated conserved non-coding elements hold the key. Trends Genet. 22, 5–10.
Lettice, L. A., Heaney, S. J., Purdie, L. A., et al. (2003) A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735.
Sagai, T., Hosoya, M., Mizushina, Y., Tamura, M., and Shiroishi, T. (2005) Elimination of a long-range cis-regulatory module causes complete loss of limb-specific Shh expression and truncation of the mouse limb. Development 132, 797–803.
Mackenzie, A., Miller, K. A., and Collinson, J. M. (2004) Is there a functional link between gene interdigitation and multi-species conservation of synteny blocks? Bioessays 26, 1217–1224.
Flint, J., Tufarelli, C., Peden, J., et al. (2001) Comparative genome analysis delimits a chromosomal domain and identifies key regulatory elements in the alpha globin cluster. Hum. Mol. Genet. 10, 371–382.
Halling-Brown, M., Sansom, C., Moss, D. S., Elgar, G., and Edwards, Y. J. (2004) A Fugu-Human Genome Synteny Viewer: web software for graphical display and annotation reports of synteny between Fugu genomic sequence and human genes. Nucleic Acids Res. 32, 2618–2622.
Butler, J. E. and Kadonaga, J. T. (2002) The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev. 16, 2583–2592.
Hashimoto, S., Suzuki, Y., Kasai, Y., et al. (2004) 5′-end SAGE for the analysis of transcriptional start sites. Nat. Biotechnol. 22, 1146–1149.
Kawaji, H., Kasukawa, T., Fukuda, S., et al. (2006) CAGE Basic/Analysis Databases: the CAGE resource for comprehensive promoter analysis. Nucleic Acids Res. 34(Database issue), D632–D636.
FitzGerald, P. C., Shlyakhtenko, A., Mir, A. A., and Vinson, C. (2004) Clustering of DNA sequences in human promoters. Genome Res. 14, 1562–1574.
Kadonaga, J. T. (2002) The DPE, a core promoter element for transcription by RNA polymerase II. Exp. Mol. Med. 34, 259–264.
Kitazawa, S., Kitazawa, R., Tamada, H., and Maeda, S. (1998) Promoter structure of human sonic Hedgehog gene. Biochim. Biophys. Acta 1443, 358–363.
Chang, B. E., Blader, P., Fischer, N., Ingham, P. W., and Strähle, U. (1997) Axial (HNF3beta) and retinoic acid receptors are regulators of the zebrafish sonic Hedgehog promoter. EMBO J. 16, 3955–3964.
Suzuki, Y., Yamashita, R., Sugano, S., and Nakai, K. (2004) DBTSS, DataBase of Transcriptional Start Sites: progress report 2004. Nucleic Acids Res. 32(Database issue), D78–D81.
Wingender, E., Dietze, P., Karas, H., and Knuppel, R. (1996) TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 24, 238–241.
Sandelin, A., Alkema, W., Engstrom, P., Wasserman, W. W., and Lenhard, B. (2004) JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 32(Database issue), D91–D94.
Arnone, M. I. and Davidson, E. H. (1997) The hardwiring of development: organization and function of genomic regulatory systems. Development 124, 1851–1864.
Markstein, M., Markstein, P., Markstein, V., and Levine, M. S. (2002) Genomewide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl Acad. Sci. USA 99, 763–768.
Stathopoulos, A., Van Drenth, M., Erives, A., Markstein, M., and Levine, M. (2002) Whole-genome analysis of dorsal-ventral patterning in the Drosophila embryo. Cell 111, 687–701.
Rebeiz, M., Reeves, N. L., and Posakony, J. W. (2002) SCORE: a computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data. Site clustering over random expectation. Proc. Natl Acad. Sci. USA 99, 9888–9893.
Berman, B. P., Nibu, Y., Pfeiffer, B. D., et al. (2002) Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl Acad. Sci. USA 99, 757–762.
Halfon, M. S., Grad, Y., Church, G. M., and Michelson, A. M. (2002) Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res. 12, 1019–1028.
Erives, A. and Levine, M. (2004) Coordinate enhancers share common organizational features in the Drosophila genome. Proc. Natl Acad. Sci. USA 101, 3851–3856.
Markstein, M., Zinzen, R., Markstein, P., et al. (2004) A regulatory code for neurogenic gene expression in the Drosophila embryo. Development 131, 2387–2394.
Berman, B. P., Pfeiffer, B. D., Laverty, T. R., et al. (2004) Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol. 5, R61.
Rajewsky, N., Vergassola, M., Gaul, U., and Siggia, E. D. (2002) Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo. BMC Bioinformatics 3, 30.
Schroeder, M. D., Pearce, M., Fak, J., et al. (2004) Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol. 2, E271.
Sinha, S., van Nimwegen, E., and Siggia, E. D. (2003) A probabilistic method to detect regulatory modules. Bioinformatics 19(Suppl 1), i292–i301.
Sinha, S., Schroeder, M. D., Unnerstall, U., Gaul, U., and Siggia, E. D. (2004) Cross-species comparison significantly improves genome-wide prediction of cis-regulatory modules in Drosophila. BMC Bioinformatics 5, 129.
Bigelow, H. R., Wenick, A. S., Wong, A., and Hobert, O. (2004) CisOrtho: a program pipeline for genome-wide identification of transcription factor target genes using phylogenetic footprinting. BMC Bioinformatics 5, 27.
Senger, K., Armstrong, G. W., Rowell, W. J., Kwan, J. M., Markstein, M., and Levine, M. (2004) Immunity regulatory DNAs share common organizational features in Drosophila. Mol. Cell 13, 19–32.
Jegga, A. G., Sherwood, S. P., Carman, J. W., et al. (2002) Detection and visualization of compositionally similar cis-regulatory element clusters in orthologous and coordinately controlled genes. Genome Res. 12, 1408–1417.
Loots, G. G. and Ovcharenko, I. (2004) rVISTA 2.0: evolutionary analysis of transcription factor binding sites. Nucleic Acids Res. 32(Web Server issue), W217–W221.
Loots, G. G., Ovcharenko, I., Pachter, L., Dubchak, I., and Rubin, E. M. (2002) rVista for comparative sequence-based discovery of functional transcription factor binding sites. Genome Res. 12, 832–839.
Berezikov, E., Guryev, V., Plasterk, R. H., and Cuppen, E. (2004) CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting. Genome Res. 14, 170–178.
Wang, T. and Stormo, G. D. (2003) Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics 19, 2369–2380.
Liu, Y., Liu, X. S., Wei, L., Altman, R. B., and Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res. 14, 451–458.
Blanchette, M. and Tompa, M. (2002) Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res. 12, 739–748.
Kent, W. J. and Zahler, A. M. (2000) Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment. Genome Res. 10, 1115–1125.

Author information

Authors and Affiliations

Institute of Toxicology and Genetics, Forschungszentrum Karlsruhe, Postfach, Karlsruhe, Germany
Ferenc Müller
Department of Biomedical Science, University of Sheffield, Centre for Developmental Genetics, Sheffield, UK
Anne-Gaelle Borycki

Editor information

Editors and Affiliations

Department of Biomedical Sciences, College of Medicine, Florida State University, Tallahassee, Florida, USA
Jamila I. Horabin (Associate Professor)

About this protocol

Cite this protocol

Müller, F., Borycki, AG. (2007). Sequence Analyses to Study the Evolutionary History and Cis-Regulatory Elements of Hedgehog Genes. In: Horabin, J.I. (eds) Hedgehog Signaling Protocols. Methods in Molecular Biology™, vol 397. Humana Press. https://doi.org/10.1007/978-1-59745-516-9_16

DOI: https://doi.org/10.1007/978-1-59745-516-9_16
Publisher Name: Humana Press
Print ISBN: 978-1-58829-692-4
Online ISBN: 978-1-59745-516-9
eBook Packages: Springer Protocols
