Abstract
Sequence analysis and comparative genomics are powerful tools to gain knowledge on multiple aspects of gene and protein regulation and function. These have been widely used to understand the evolutionary history and the biochemistry of Hedgehog (Hh) proteins, and the molecular control of Hedgehog gene expression. Here, we report on some of the methods available to retrieve protein and genomic sequences. We describe how protein sequence comparison can produce information on the evolutionary history of Hh proteins. Moreover, we describe the use of genomic sequence analysis including phylogenetic footprinting and transcription factor-binding site search tools, techniques that allow for the characterization of cis-regulatory elements of developmental genes such as the Hedgehog genes.
1 Introduction
The use of bioinformatics to analyze protein and genomic sequences is based on the principle that functional regions in proteins and genomes are less likely to undergo random mutational changes; hence, conserved sequences are candidates for important structural or cis-regulatory function (1–7). The application of this principle to Hedgehog (hh) genes and proteins is particularly relevant. Not only are hh genes often highly conserved in their protein-coding sequence, but they also have highly conserved expression patterns among distantly related phylogenetic groups (8–12). This implies that homologs can be searched for in different taxa based on the conservation of protein domains. A history of the evolution of this protein family can then be deduced from analysis of the number of homologs in each taxon, the rate of amino acid substitutions, and the evolutionary distance between orthologs. In this chapter, we focus on the use of sequence analysis and comparative genomics for the identification of Hedgehog (Hh) family members in different taxa and the analysis of their evolutionary history.
Cis-regulatory modules (CRMs) of genes play a crucial role in the correct spatial and temporal expression of genes. Mutations in CRMs can cause gene misexpression and disease or expose individuals to a higher risk of multifactorial diseases. For example, mutations mapping in the vicinity of sonic hedgehog-regulatory elements have been suggested to cause preaxial polydactyly (13,14). Therefore, identification of CRMs is an important step in understanding the genetic basis of human diseases. We describe here the current methods for the identification of cis-acting regulatory elements of genes. Although no hh gene-specific protocols can be established for cis-regulatory sequence analysis, this chapter provides examples related to hh genes from the published literature. Rather than providing detailed protocols, we aim to give the reader general considerations and advice on how best to apply these biocomputing tools to the study of Hh proteins and genes.
2 Materials
All software and algorithms cited in this chapter can be downloaded from the internet. Some of these are commercial packages, but most are free. We have listed their web sites in Table 1. Moreover, a selection of useful websites with more software for phylogenetic analyses and tools for analysis of CRMs is listed in Table 2.
3 Methods
3.1 Evolutionary Analysis of Hedgehog Proteins
The phylogenetic relationship and evolution of Hh proteins have been analyzed in considerable detail (15,16). Recently, further members of the Hh gene family have been reported in teleosts with the description of a second indian hedgehog homolog and a desert hedgehog homolog (17).
3.1.1 Retrieving Protein Sequences for Phylogenetic Analyses
Protein sequences of conserved genes used to be predicted from a cDNA sequence, isolated either by degenerate polymerase chain reaction or by screening of cDNA libraries. Although these methods are still used in the case of nonmodel organisms, protein sequences are now mostly retrieved in silico. There are numerous ways to find the sequences of interest by searching protein databases or genomic databases. Searches can be performed with keywords (e.g., Hh or Shh) and/or using Blast searches. NCBI and EBI have search tools to scan GenBank and Swiss-Prot.
Alternatively, animal model genomes can be searched using Ensembl, which in its newest version (v.37) contains several genomes, although not all are complete and annotated (see Table 3).
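As a minimal illustration of keyword-based retrieval, the following Python sketch filters a locally saved FASTA file by header text. It is only a sketch under stated assumptions: the records are invented placeholders rather than real database entries, and the function names `parse_fasta` and `filter_by_keyword` are ours. Searches against the databases themselves would normally be run through their own interfaces.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            records[header] += line
    return records


def filter_by_keyword(records: Dict[str, str], keywords: List[str]) -> Dict[str, str]:
    """Keep records whose header contains any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return {h: s for h, s in records.items()
            if any(k in h.lower() for k in wanted)}


# Hypothetical, made-up records standing in for a downloaded FASTA file.
example = """>toy1 sonic hedgehog [species A]
MLLLAR
CLLV
>toy2 unrelated protein [species A]
MKKT
>toy3 Indian hedgehog [species B]
MSPA
""".splitlines()

hits = filter_by_keyword(parse_fasta(example), ["hedgehog"])
```

Header-based filtering depends entirely on annotation quality, so sequence-based (Blast) searches remain the more reliable way to recover unannotated or misnamed homologs.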
3.1.2 Protein Sequence Alignment
Protein sequences must then be aligned. For our purpose, a global alignment method that performs progressive pairwise alignments should be used. ClustalW (18) and Clustal X (19) have been widely used. However, with the recent growth of sequence databases, it has been necessary to develop other algorithms that can align large protein families with speed and accuracy. Thus, new software for multiple sequence alignment has been designed. T-Coffee (20) is slower than Clustal but tends to produce better alignments. MAFFT (21) performs very well with sequences of different lengths (see Note 1) and appears to be faster than Clustal. Finally, MUSCLE (22) is advertised as faster than T-Coffee or Clustal.
Before proceeding with the inference of a phylogenetic tree, sequence alignments should be checked and edited to realign sequences and eliminate gaps. Jalview, which is provided with Clustal, MUSCLE, and T-Coffee, allows the sequence alignment to be edited, whereas the PHYLIP package contains its own sequence editing program. Once the alignment has been performed, the tree file should be saved in the appropriate format (see Note 2).
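The gap-elimination step can also be scripted. The sketch below removes every alignment column in which at least one sequence carries a gap. It assumes the alignment is already loaded as a dictionary of equal-length strings; the function name `strip_gap_columns` and the toy sequences are hypothetical, and alignment editors offer finer manual control than this all-or-nothing rule.

```python
from typing import Dict


def strip_gap_columns(alignment: Dict[str, str], gap_chars: str = "-.") -> Dict[str, str]:
    """Return a copy of the alignment without columns that contain a gap."""
    names = list(alignment)
    if not names:
        return {}
    seqs = [alignment[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Keep only the columns in which no sequence has a gap character.
    keep = [i for i in range(length)
            if all(s[i] not in gap_chars for s in seqs)]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}


# Invented toy alignment (not real Hh sequences).
toy_alignment = {
    "seqA": "MK-LLV",
    "seqB": "MKALLV",
    "seqC": "MRAL-V",
}
ungapped = strip_gap_columns(toy_alignment)
```

Removing all gapped columns is the most conservative choice and can discard informative sites when many sequences are aligned, so the result should still be inspected by eye.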
3.1.3 Building a Phylogenetic Tree
Three methods, falling into two main classes, are commonly used to infer a phylogenetic tree: character-based methods, which include maximum parsimony (MP) and maximum likelihood (ML) (23), and distance-based methods, which include the neighbor-joining (NJ) method (24). The former rely on character states, such as the amino acid present at a specific position, whereas with the latter evolutionary distances are calculated from the number of amino acid replacements between two proteins. None of these methods is entirely satisfactory (i.e., guaranteed to infer the true tree) because they rely on several assumptions, for instance, a constant rate of divergence of a taxon from an ancestor. NJPlot algorithms will build a tree based on the NJ method, whereas PHYLIP and PAUP allow for the inference of an evolutionary tree using NJ, MP, or ML methods. Because distance-based methods are more amenable to molecular data (such as protein sequences) and several methods, including bootstrap analyses, have been designed to establish the reliability of an evolutionary tree, NJ methods tend to be more widely used and have been the preferred method for analyses of Hh proteins (15,25). If using NJPlot, open the tree file (.nj) previously saved. If using PHYLIP, a tree can be drawn using DRAWGRAM. Both will draw rooted trees, which allow for evolutionary analyses, in contrast to unrooted trees, which only display the degree of relationship without identifying the most recent common ancestor. TREEVIEW is another software package for drawing trees. It supports most tree file formats and will display bootstrap values.
If the assumption of rate constancy among taxa does not account for the actual rate of divergence, the inferred tree may be erroneous (i.e., misplace a species or a group of species). These errors can be remedied by choosing an outgroup as a reference (i.e., a species known from previous data to have diverged from the common ancestor before the other species listed). A new tree is then built based on a new distance matrix established from the reference (Fig. 1A).
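To make the notion of a distance matrix concrete, the following sketch computes pairwise distances from an alignment as the proportion of differing amino acids, skipping gapped positions. The Poisson correction applied here (d = -ln(1 - p)) is one common way to account for multiple substitutions at a site; it is our choice for illustration and not necessarily the model used by the packages or studies cited above. All names and sequences are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str, gap_chars: str = "-.") -> float:
    """Proportion of differing residues over positions without gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x in gap_chars or y in gap_chars:
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance, d = -ln(1 - p)."""
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -math.log(1.0 - p)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Symmetric matrix of corrected distances, keyed by name pairs."""
    matrix: Dict[Tuple[str, str], float] = {}
    for n1, n2 in combinations(alignment, 2):
        d = poisson_distance(p_distance(alignment[n1], alignment[n2]))
        matrix[(n1, n2)] = matrix[(n2, n1)] = d
    return matrix


# Invented toy alignment (not real Hh sequences).
toy = {"seqA": "MKALLV", "seqB": "MRALLV", "seqC": "MKGL-V"}
matrix = distance_matrix(toy)
```

A matrix of this kind is the input that a neighbor-joining program clusters into a tree; the tree-building step itself is left to the dedicated packages.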
3.1.4 Phylogenetic Tree Analyses
Tree reliability: One of the advantages of using the NJ method is that it allows for bootstrap analysis, a computational method for applying statistics to a tree topology (26). This technique calculates the level of confidence for each clade of an inferred tree. This is done through a resampling technique in which a series of pseudosamples is generated (usually between 500 and 1000, see Note 3) and the deduced trees are compared with the inferred one. A bootstrap value, expressed as the percentage of trees having the same topology as the inferred tree, is then calculated. It is generally accepted that a bootstrap value of >95 corresponds to a high level of confidence in the clade, whereas values <70 show a low level of confidence. Bootstrap analysis can be run with Seqboot in PHYLIP or from within Clustal.
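The resampling step can be sketched as follows: each pseudosample is built by drawing alignment columns at random, with replacement, until the original alignment length is reached. The second function shows how a support value could be tallied once a tree has been inferred from every pseudosample; tree inference itself is not shown. The function names, the representation of a tree as a set of clades, and the toy data are assumptions made for illustration only.

```python
import random
from typing import Dict, FrozenSet, Iterable, Iterator, List, Set


def bootstrap_alignments(alignment: Dict[str, str],
                         n_replicates: int = 500,
                         seed: int = 0) -> Iterator[Dict[str, str]]:
    """Yield pseudosamples made by resampling columns with replacement."""
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    for _ in range(n_replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        # The same column indices are used for every sequence,
        # so each resampled column stays intact across taxa.
        yield {n: "".join(alignment[n][i] for i in cols) for n in names}


def clade_support(replicate_clades: List[Set[FrozenSet[str]]],
                  clade: Iterable[str]) -> float:
    """Percentage of replicate trees that contain the given clade."""
    target = frozenset(clade)
    found = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * found / len(replicate_clades)


# Invented toy alignment (not real Hh sequences).
toy = {"seqA": "MKALLV", "seqB": "MRALLV", "seqC": "MKGLIV"}
replicates = list(bootstrap_alignments(toy, n_replicates=5, seed=1))

# Hand-written clade sets standing in for trees built from four replicates.
example_trees = [
    {frozenset({"seqA", "seqB"})},
    {frozenset({"seqA", "seqB"})},
    {frozenset({"seqA", "seqC"})},
    {frozenset({"seqA", "seqB"})},
]
support = clade_support(example_trees, ["seqA", "seqB"])
```

In this toy tally the clade grouping seqA with seqB appears in three of four replicate trees, giving a support of 75, which would fall between the low- and high-confidence thresholds quoted above.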
Estimating divergence time: An estimation of the evolutionary divergence time can be calculated from a distance-based tree (Fig. 1A). This calculation is based on the hypothesis that the rate of amino acid substitutions is constant during evolution. First, the rate of divergence per site per million years, $r$, is calculated for two species for which the divergence time, $T_1$, is known from other data (paleontological records, molecular data). Usually, vertebrates are a better choice because there are many records available providing the best approximate divergence time (see Note 4).
$r = d/(2T_1)$, where $d$ is the average distance between the two species chosen and the distance is directly proportional to the rate of amino acid substitution. Once $r$ is determined, it can be applied to the equation $T_2 = d_{avg}/(2r)$, where $T_2$ is the unknown divergence time between the two species/events we are interested in and $d_{avg}$ is the average distance between these two species/events.
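The two formulas above translate directly into code. In the sketch below, the distances are invented round numbers chosen only to make the arithmetic easy to follow; the 450 my calibration is the mammals/fishes divergence time listed in Note 4. Function and variable names are ours.

```python
def substitution_rate(d_calibration: float, t1_my: float) -> float:
    """Rate per site per million years, r = d / (2 * T1)."""
    if t1_my <= 0:
        raise ValueError("calibration time must be positive")
    return d_calibration / (2.0 * t1_my)


def divergence_time(d_avg: float, rate: float) -> float:
    """Unknown divergence time in million years, T2 = d_avg / (2 * r)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return d_avg / (2.0 * rate)


# Hypothetical distances; only the 450 my calibration comes from Note 4.
rate = substitution_rate(d_calibration=0.9, t1_my=450.0)
t2 = divergence_time(d_avg=0.5, rate=rate)
```

With these made-up inputs the rate is 0.001 substitutions per site per million years and the estimated divergence time is 250 my. The factor of 2 reflects the fact that both lineages accumulate substitutions independently after the split.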
Using similar calculations, it was found that the divergence times between Shh and Ihh, and between Shh and Dhh, were 563 and 662 my, respectively (15), which suggests that the first duplication of the Hh gene, giving rise to the Dhh family, occurred prior to the emergence of chordates (550 my) (27,28). This is not consistent with the fact that prior to the emergence of vertebrates, a single Hh gene is found in all three phyla, Deuterostomia, Ecdysozoa, and Lophotrochozoa (Fig. 1A,B). In particular, the presence of a single Hh gene in the cephalochordate amphioxus Branchiostoma floridae (12) suggests that the duplication event that gave rise to Hh1 and Hh2 in the urochordate Ciona intestinalis occurred independently from the duplication events leading to the Dhh, Ihh, and Shh families (Fig. 1A–C) (29). An interesting exception to the existence of a single Hh is the nematode Caenorhabditis elegans, for which no true Hh ortholog was found. In contrast, closer sequence comparisons with subdomains of the Hh protein revealed that several C. elegans proteins are homologous to the C-terminal region of Hh and form a family of proteins, the inteins, with endonuclease activity (30). Because earlier taxa such as the mollusc Proteus vulgaris and the Annelid P. capitella do have a single Hh gene, this would suggest that nematodes once had Hh proteins but lost them during evolution. Alternatively, there is the possibility that nematodes do not belong to Ecdysozoa and form an earlier taxon (31). There are data consistent with a grouping of arthropods and vertebrates (protostome and deuterostome) into a clade called Coelomata, which leaves out the nematodes; these would form an earlier phylum, the Pseudocoelomata (32). If this were the case, Hh would have evolved after the emergence of nematodes and before the Coelomata group.
3.2 Detection of Cis-Regulatory Elements of Hedgehog Genes by Sequence Analysis
Unlike coding exons, CRMs are not subject to stringent directional, positional, and compositional constraints, which makes their automated detection with bioinformatics tools more difficult. One technique often used is phylogenetic footprinting (33), which is based on the principle that alignment of noncoding sequences from different species reveals evolutionarily conserved segments that are candidates for cis-regulatory function (1,3,5,7,34). Bioinformatic tools that utilize phylogenetic footprinting to detect such regions have been reviewed recently (35–38). Phylogenetic footprinting has been used extensively to identify putative CRMs of sonic hedgehog orthologs (36,39–42).
3.2.1 Choice of Sequence Alignment and Visualization Tools
Two main strategies can be followed in sequence alignment: the local alignment protocol (e.g., BLASTZ [43]) searches for short stretches of similarity between the sequences, which are then extended, whereas global alignment tools (e.g., LAGAN [44]) search for the best alignment over the entire length of the sequence using local similarities as anchors (see Note 5). A recent addition to LAGAN also allows for the detection of inversions between the two compared sequences (shuffle-LAGAN [44]). Global alignment tools have a higher sensitivity, whereas local tools provide better specificity in detection of shorter conserved blocks (45). Results of sequence alignments are usually displayed through web-based graphical tools, such as PipMaker (46), ECR browser (47), and VISTA (48,49), which highlight regions conserved above set threshold levels. Because of their distinct designs, the performance of global and local alignment algorithms differs in the detection of conservation. Notably, the DiAlign tool (50,51) allows for both local and global alignment output modes.
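The threshold-based display used by such visualization tools can be approximated with a sliding window over a pairwise alignment. The sketch below reports windows whose percent identity meets a cutoff and merges overlapping windows into conserved blocks. The default window size and cutoff are illustrative values chosen by us, not settings prescribed in this chapter, and the function names and sequences are hypothetical.

```python
from typing import List, Tuple


def conserved_windows(seq_a: str, seq_b: str, window: int = 100,
                      threshold: float = 70.0,
                      step: int = 1) -> List[Tuple[int, int, float]]:
    """Return (start, end, percent identity) for windows at or above threshold."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a, b = seq_a.upper(), seq_b.upper()
    hits = []
    for start in range(0, len(a) - window + 1, step):
        end = start + window
        # Gap-to-gap columns are not counted as matches.
        matches = sum(1 for x, y in zip(a[start:end], b[start:end])
                      if x == y and x != "-")
        identity = 100.0 * matches / window
        if identity >= threshold:
            hits.append((start, end, identity))
    return hits


def merge_windows(hits: List[Tuple[int, int, float]]) -> List[Tuple[int, int]]:
    """Merge overlapping or adjacent windows into conserved blocks."""
    merged: List[Tuple[int, int]] = []
    for start, end, _ in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Invented toy alignment: first 10 columns identical, last 10 divergent.
toy_a = "ACGTACGTACTTTTTTTTTT"
toy_b = "ACGTACGTACGGGGGGGGGG"
blocks = merge_windows(conserved_windows(toy_a, toy_b, window=5, threshold=100.0))
```

Coordinates are zero-based and refer to alignment columns, so they must be mapped back to genomic positions before the blocks are compared with gene annotations.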
3.2.2 Choice of Genomes for Cross Species Comparison
Comparisons of multiple species (“phylogenetic shadowing”) (38), using a set of closely related species (e.g., Refs. [50,52]), may be applied for the identification of conserved elements. However, the efficiency of finding conserved CRMs by phylogenetic footprinting (both in terms of number and level of conservation) is dependent on the evolutionary distance between the species compared (38,53). Comparisons between mouse and human (approx. 90 million years, Fig. 1C) span a short evolutionary distance, with a high degree of conservation among functionally relevant binding sites located in conserved blocks (54–58). However, the slow rate of neutral divergence among vertebrates may result in the retention of conserved sequences with no regulatory role between species separated by a short evolutionary distance (59). Several vertebrate genomes representing most major classes have recently been sequenced (see Table 3), providing the raw material for comparative analyses of species with greater evolutionary distances than those among mammals. A note of caution must be applied, though: the greater the evolutionary distance, the more likely regulatory elements are to have diverged. Thus, fewer regulatory elements will have retained conserved transcriptional activities, reducing the likelihood of identifying conserved CRMs (60). However, it is generally observed that developmentally regulated genes (including hh genes) and transcription factors tend to be more conserved in their CRMs than other genes (40,61). This is particularly striking in the CRMs of fish and mammalian sonic hedgehog orthologs, which are separated by 450 my and still show remarkable conservation (36).
3.2.3 Variable Divergence of CRMs Within a Locus
CRMs within one gene locus may have different rates of change, as is the case for the shh locus itself. For example, four enhancers, named ar-A to ar-D, are involved in shh activation in the zebrafish midline tissues. These four CRMs show varying degrees of conservation between pufferfish and mouse (36,62–64). Interestingly, ar-A and ar-C are conserved between fish and mouse, whereas ar-B shows significant sequence similarity only when zebrafish is compared with pufferfish (Tetraodon nigroviridis), indicating that the phylogenetic footprinting approach can detect additional functional regulatory elements when the evolutionary distance between the species used in the analysis matches the rate of change in regulatory sequences. The enhancer ar-C is significantly conserved in mouse, although less so than ar-A, and is active in the midline in zebrafish and mouse. Strikingly, no function has been assigned to the well-conserved ar-A in mouse. This may indicate conservation due to functional constraints other than CRM activity (reviewed in Ref. [65]). Significant sequence similarity in the 3′ UTR region of shh genes has also been observed between fish and mouse. However, no published data are available on a putative function of these conserved sequences.
3.2.4 Identification of Long Distance Regulatory Elements
It is not always trivial to assign a predicted conserved regulatory element to its cognate gene. The maximum distance at which regulatory elements can act on their target gene is not known, and looping of chromatin over 40 Mb to sites of transcriptional activity has been demonstrated (66). Bacterial or phage artificial chromosome vectors provide a technology for analysis of regulatory elements over large distances (42). This approach allowed for the detection of shh-regulatory elements that lie several hundred kilobases away from the coding sequence in the mouse. Several of the elements identified in the mouse (SBE 2, 3, and 4) are well conserved among human, chicken, and frog, but not teleost fish sequences (42). Interestingly, the function of these long-distance elements is to drive shh expression in the ventral diencephalon, an activity covered by the intronic ar-C enhancer in the fish. This functional divergence of enhancers may explain the lack of conservation of SBE2-4 and suggests that subfunctionalization mechanisms may be involved in the evolution of shh CRMs (67).
A large number of genes are likely to contain CRMs at a very long distance from the gene locus (68). An extreme example is the case of the sonic hedgehog limb enhancer, which lies 1 Mb away from the shh coding sequence in an intron of the lmbr1 gene (69). This enhancer is highly conserved among vertebrates both in terms of its sequence and its interdigitated position in the lmbr1 gene (70). This example suggests that further regulatory elements placed at a large distance may function in the regulation of shh. Indeed, several conserved noncoding elements were found at long distances from shh (up to 50 kb in fugu) and, when tested in zebrafish embryos, provided enhancer activity (41). It may be possible to identify these elements by limiting the search to chromosomal regions that remain unchanged during evolution. The interdigitation of coding genes with embedded regulatory elements of other neighboring genes also implies an evolutionary constraint on chromosomal rearrangements to avoid breakpoints in such regions. Conserved chromosomal synteny has been suggested to aid in predicting the limits of the regulatory regions of a gene (71,72). Thus, comparisons between multiple species should establish how far the most distant CRMs are located from the promoter by analyzing the breakpoints of syntenic fragments. To assist researchers in these analyses, the Ensembl genome server database provides mammalian and chick chromosomal synteny, whereas an independent web server provides fugu and human synteny analysis (73).
3.2.5 Identification of the Transcriptional Start Site and the Core Promoter
Core or basal promoters are positionally defined regulatory regions, which are located about 50–100 base pairs (bp) up- and/or downstream of the transcriptional start site (TSS), and are required for the formation of preinitiation complexes for subsequent transcription initiation (74) (see Note 6). The absence of experimental approaches to characterize TSSs and the diversity of promoter types made it relatively difficult to predict core promoter regions accurately using sequence analysis, despite the large number of programs available on the internet (see Tables 1 and 2 for a selection of tools). Prediction of core promoters has recently improved substantially, due to the accumulation of large-scale data on TSSs (75,76). Promoter predictors based on searching for motifs such as the TATA box (reviewed in Ref. [74]) failed, as it is now known that only a subset of human genes whose transcription is initiated by RNA polymerase II contain a TATA box (77). The characterization of motifs involved in transcription initiation of the remaining genes is still in progress (77,78). A TATA box is, however, present in vertebrate shh genes (79,80). Interestingly, transcription factors and brain-specific genes were found to have shorter conserved blocks than other genes (81). The core promoters of vertebrate shh genes have been characterized in fish and human (79,80) and were shown to contain two TSSs and to be regulated by retinoic acid and Foxa2 (HNF3β).
3.2.6 Transcription Factor-Binding Site Analysis
Information on transcription factor-binding sites is available in either commercial (e.g., TRANSFAC [82], Genomatix) or open access (JASPAR [83]) databases. Binding-site clustering is a feature of CRMs (84), which is utilized by several algorithms (85–91). The predictive value of such clustering approaches is enhanced by incorporating sequence conservation criteria (see Ref. [92] for an example). Ahab also detects clusters of weak sites (93,94), and this can be further improved with Stubb, which includes comparative information and allows for the prediction of regulatory modules (95,96). To search entire genomes for coexpressed genes, a software package (CisOrtho [97]) was developed that evaluates the co-occurrence of motifs in orthologous regions. CRMs of coregulated genes show “signatures”, i.e., transcription factor-binding site combinations with distinct spacing and orientation requirements (90,98), which seem to be retained between species even when the overall sequence similarity is low (90). On the basis of this finding, TraFaC identifies conserved TF-binding sites by scanning regions of conserved sequence similarity to detect co-occurrence of binding sites (99), whereas rVista (100,101) and ConSite (57) score aligned binding sites in conserved regions. CONREAL (102) applies a similar approach, uses binding-site predictions as anchors for sequence alignment, and performs better than other sequence alignment programs when aligning sequences from distant species. As more algorithms for motif detection that take into account phylogenetic conservation (e.g., PhyloCon [103], CompareProspector [104], Footprinter [105]) become available, functional binding sites in hedgehog genes and other developmentally regulated genes will be identified.
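Binding-site databases typically describe each factor as a matrix of base counts per position, and a sequence is scanned by scoring every window against that matrix. The sketch below shows a generic log-odds scan on both strands. It is not the algorithm of any specific tool named above, and the count matrix is a made-up four-position example rather than an entry from TRANSFAC or JASPAR; the pseudocount, uniform background, and score cutoff are likewise assumptions.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds_matrix(counts: List[Dict[str, int]], pseudocount: float = 1.0,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2 odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2(((column.get(b, 0) + pseudocount) / total)
                                    / background) for b in BASES})
    return matrix


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(sequence: str, matrix: List[Dict[str, float]],
         min_score: float) -> List[Tuple[int, str, str, float]]:
    """Return (forward-strand start, strand, matched word, score) for each hit."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for i in range(len(seq) - width + 1):
            word = seq[i:i + width]
            if any(b not in BASES for b in word):
                continue  # skip windows containing ambiguous bases
            score = sum(matrix[j][b] for j, b in enumerate(word))
            if score >= min_score:
                # Report reverse-strand hits in forward-strand coordinates.
                start = i if strand == "+" else len(seq) - i - width
                hits.append((start, strand, word, round(score, 2)))
    return sorted(hits)


# Hypothetical count matrix favouring the word GATA at every position.
toy_counts = [{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
toy_matrix = log_odds_matrix(toy_counts)
site_hits = scan("CCGATACCTATCGG", toy_matrix, min_score=6.0)
```

Single-matrix scans of this kind produce many false positives on genomic sequence, which is why the approaches described above restrict predictions to clustered sites or to sites located in conserved regions.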
4 Notes
1. It has been reported that variations in sequence length affect the accuracy of sequence alignments. ClustalW seems to be more sensitive to this issue than MAFFT. Thus, it is recommended to include sequences covering regions of similar length, although a sufficiently large portion of the protein sequence should be included to make the analysis meaningful. Comparing fragments of Hh protein with full-length Hh proteins, for instance, can only yield meaningless results.
2. Take care to save the tree file corresponding to the sequence alignment in the correct format (.nj if you intend to use NJPlot to draw the tree or .ph if you intend to use PHYLIP).
3. It is common in the literature to see bootstrap samples of 100 or 200. It is recommended to use 500–1000, especially if many species are involved.
4. Listed here are some evolutionary divergence times commonly used (see Fig. 1C). Rat/mouse, 41 my; mammals/fishes, 450 my; mammals/amphibians, 360 my; mammals/birds, 310 my.
5. A consideration when choosing a particular program is that many algorithms have been optimized for specific species comparisons (e.g., BlastZ for human-mouse, WABA [106] for C. elegans-C. briggsae) and may not perform well with other species.
6. A recent larger-scale analysis of mouse and human promoters identified conserved blocks within 500 bp from the start site, thereby defining the likely 5′ limit of proximal promoter regions (58).
References
Bejerano, G., Pheasant, M., Makunin, I., et al. (2004) Ultraconserved elements in the human genome. Science 304, 1321–1325.
Dermitzakis, E. T. and Clark, A. G. (2002) Evolution of transcription factor binding sites in Mammalian gene regulatory regions: conservation and turnover. Mol. Biol. Evol. 19, 1114–1121.
Frazer, K. A., Sheehan, J. B., Stokowski, R. P., et al. (2001) Evolutionarily conserved sequences on human chromosome 21. Genome Res. 11, 1651–1659.
Hillier, L. W., Miller, W., Birney, E., et al. (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716.
Mural, R. J., Adams, M. D., Myers, E. W., et al. (2002) A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science 296, 1661–1671.
Rubin, G. M., Yandell, M. D., Wortman, J. R., et al. (2000) Comparative genomics of the eukaryotes. Science 287, 2204–2215.
Waterston, R. H., Lindblad-Toh, K., Birney, E., Rogers, J., et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562.
Echelard, Y., Epstein, D. J., St-Jacques, B., et al. (1993) Sonic Hedgehog, a member of a family of putative signaling molecules, is implicated in the regulation of CNS polarity. Cell 75, 1417–1430.
Johnson, R. L., Laufer, E., Riddle, R. D., and Tabin, C. (1994) Ectopic expression of Sonic Hedgehog alters dorsal-ventral patterning of somites. Cell 79, 1165–1173.
Krauss, S., Concordet, J. P., and Ingham, P. W. (1993) A functionally conserved homolog of the Drosophila segment polarity gene hh is expressed in tissues with polarizing activity in zebrafish embryos. Cell 75, 1431–1444.
Ruiz I Altaba, A., Jessell, T. M., and Roelink, H. (1995) Restrictions to floor plate induction by Hedgehog and winged-helix genes in the neural tube of frog embryos. Mol. Cell. Neurosci. 6, 106–121.
Shimeld, S. M. (1999) The evolution of the Hedgehog gene family in chordates: insights from amphioxus hedgehog. Dev. Genes Evol. 209, 40–47.
Tsukurov, O., Boehmer, A., Flynn, J., et al. (1994) A complex bilateral polysyndactyly disease locus maps to chromosome 7q36. Nat. Genet. 6, 282–286.
Lettice, L. A., Horikoshi, T., Heaney, S. J., et al. (2002) Disruption of a long-range cis-acting regulator for Shh causes preaxial polydactyly. Proc. Natl Acad. Sci. USA 99, 7548–7553.
Kumar, S., Balczarek, K. A., and Lai, Z. C. (1996) Evolution of the Hedgehog gene family. Genetics 142, 965–972.
Zardoya, R., Abouheif, E., and Meyer, A. (1996) Evolutionary analyses of Hedgehog and Hoxd-10 genes in fish species closely related to the zebrafish. Proc. Natl Acad. Sci. USA 93, 13,036–13,041.
Avaron, F., Hoffman, L., Guay, D., and Akimenko, M. A. (2006) Characterization of two new zebrafish members of the Hedgehog family: atypical expression of a zebrafish indian hedgehog gene in skeletal elements of both endochondral and dermal origins. Dev. Dyn. 235, 478–489.
Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
Notredame, C., Higgins, D. G., and Heringa, J. (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217.
Katoh, K., Misawa, K., Kuma, K., and Miyata, T. (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066.
Edgar, R. C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113.
Felsenstein, J. (1988) Phylogenies from molecular sequences: inference and reliability. Annu. Rev. Genet 22, 521–565.
Saitou, N. and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.
Zardoya, R., Abouheif, E., and Meyer, A. (1996) Evolution and orthology of Hedgehog genes. Trends Genet. 12, 496–497.
Felsenstein, J. (1985) Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39, 783–791.
Conway Morris, S. (1993) The fossil record and the early evolution of the Metazoa. Nature 361, 219–225.
Dehal, P., Satou, Y., Campbell, R.K., et al. (2002) The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 298, 2157–2167.
Takatori, N., Satou, Y., and Satoh, N. (2002) Expression of Hedgehog genes in Ciona intestinalis embryos. Mech. Dev. 116, 235–238.
Aspock, G., Kagoshima, H., Niklaus, G., and Burglin, T. R. (1999) Caenorhabditis elegans has scores of Hedgehog-related genes: sequence and expression analysis. Genome Res. 9, 909–923.
Hedges, S. B. (2002) The origin and evolution of model organisms. Nat. Rev. Genet. 3, 838–849.
Blair, J. E., Ikeo, K., Gojobori, T., and Hedges, S. B. (2002) The evolutionary position of nematodes. BMC Evol. Biol. 2, 7.
Tagle, D. A., Koop, B. F., Goodman, M., Slightom, J. L., Hess, D. L., and Jones, R. T. (1988) Embryonic epsilon and gamma globin genes of a prosimian primate (Galago crassicaudatus). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol. 203, 439–455.
Dermitzakis, E. T., Reymond, A., Lyle, R., Scamuffa, N., et al. (2002) Numerous potentially functional but non-genic conserved sequences on human chromosome 21. Nature 420, 578–582.
Wasserman, W. W. and Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5, 276–287.
Muller, F., Blader, P., and Strahle, U. (2002) Search for enhancers: teleost models in comparative genomic and transgenic analysis of cis regulatory elements. Bioessays 24, 564–572.
Nardone, J., Lee, D. U., Ansel, K. M., and Rao, A. (2004) Bioinformatics for the ‘bench biologist’: how to find regulatory regions in genomic DNA. Nat. Immunol. 5, 768–774.
Boffelli, D., Nobrega, M. A., and Rubin, E. M. (2004) Comparative genomics at the vertebrate extremes. Nat. Rev. Genet. 5, 456–465.
Lemos, B., Yunes, J. A., Vargas, F. R., Moreira, M. A., Cardoso, A. A., and Seuanez, H. N. (2004) Phylogenetic footprinting reveals extensive conservation of Sonic Hedgehog (SHH) regulatory elements. Genomics 84, 511–523.
Woolfe, A., Goodson, M., Goode, D. K., et al. (2004) Highly conserved noncoding sequences are associated with vertebrate development. PLoS Biol. 3, e7.
Goode, D. K., Snell, P., Smith, S. F., Cooke, J. E., and Elgar, G. (2005) Highly conserved regulatory elements around the SHH gene may contribute to the maintenance of conserved synteny across human chromosome 7q36.3. Genomics 86, 172–181.
Jeong, Y. and Epstein, D. J. (2003) Distinct regulators of Shh transcription in the floor plate and notochord indicate separate origins for these tissues in the mouse node. Development 130, 3891–3902.
Schwartz, S., Kent, W. J., Smit, A., et al. (2003) Human-mouse alignments with BLASTZ. Genome Res. 13, 103–107.
Brudno, M., Do, C. B., Cooper, G. M., et al. (2003) LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13, 721–731.
Pollard, D. A., Bergman, C. M., Stoye, J., Celniker, S. E., and Eisen, M. B. (2004) Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics 5, 6.
Schwartz, S., Zhang, Z., Frazer, K. A., et al. (2000) PipMaker—a web server for aligning two genomic DNA sequences. Genome Res. 10, 577–586.
Ovcharenko, I., Nobrega, M. A., Loots, G. G., and Stubbs, L. (2004) ECR Browser: a tool for visualizing and accessing data from comparisons of multiple vertebrate genomes. Nucleic Acids Res. 32(Web Server issue), W280–W286.
Mayor, C., Brudno, M., Schwartz, J. R., et al. (2000) VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 16, 1046–1047.
Frazer, K. A., Pachter, L., Poliakov, A., Rubin, E. M., and Dubchak, I. (2004) VISTA: computational tools for comparative genomics. Nucleic Acids Res. 32(Web Server issue), W273–W279.
Brudno, M., Steinkamp, R., and Morgenstern, B. (2004) The CHAOS/DIALIGN WWW server for multiple alignment of genomic sequences. Nucleic Acids Res. 32(Web Server issue), W41–W44.
Morgenstern, B. (1999) DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211–218.
Ovcharenko, I., Boffelli, D., and Loots, G. G. (2004) eShadow: a tool for comparing closely related sequences. Genome Res. 14, 1191–1198.
Cooper, G. M. and Sidow, A. (2003) Genomic regulatory regions: insights from comparative sequence analysis. Curr. Opin. Genet. Dev. 13, 604–610.
Hardison, R. C. (2000) Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet. 16, 369–372.
Oeltjen, J. C., Malley, T. M., Muzny, D. M., Miller, W., Gibbs, R. A., and Belmont, J. W. (1997) Large-scale comparative sequence analysis of the human and murine Bruton’s tyrosine kinase loci reveals conserved regulatory domains. Genome Res. 7, 315–329.
Brickner, A. G., Koop, B. F., Aronow, B. J., and Wiginton, D. A. (1999) Genomic sequence comparison of the human and mouse adenosine deaminase gene regions. Mamm. Genome 10, 95–101.
Lenhard, B., Sandelin, A., Mendoza, L., Engstrom, P., Jareborg, N., and Wasserman, W. W. (2003) Identification of conserved regulatory elements by comparative genome analysis. J. Biol. 2, 13.
Suzuki, Y., Yamashita, R., Shirota, M., et al. (2004) Sequence comparison of human and mouse genes reveals a homologous block structure in the promoter regions. Genome Res. 14, 1711–1718.
Tautz, D. (2000) Evolution of transcriptional regulation. Curr. Opin. Genet. Dev. 10, 575–579.
Thomas, J. W., Touchman, J. W., Blakesley, R. W., et al. (2003) Comparative analyses of multi-species sequences from targeted genomic regions. Nature 424, 788–793.
Plessy, C., Dickmeis, T., Chalmel, F., and Strähle, U. (2005) Enhancer sequence conservation between vertebrates is favoured in developmental regulator genes. Trends Genet. 21, 207–210.
Müller, F., Chang, B., Albert, S., Fischer, N., Tora, L., and Strähle, U. (1999) Intronic enhancers control expression of zebrafish sonic Hedgehog in floor plate and notochord. Development 126, 2103–2116.
Epstein, D. J., McMahon, A. P., and Joyner, A. L. (1999) Regionalization of Sonic Hedgehog transcription along the anteroposterior axis of the mouse central nervous system is regulated by Hnf3-dependent and -independent mechanisms. Development 126, 281–292.
Goode, D. K., Snell, P. K., and Elgar, G. K. (2003) Comparative analysis of vertebrate Shh genes identifies novel conserved non-coding sequence. Mamm. Genome 14, 192–201.
Adams, M. D. (2005) Conserved sequences and the evolution of gene regulatory signals. Curr. Opin. Genet. Dev. 15, 628–633.
Osborne, C. S., Chakalova, L., Brown, K. E., et al. (2004) Active genes dynamically colocalize to shared sites of ongoing transcription. Nat. Genet.
Force, A., Lynch, M., Pickett, F. B., Amores, A., Yan, Y. L., and Postlethwait, J. (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151, 1531–1545.
Vavouri, T., McEwen, G. K., Woolfe, A., Gilks, W. R., and Elgar, G. (2006) Defining a genomic radius for long-range enhancer action: duplicated conserved non-coding elements hold the key. Trends Genet. 22, 5–10.
Lettice, L. A., Heaney, S. J., Purdie, L. A., et al. (2003) A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735.
Sagai, T., Hosoya, M., Mizushina, Y., Tamura, M., and Shiroishi, T. (2005) Elimination of a long-range cis-regulatory module causes complete loss of limb-specific Shh expression and truncation of the mouse limb. Development 132, 797–803.
Mackenzie, A., Miller, K. A., and Collinson, J. M. (2004) Is there a functional link between gene interdigitation and multi-species conservation of synteny blocks? Bioessays 26, 1217–1224.
Flint, J., Tufarelli, C., Peden, J., et al. (2001) Comparative genome analysis delimits a chromosomal domain and identifies key regulatory elements in the alpha globin cluster. Hum. Mol. Genet. 10, 371–382.
Halling-Brown, M., Sansom, C., Moss, D. S., Elgar, G., and Edwards, Y. J. (2004) A Fugu-Human Genome Synteny Viewer: web software for graphical display and annotation reports of synteny between Fugu genomic sequence and human genes. Nucleic Acids Res. 32, 2618–2622.
Butler, J. E. and Kadonaga, J. T. (2002) The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev. 16, 2583–2592.
Hashimoto, S., Suzuki, Y., Kasai, Y., et al. (2004) 5′-end SAGE for the analysis of transcriptional start sites. Nat. Biotechnol. 22, 1146–1149.
Kawaji, H., Kasukawa, T., Fukuda, S., et al. (2006) CAGE Basic/Analysis Databases: the CAGE resource for comprehensive promoter analysis. Nucleic Acids Res. 34(Database issue), D632–D636.
FitzGerald, P. C., Shlyakhtenko, A., Mir, A. A., and Vinson, C. (2004) Clustering of DNA sequences in human promoters. Genome Res. 14, 1562–1574.
Kadonaga, J. T. (2002) The DPE, a core promoter element for transcription by RNA polymerase II. Exp. Mol. Med. 34, 259–264.
Kitazawa, S., Kitazawa, R., Tamada, H., and Maeda, S. (1998) Promoter structure of human sonic Hedgehog gene. Biochim. Biophys. Acta 1443, 358–363.
Chang, B. E., Blader, P., Fischer, N., Ingham, P. W., and Strähle, U. (1997) Axial (HNF3beta) and retinoic acid receptors are regulators of the zebrafish sonic Hedgehog promoter. EMBO J. 16, 3955–3964.
Suzuki, Y., Yamashita, R., Sugano, S., and Nakai, K. (2004) DBTSS, DataBase of Transcriptional Start Sites: progress report 2004. Nucleic Acids Res. 32(Database issue), D78–D81.
Wingender, E., Dietze, P., Karas, H., and Knuppel, R. (1996) TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 24, 238–241.
Sandelin, A., Alkema, W., Engstrom, P., Wasserman, W. W., and Lenhard, B. (2004) JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 32(Database issue), D91–D94.
Arnone, M. I. and Davidson, E. H. (1997) The hardwiring of development: organization and function of genomic regulatory systems. Development 124, 1851–1864.
Markstein, M., Markstein, P., Markstein, V., and Levine, M. S. (2002) Genomewide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl Acad. Sci. USA 99, 763–768.
Stathopoulos, A., Van Drenth, M., Erives, A., Markstein, M., and Levine, M. (2002) Whole-genome analysis of dorsal-ventral patterning in the Drosophila embryo. Cell 111, 687–701.
Rebeiz, M., Reeves, N. L., and Posakony, J. W. (2002) SCORE: a computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data. Site clustering over random expectation. Proc. Natl Acad. Sci. USA 99, 9888–9893.
Berman, B. P., Nibu, Y., Pfeiffer, B. D., et al. (2002) Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl Acad. Sci. USA 99, 757–762.
Halfon, M. S., Grad, Y., Church, G. M., and Michelson, A. M. (2002) Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res. 12, 1019–1028.
Erives, A. and Levine, M. (2004) Coordinate enhancers share common organizational features in the Drosophila genome. Proc. Natl Acad. Sci. USA 101, 3851–3856.
Markstein, M., Zinzen, R., Markstein, P., et al. (2004) A regulatory code for neurogenic gene expression in the Drosophila embryo. Development 131, 2387–2394.
Berman, B. P., Pfeiffer, B. D., Laverty, T. R., et al. (2004) Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol. 5, R61.
Rajewsky, N., Vergassola, M., Gaul, U., and Siggia, E. D. (2002) Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo. BMC Bioinformatics 3, 30.
Schroeder, M. D., Pearce, M., Fak, J., et al. (2004) Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol. 2, E271.
Sinha, S., van Nimwegen, E., and Siggia, E. D. (2003) A probabilistic method to detect regulatory modules. Bioinformatics 19(Suppl 1), i292–i301.
Sinha, S., Schroeder, M. D., Unnerstall, U., Gaul, U., and Siggia, E. D. (2004) Cross-species comparison significantly improves genome-wide prediction of cis-regulatory modules in Drosophila. BMC Bioinformatics 5, 129.
Bigelow, H. R., Wenick, A. S., Wong, A., and Hobert, O. (2004) CisOrtho: a program pipeline for genome-wide identification of transcription factor target genes using phylogenetic footprinting. BMC Bioinformatics 5, 27.
Senger, K., Armstrong, G. W., Rowell, W. J., Kwan, J. M., Markstein, M., and Levine, M. (2004) Immunity regulatory DNAs share common organizational features in Drosophila. Mol. Cell 13, 19–32.
Jegga, A. G., Sherwood, S. P., Carman, J. W., et al. (2002) Detection and visualization of compositionally similar cis-regulatory element clusters in orthologous and coordinately controlled genes. Genome Res. 12, 1408–1417.
Loots, G. G. and Ovcharenko, I. (2004) rVISTA 2.0: evolutionary analysis of transcription factor binding sites. Nucleic Acids Res. 32(Web Server issue), W217–W221.
Loots, G. G., Ovcharenko, I., Pachter, L., Dubchak, I., and Rubin, E. M. (2002) rVista for comparative sequence-based discovery of functional transcription factor binding sites. Genome Res. 12, 832–839.
Berezikov, E., Guryev, V., Plasterk, R. H., and Cuppen, E. (2004) CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting. Genome Res. 14, 170–178.
Wang, T. and Stormo, G. D. (2003) Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics 19, 2369–2380.
Liu, Y., Liu, X. S., Wei, L., Altman, R. B., and Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res. 14, 451–458.
Blanchette, M. and Tompa, M. (2002) Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res. 12, 739–748.
Kent, W. J. and Zahler, A. M. (2000) Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment. Genome Res. 10, 1115–1125.
© 2007 Humana Press Inc., Totowa, NJ
Müller, F., Borycki, AG. (2007). Sequence Analyses to Study the Evolutionary History and Cis-Regulatory Elements of Hedgehog Genes. In: Horabin, J.I. (eds) Hedgehog Signaling Protocols. Methods in Molecular Biology™, vol 397. Humana Press. https://doi.org/10.1007/978-1-59745-516-9_16
DOI: https://doi.org/10.1007/978-1-59745-516-9_16
Publisher Name: Humana Press
Print ISBN: 978-1-58829-692-4
Online ISBN: 978-1-59745-516-9
eBook Packages: Springer Protocols