Exon Detection by Similarity Searches

  • Protocol
Gene Isolation and Mapping Protocols

Part of the book series: Methods in Molecular Biology™ ((MIMB,volume 68))

Abstract

Other chapters of this volume have presented the various experimental methods (mainly exon trapping and recombination-based and hybridization-based approaches) used for the identification of transcribed sequences within cloned genomic fragments. None of those methods requires detailed sequence information on the genomic region of interest. However, since generating large genomic sequences is becoming more routine, identifying transcribed regions by computer analysis of large genomic sequences (i.e., “software trapping”) is also becoming a viable alternative. After an overview of the various computational methods at hand, this chapter focuses on the use of database similarity searches for the identification of exons in mammalian genomes.

© 1997 Humana Press Inc., Totowa, NJ

About this protocol

Cite this protocol

Claverie, JM. (1997). Exon Detection by Similarity Searches. In: Boultwood, J. (ed.) Gene Isolation and Mapping Protocols. Methods in Molecular Biology™, vol 68. Humana Press. https://doi.org/10.1385/0-89603-482-8:283

  • DOI: https://doi.org/10.1385/0-89603-482-8:283

  • Publisher Name: Humana Press

  • Print ISBN: 978-0-89603-482-2

  • Online ISBN: 978-1-59259-554-9

  • eBook Packages: Springer Protocols
