FASTA Servers for Sequence Similarity Search

  • Biju Issac
  • Gajendra P. S. Raghava
Protocol
Part of the Springer Protocols Handbooks book series (SPH)

Abstract

In the last few years, many eukaryotic (including human and mouse) and prokaryotic genomes have either been completely sequenced or are being sequenced (1, 2, 3). In the coming 5–10 yr, most of the known organisms will have been sequenced. This has led, and will continue to lead, to exponential growth in nucleotide and protein databases; for example, the International Nucleotide Sequence Databases (INSD), composed of DDBJ (http://www.ddbj.nig.ac.jp/), EMBL Bank (http://www.ebi.ac.uk/embl/), and GenBank (http://www.ncbi.nlm.nih.gov/), had released more than 30 million entries by the end of 2003 (4). The availability of these rapidly expanding databases poses a major challenge to bioinformatics experts: developing effective programs or Web servers that extract the maximum information from them. A database similarity search is perhaps the fastest, cheapest, and most powerful experiment a biologist can conduct. As the databases become more complete, a sequence similarity search is more likely to reveal database sequences with statistically significant similarity, and thus inferred homology, to a query sequence. Although significant sequence similarity is no guarantee of shared function, the availability of similar sequences is proving useful in discovering relationships between newly sequenced proteins or genes and various classes in the databases (5, 6, 7).

Keywords

Multiple Sequence Alignment; Query Sequence; Basic Local Alignment Search Tool; Library Sequence; Initial Region
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Venter, J. C., Adams, M. D., Myers, E. W., et al. (2001) The sequence of the human genome. Science 291, 1304–1351.
  2. Lander, E. S., Linton, L. M., Birren, B., et al. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921.
  3. Waterston, R. H., Lindblad-Toh, K., Birney, E., et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562.
  4. Miyazaki, A., Sugawara, H., Gojobori, T., and Tateno, Y. (2003) DNA DataBank of Japan (DDBJ) in XML. Nucleic Acids Res. 31, 13–16.
  5. Manuel, A., Beaupain, D., Romeo, P. H., and Raich, N. (2000) Molecular characterization of a novel gene family (PHTF) conserved from Drosophila to mammals. Genomics 64, 216–220.
  6. Soliveri, J. A., Gomez, J., Bishai, W. R., and Chater, K. F. (2000) Multiple paralogous genes related to the Streptomyces coelicolor developmental regulatory gene whiB are present in Streptomyces and other actinomycetes. Microbiology 146, 333–343.
  7. Komeda, H. and Asano, Y. (2003) Genes for an alkaline D-stereospecific endopeptidase and its homolog are located in tandem on Bacillus cereus genome. FEMS Microbiol. Lett. 228, 1–9.
  8. Gibbs, A. J. and McIntyre, G. A. (1970) The diagram, a method for comparing sequences. Its use with amino acid and nucleotide sequences. Eur. J. Biochem. 16, 1–11.
  9. Needleman, S. and Wunsch, C. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 444–453.
  10. Smith, T. and Waterman, M. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.
  11. Pearson, W. R. and Lipman, D. J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.
  12. Lipman, D. J. and Pearson, W. R. (1985) Rapid and sensitive protein similarity searches. Science 227, 1435–1441.
  13. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
  14. Altschul, S. F., Madden, T. L., Schaffer, A. A., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
  15. Pearson, W. (2000) Flexible sequence similarity searching with the FASTA3 program package. In Bioinformatics Methods and Protocols, Misener, S. and Krawetz, S. A. (eds.), Humana Press, Inc., Totowa, NJ, pp. 185–219.
  16. Wilbur, W. J. and Lipman, D. J. (1983) Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. USA 80, 726–730.
  17. Pearson, W. R. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183, 63–98.
  18. Anderson, I. and Brass, A. (1998) Searching DNA databases for similarities to DNA sequences: when is a match significant? Bioinformatics 14, 349–356.
  19. Pearson, W. R. (1995) Comparison of methods for searching protein sequence databases. Protein Sci. 4, 1150–1160.
  20. Pearson, W. R. (1996) Effective protein sequence comparison. Methods Enzymol. 266, 227–258.
  21. Pearson, W. R. (1998) Empirical statistical estimates for sequence similarity searches. J. Mol. Biol. 276, 71–84.
  22. Miller, W. (2000) Comparison of genomic DNA sequences: solved and unsolved problems. Bioinformatics 17, 391–397.
  23. Issac, B. and Raghava, G. P. S. (2002) GWFASTA: a server for FASTA search in eukaryotic and microbial genomes. BioTechniques 33, 548–556.
  24. Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
  25. Brown, N. P., Leroy, C., and Sander, C. (1998) MView: a Web-compatible database search or multiple alignment viewer. Bioinformatics 14, 380–381.
  26. Gogarten, J. P. and Olendzenski, L. (1999) Orthologs, paralogs and genome composition. Curr. Opin. Genet. Dev. 9, 630–636.
  27. Raghava, G. P. S. (2001) A graphical Web server for the analysis of protein sequences and alignment. Biotech. Software and Internet Report 2, 255–258.
  28. Livingstone, C. D. and Barton, G. J. (1993) Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation. Comput. Appl. Biosci. 9, 745–756.
  29. Barton, G. J. (1993) ALSCRIPT: a tool to format multiple sequence alignments. Prot. Eng. 6, 37–40.

Copyright information

© Humana Press Inc., Totowa, NJ 2005

Authors and Affiliations

  • Biju Issac
    • 1
  • Gajendra P. S. Raghava
    • 1
  1. Institute of Microbial Technology, Chandigarh, India
