Abstract
In the last few years, many eukaryotic (including human and mouse) and prokaryotic genomes have either been completely sequenced or are being sequenced (1–3). Within the next 5–10 years, most of the known organisms are expected to have been sequenced. This has led, and will continue to lead, to exponential growth of nucleotide and protein databases. For example, the International Nucleotide Sequence Databases (INSD), comprising DDBJ (http://www.ddbj.nig.ac.jp/), EMBL Bank (http://www.ebi.ac.uk/embl/), and GenBank (http://www.ncbi.nlm.nih.gov/), had released more than 30 million entries by the end of 2003 (4). These ever-expanding databases challenge bioinformaticians to develop effective programs and Web servers that extract the maximum information from them. Database similarity searching is perhaps the fastest, cheapest, and most powerful computational experiment a biologist can conduct. As the databases become more complete, a similarity search is more likely to reveal database sequences that share statistically significant similarity, and thus inferred homology, with a query sequence. Although significant sequence similarity does not guarantee shared function, the availability of similar sequences is proving useful for discovering relationships between newly sequenced genes or proteins and known classes of sequences in the databases (5–7).
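The following sketch shows what a database similarity search computes at its core. It scores a query against every sequence in a small in-memory database using the Smith–Waterman local alignment recurrence, then ranks the hits by score. This is a minimal illustration under stated assumptions, not the FASTA algorithm. The scoring values (match +2, mismatch −1, linear gap −2), the function names `smith_waterman_score` and `search`, and the toy sequences are all hypothetical. Production tools such as FASTA and BLAST use word (k-tuple) lookups so that they can avoid running full dynamic programming against every entry. They also report statistical significance (E-values) rather than raw scores.

```python
# Toy database similarity search: score every database sequence against the
# query with Smith-Waterman local alignment, then rank hits by score.
# ASSUMPTION: match=+2, mismatch=-1, linear gap=-2 are illustrative values;
# real tools use substitution matrices (e.g. BLOSUM50) and affine gap penalties.


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            # Flooring at 0 lets an alignment restart anywhere (local alignment).
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Rank database entries by local alignment score, highest first."""
    hits = [(name, smith_waterman_score(query, seq))
            for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy database of nucleotide sequences.
    db = {
        "seq1": "ACGTACGTTAGC",
        "seq2": "TTTTTTTTTTTT",
        "seq3": "GGACGTTCGAAT",
    }
    for name, score in search("ACGTACG", db):
        print(name, score)
```

With these toy inputs, `seq1` contains the query exactly (score 14). `seq3` matches it with one mismatch (score 11). `seq2` shares only a single T (score 2). Each comparison costs time proportional to the product of the two sequence lengths, which is why heuristic searches are preferred when scanning databases with millions of entries.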
References
Venter, J. C., Adams, M. D., Myers, E. W., et al. (2001) The sequence of the human genome. Science 291, 1304–1351.
Lander, E. S., Linton, L. M., Birren, B., et al. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Waterston, R. H., Lindblad-Toh, K., Birney, E., et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562.
Miyazaki, S., Sugawara, H., Gojobori, T., and Tateno, Y. (2003) DNA Data Bank of Japan (DDBJ) in XML. Nucleic Acids Res. 31, 13–16.
Manuel, A., Beaupain, D., Romeo, P. H., and Raich, N. (2000) Molecular characterization of a novel gene family (PHTF) conserved from Drosophila to mammals. Genomics 64, 216–220.
Soliveri, J. A., Gomez, J., Bishai, W. R., and Chater, K. F. (2000) Multiple paralogous genes related to the Streptomyces coelicolor developmental regulatory gene whiB are present in Streptomyces and other actinomycetes. Microbiology 146, 333–343.
Komeda, H. and Asano, Y. (2003) Genes for an alkaline D-stereospecific endopeptidase and its homolog are located in tandem on Bacillus cereus genome. FEMS Microbiol. Lett. 228, 1–9.
Gibbs, A. J. and McIntyre, G. A. (1970) The diagram, a method for comparing sequences. Its use with amino acid and nucleotide sequences. Eur. J. Biochem. 16, 1–11.
Needleman, S. B. and Wunsch, C. D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453.
Smith, T. and Waterman, M. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.
Pearson, W. R. and Lipman, D. J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.
Lipman, D. J. and Pearson, W. R. (1985) Rapid and sensitive protein similarity searches. Science 227, 1435–1441.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Altschul, S. F., Madden, T. L., Schäffer, A. A., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Pearson, W. R. (2000) Flexible sequence similarity searching with the FASTA3 program package. In: Bioinformatics Methods and Protocols (Misener, S. and Krawetz, S. A., eds.), Humana Press, Totowa, NJ, pp. 185–219.
Wilbur, W. J. and Lipman, D. J. (1983) Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. USA 80, 726–730.
Pearson, W. R. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183, 63–98.
Anderson, I. and Brass, A. (1998) Searching DNA databases for similarities to DNA sequences: when is a match significant? Bioinformatics 14, 349–356.
Pearson, W. R. (1995) Comparison of methods for searching protein sequence databases. Protein Sci. 4, 1150–1160.
Pearson, W. R. (1996) Effective protein sequence comparison. Methods Enzymol. 266, 227–258.
Pearson, W. R. (1998) Empirical statistical estimates for sequence similarity searches. J. Mol. Biol. 276, 71–84.
Miller, W. (2001) Comparison of genomic DNA sequences: solved and unsolved problems. Bioinformatics 17, 391–397.
Issac, B. and Raghava, G. P. S. (2002) GWFASTA: a server for FASTA search in eukaryotic and microbial genomes. BioTechniques 33, 548–556.
Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
Brown, N. P., Leroy, C., and Sander, C. (1998) MView: a Web-compatible database search or multiple alignment viewer. Bioinformatics 14, 380–381.
Gogarten, J. P. and Olendzenski, L. (1999) Orthologs, paralogs and genome comparisons. Curr. Opin. Genet. Dev. 9, 630–636.
Raghava, G. P. S. (2001) A graphical Web server for the analysis of protein sequences and alignment. Biotech. Software and Internet Report 2, 255–258.
Livingstone, C. D. and Barton, G. J. (1993) Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation. Comput. Appl. Biosci. 9, 745–756.
Barton, G. J. (1993) ALSCRIPT: a tool to format multiple sequence alignments. Protein Eng. 6, 37–40.
Copyright information
© 2005 Humana Press Inc., Totowa, NJ
About this protocol
Cite this protocol
Issac, B., Raghava, G.P.S. (2005). FASTA Servers for Sequence Similarity Search. In: Walker, J.M. (eds) The Proteomics Protocols Handbook. Springer Protocols Handbooks. Humana Press. https://doi.org/10.1385/1-59259-890-0:503
DOI: https://doi.org/10.1385/1-59259-890-0:503
Publisher Name: Humana Press
Print ISBN: 978-1-58829-343-5
Online ISBN: 978-1-59259-890-8
eBook Packages: Springer Protocols