Abstract
The current pace of functional genomic initiatives and genome sequencing projects has provided researchers with a bewildering array of sequence and biological data to analyze. The disease system-driven approach to identifying key genes frequently yields nucleotide and protein sequences for which the gene and protein function are not known in sufficient detail to allow informed follow-up. Using a range of bioinformatic tools and sequence-based clues, most unassigned sequences can now be annotated. This chapter takes an unannotated expressed sequence tag as an example, describing how to identify its related gene and how to annotate the encoded protein using sequence-, profile-, and structure-based annotation methodologies.
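Before an expressed sequence tag can be annotated at the protein level, its nucleotide sequence has to be turned into candidate peptide sequences. The sketch below illustrates one common way to do this, a six-frame conceptual translation that keeps the longest stop-free stretch. It is not taken from the chapter: the function names are hypothetical, the standard genetic code is assumed, and no start codon is required because an EST is usually a partial transcript.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption:
# the organism uses the standard nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from the first base; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels ('+1'..'+3', '-1'..'-3') to translated peptides."""
    frames = {}
    for sign, strand in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


def longest_open_segment(seq):
    """Return (frame, peptide) for the longest stretch without a stop codon."""
    candidates = [
        (frame, segment)
        for frame, peptide in six_frame_translations(seq).items()
        for segment in peptide.split("*")
    ]
    return max(candidates, key=lambda item: len(item[1]))


if __name__ == "__main__":
    est = "ATGGCCAAGTAA"  # toy sequence, not real data
    for frame, peptide in six_frame_translations(est).items():
        print(frame, peptide)
    print(longest_open_segment(est))
```

The peptide selected this way would then serve as the query for the sequence, profile, and structure-based searches the abstract mentions. On a real EST the longest open segment is only a starting hypothesis, since sequencing errors can introduce frameshifts.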
Copyright information
© 2005 Humana Press Inc., Totowa, NJ
Cite this protocol
Michalovich, D., Fagan, R. (2005). Bioinformatic Approaches to Assigning Protein Function From Novel Sequence Data. In: Read, S.J., Virley, D. (eds) Stroke Genomics. Methods in Molecular Medicine, vol 104. Humana Press. https://doi.org/10.1385/1-59259-836-6:313
DOI: https://doi.org/10.1385/1-59259-836-6:313
Publisher Name: Humana Press
Print ISBN: 978-1-58829-333-6
Online ISBN: 978-1-59259-836-6
eBook Packages: Springer Protocols