Bioinformatic Approaches to Assigning Protein Function From Novel Sequence Data

Michalovich, David; Fagan, Richard

doi:10.1385/1-59259-836-6:313

Part of the book series: Methods in Molecular Medicine (MIMM, volume 104)

Abstract

The current pace of functional genomic initiatives and genome sequencing projects has provided researchers with a bewildering array of sequence and biological data to analyze. The disease system-driven approach to identifying key genes frequently yields nucleotide and protein sequences whose gene and protein function are not known in sufficient detail to allow informed follow-up. Using a range of bioinformatic tools and sequence-based clues, most unassigned sequences can now be annotated. This chapter takes as an example an unannotated expressed sequence tag, describing how to identify its related gene and how to annotate the encoded protein using sequence-, profile-, and structure-based annotation methodologies.

Author information

Authors and Affiliations

Target Discovery, Inpharmatica Ltd., London, UK
David Michalovich & Richard Fagan

Editor information

Editors and Affiliations

AstraZeneca Pharmaceuticals, Macclesfield, Cheshire, UK
Simon J. Read
Neurology Centre for Excellence in Drug Discovery, GlaxoSmithKline Pharmaceuticals, Harlow, Essex, UK
David Virley

About this protocol

Cite this protocol

Michalovich, D., Fagan, R. (2005). Bioinformatic Approaches to Assigning Protein Function From Novel Sequence Data. In: Read, S.J., Virley, D. (eds) Stroke Genomics. Methods in Molecular Medicine, vol 104. Humana Press. https://doi.org/10.1385/1-59259-836-6:313

DOI: https://doi.org/10.1385/1-59259-836-6:313
Publisher Name: Humana Press
Print ISBN: 978-1-58829-333-6
Online ISBN: 978-1-59259-836-6
eBook Packages: Springer Protocols
