Bioinformatic Approaches to Assigning Protein Function From Novel Sequence Data

  • Protocol
Stroke Genomics

Part of the book series: Methods in Molecular Medicine (MIMM, volume 104)

Abstract

The current pace of functional genomic initiatives and genome sequencing projects has provided researchers with a bewildering array of sequence and biological data to analyze. The disease system-driven approach to finding key genes frequently yields nucleotide and protein sequences whose gene and protein function are not known in sufficient detail to allow informed follow-up. Using a range of bioinformatic tools and sequence-based clues, most unassigned sequences can now be annotated. This chapter takes as an example an unannotated expressed sequence tag, describing how to identify its related gene and how to annotate the encoded protein using sequence-, profile-, and structure-based annotation methodologies.
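As an illustration of the kind of first step such a workflow might involve, the sketch below translates a nucleotide sequence in all six reading frames and reports the longest stop-free stretch, which could then be submitted to protein-level searches. This is a minimal sketch and not the chapter's protocol: the function names, the use of the standard genetic code, and the choice to look for stop-free stretches rather than complete ORFs (because expressed sequence tags are often partial) are all assumptions made for the example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


def longest_open_stretch(seq):
    """Return (frame, peptide) for the longest stretch without a stop codon."""
    best = ("", "")
    for frame, protein in six_frame_translations(seq).items():
        for segment in protein.split("*"):
            if len(segment) > len(best[1]):
                best = (frame, segment)
    return best


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the chapter.
    frame, peptide = longest_open_stretch("ATGGCCAAGTAA")
    print(frame, peptide)
```

The longest stop-free stretch is only a heuristic for choosing a candidate reading frame; on real data it would normally be checked against genomic alignments and database search results before any functional annotation is attempted.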


Copyright information

© 2005 Humana Press Inc., Totowa, NJ

About this protocol

Cite this protocol

Michalovich, D., Fagan, R. (2005). Bioinformatic Approaches to Assigning Protein Function From Novel Sequence Data. In: Read, S.J., Virley, D. (eds) Stroke Genomics. Methods in Molecular Medicine, vol 104. Humana Press. https://doi.org/10.1385/1-59259-836-6:313

  • DOI: https://doi.org/10.1385/1-59259-836-6:313

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-333-6

  • Online ISBN: 978-1-59259-836-6

  • eBook Packages: Springer Protocols
