Summary
Current proteomics technologies generate large amounts of data, from which the investigator has to identify promising diagnostic/prognostic biomarkers as well as potential therapeutic targets. For the latter, proteins need to be classified into meaningful families. Current databases, featuring a high level of interconnectivity (cross-referencing), provide the tools necessary to bring various data together, facilitating protein classification and the elucidation of protein function and interplay. This chapter provides guidelines for using web-based tools to explore the information-rich peptide sequences generated by proteomics methodologies, with the objective of predicting potential protein function. After proper preprocessing of a query protein sequence (e.g., for internal repeats), known domains can be identified, which helps divide the query into smaller, meaningful parts. Any unclassified remainder of the protein provides material for low-level comparative analysis aimed at discovering distant homologues or candidate novel domain types, which must then be verified experimentally.
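As a rough illustration of the final step, the sketch below assumes that domain hits have already been obtained (for example, from a domain database search such as Pfam or SMART) and reduced to hypothetical `(name, start, end)` tuples in 1-based, inclusive residue coordinates. It merges overlapping hits and reports the uncovered stretches of the query that remain for further comparative analysis. The function name `unassigned_segments`, the tuple format, and the 30-residue `min_len` cutoff are illustrative assumptions, not part of the described protocol.

```python
from typing import List, Tuple

# Hypothetical hit format: (domain_name, start, end), 1-based inclusive coordinates.
Hit = Tuple[str, int, int]


def unassigned_segments(seq_len: int, hits: List[Hit], min_len: int = 30) -> List[Tuple[int, int]]:
    """Return (start, end) regions of the query not covered by any domain hit,
    keeping only regions at least min_len residues long (assumed cutoff)."""
    # Clamp hits to the sequence bounds and drop any that fall outside it.
    intervals = sorted((max(1, s), min(seq_len, e)) for _, s, e in hits)
    intervals = [(s, e) for s, e in intervals if s <= e]

    # Merge overlapping or directly adjacent hits into covered blocks.
    merged: List[List[int]] = []
    for s, e in intervals:
        if merged and s <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])

    # Collect the gaps before, between and after the covered blocks.
    gaps: List[Tuple[int, int]] = []
    pos = 1
    for s, e in merged:
        if s > pos:
            gaps.append((pos, s - 1))
        pos = max(pos, e + 1)
    if pos <= seq_len:
        gaps.append((pos, seq_len))

    # Discard stretches too short to be worth a separate search.
    return [(s, e) for s, e in gaps if e - s + 1 >= min_len]


# Illustrative, made-up hits on a 400-residue query.
hits = [("DomainA", 10, 70), ("DomainB", 80, 170), ("DomainC", 160, 300)]
print(unassigned_segments(400, hits))             # [(301, 400)]
print(unassigned_segments(400, hits, min_len=1))  # [(1, 9), (71, 79), (301, 400)]
```

Each returned segment can then be extracted (in Python, `seq[start - 1:end]`) and submitted on its own to more sensitive searches. Splitting the query this way keeps strong, already-classified domains from dominating the results for the weaker, unclassified region.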
Copyright information
© 2008 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Paliakasis, C.D., Michalopoulos, I., Kossida, S. (2008). Web-based Tools for Protein Classification. In: Vlahou, A. (eds) Clinical Proteomics. Methods in Molecular Biology™, vol 428. Humana Press. https://doi.org/10.1007/978-1-59745-117-8_18
DOI: https://doi.org/10.1007/978-1-59745-117-8_18
Publisher Name: Humana Press
Print ISBN: 978-1-58829-837-9
Online ISBN: 978-1-59745-117-8
eBook Packages: Springer Protocols