Abstract
The completion of the human genome sequence in 2003 (Lander et al. 2001; Venter et al. 2001; Consortium 2004b) made the need for information about the functional roles of its elements greater than ever. BioSapiens (Excellence 2005) contributes to the ENCODE program (Consortium 2004a), which provides a biologically informative representation of the human genome by using high-throughput methods to identify and catalogue all functional elements. The ENCODE pilot project annotated 1% of the genome (Consortium et al. 2007). The ongoing functional annotation of the remaining 99% will be a crucial next step for many diverse fields of science.
References
Aloy P, Querol E, Aviles FX, Sternberg MJ (2001) Automated structure-based prediction of functional sites in proteins: applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395–408
Armon A, Graur D, Ben-Tal N (2001) ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information. J Mol Biol 307: 447–463
Bairoch A, Apweiler R (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 28: 45–48
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28: 235–242
Beukers MW, Kristiansen I, IJzerman AP, Edvardsen I (1999) TinyGRAP database: a bioinformatics tool to mine G-protein-coupled receptor mutant data. Trends Pharmacol Sci 20: 475–477
Campagne F, Jestin R, Reversat JL, Bernassau JM, Maigret B (1999) Visualisation and integration of G protein-coupled receptor related information help the modelling: description and applications of the Viseur program. J Comput Aided Mol Des 13: 625–643
Carro A, Tress M, de Juan D, Pazos F, Lopez-Romero P, del Sol A, Valencia A, Rojas AM (2006) TreeDet: a web server to explore sequence space. Nucleic Acids Res 34: W110–W115
Casari G, Sander C, Valencia A (1995) A method to predict functional residues in proteins. Nat Struct Biol 2: 171–178
Chothia C, Lesk AM (1986) The relation between the divergence of sequence and structure in proteins. EMBO J 5: 823–826
Consortium EP (2004a) The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306: 636–640
Consortium EP, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor CM, Neph S, Koch CM, Asthana S, Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM, Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC, Dorschner MO, Fiegler H, Giresi PG, Goldy J, Hawrylycz M, Haydock A, Humbert R, James KD, Johnson BE, Johnson EM, Frum TT, Rosenzweig ER, Karnani N, Lee K, Lefebvre GC, Navas PA, Neri F, Parker SC, Sabo PJ, Sandstrom R, Shafer A, Vetrie D, Weaver M, Wilcox S, Yu M, Collins FS, Dekker J, Lieb JD, Tullius TD, Crawford GE, Sunyaev S, Noble WS, Dunham I, Denoeud F, Reymond A, Kapranov P, Rozowsky J, Zheng D, Castelo R, Frankish A, Harrow J, Ghosh S, Sandelin A, Hofacker IL, Baertsch R, Keefe D, Dike S, Cheng J, Hirsch HA, Sekinger EA, Lagarde J, Abril JF, Shahab A, Flamm C, Fried C, Hackermüller J, Hertel J, Lindemeyer M, Missal K, Tanzer A, Washietl S, Korbel J, Emanuelsson O, Pedersen JS, Holroyd N, Taylor R, Swarbreck D, Matthews N, Dickson MC, Thomas DJ, Weirauch MT, et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447: 799–816
Consortium IHGS (2004b) Finishing the euchromatic sequence of the human genome. Nature 431: 931–945
Dean MK, Higgs C, Smith RE, Bywater RP, Snell CR, Scott PD, Upton GJ, Howe TJ, Reynolds CA (2001) Dimerization of G-protein-coupled receptors. J Med Chem 44: 4595–4614
DesJarlais RL, Sheridan RP, Seibel GL, Dixon JS, Kuntz ID, Venkataraghavan R (1988) Using shape complementarity as an initial screen in designing ligands for a receptor binding site of known three-dimensional structure. J Med Chem 31: 722–729
Edvardsen O, Reiersen AL, Beukers MW, Kristiansen K (2002) tGRAP, the G-protein coupled receptors mutant database. Nucleic Acids Res 30: 361–363
Excellence TBNo (2005) Research networks: BioSapiens: a European network for integrated genome annotation. Eur J Hum Genet 13: 994–997
Feenstra KA, Pirovano W, Krab K, Heringa J (2007) Sequence harmony: detecting functional specificity from alignments. Nucleic Acids Res 35: W495–W498
Fernandez-Recio J, Totrov M, Skorodumov C, Abagyan R (2005) Optimal docking area: a new method for predicting protein-protein interaction sites. Proteins 58: 134–143
Filizola M, Olmea O, Weinstein H (2002) Prediction of heterodimerization interfaces of G-protein coupled receptors with a new subtractive correlated mutation method. Protein Eng 15: 881–885
Filizola M, Weinstein H (2002) Structural models for dimerization of G-protein coupled receptors: the opioid receptor homodimers. Biopolymers 66: 317–325
Folkertsma S, Van Noort P, Van Durme J, Joosten HJ, Bettler E, Fleuren W, Oliveira L, Horn F, de Vlieg J, Vriend G (2004) A family-based approach reveals the function of residues in the nuclear receptor ligand-binding domain. J Mol Biol 341: 321–335
Glaser F, Morris RJ, Najmanovich RJ, Laskowski RA, Thornton JM (2006) A method for localizing ligand binding pockets in protein structures. Proteins 62: 479–488
Glaser F, Pupko T, Paz I, Bell RE, Bechor-Shental D, Martz E, Ben-Tal N (2003) ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 19: 163–164
Göbel U, Sander C, Schneider R, Valencia A (1994) Correlated mutations and residue contacts in proteins. Proteins 18: 309–317
Gouldson PR, Dean MK, Snell CR, Bywater RP, Gkoutos G, Reynolds CA (2001) Lipid-facing correlated mutations and dimerization in G-protein coupled receptors. Protein Eng 14: 759–767
Gouldson PR, Higgs C, Smith RE, Dean MK, Gkoutos GV, Reynolds CA (2000) Dimerization and domain swapping in G-protein-coupled receptors: a computational study. Neuropsychopharmacology 23: S60–S77
Gouldson PR, Snell CR, Bywater RP, Higgs C, Reynolds CA (1998) Domain swapping in G-protein coupled receptor dimers. Protein Eng 11: 1181–1193
Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA (2005) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 33: D514–D517
Hernanz-Falcón P, Rodríguez-Frade JM, Serrano A, Juan D, del Sol A, Soriano SF, Roncal F, Gómez L, Valencia A, Martínez-A C, Mellado M (2004) Identification of amino acid residues crucial for chemokine receptor dimerization. Nat Immunol 5: 216–223
Honig B, Nicholls A (1995) Classical electrostatics in biology and chemistry. Science 268: 1144–1149
Horn F, Bettler E, Oliveira L, Campagne F, Cohen FE, Vriend G (2003) GPCRDB information system for G-protein-coupled receptors. Nucleic Acids Res 31: 294–297
Horn F, Bywater R, Krause G, Kuipers W, Oliveira L, Paiva AC, Sander C, Vriend G (1998a) The interaction of class B G-protein-coupled receptors with their hormones. Recept Channels 5: 305–314
Horn F, Lau AL, Cohen FE (2004) Automated extraction of mutation data from the literature: application of MuteXt to G-protein-coupled receptors and nuclear hormone receptors. Bioinformatics 20: 557–568
Horn F, Van der Wenden EM, Oliveira L, IJzerman AP, Vriend G (2000) Receptors coupling to G proteins: is there a signal behind the sequence? Proteins 41: 448–459
Horn F, Vriend G, Cohen FE (2001) Collecting and harvesting biological data: the GPCRDB and NucleaRDB information systems. Nucleic Acids Res 29: 346–349
Horn F, Weare J, Beukers MW, Hörsch S, Bairoch A, Chen W, Edvardsen O, Campagne F, Vriend G (1998b) GPCRDB: an information system for G protein-coupled receptors. Nucleic Acids Res 26: 275–279
Howard AD, McAllister G, Feighner SD, Liu Q, Nargund RP, Van der Ploeg LH, Patchett AA (2001) Orphan G-protein-coupled receptors and natural ligand discovery. Trends Pharmacol Sci 22: 132–140
Lutje Hulsik D (2002) Public-domain database of GPCR interaction parameters. Trends Pharmacol Sci 23: 258–259
Jones S, Thornton JM (1997) Prediction of protein-protein interaction sites using patch analysis. J Mol Biol 272: 133–143
Kalinina OV, Novichkov PS, Mironov AA, Gelfand MS, Rakhmaninova AB (2004) SDPpred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins. Nucleic Acids Res 32: W424–W428
Kanz C, Aldebert P, Althorpe N, Baker W, Baldwin A, Bates K, Browne P, van den Broek A, Castro M, Cochrane G, Duggan K, Eberhardt R, Faruque N, Gamble J, Diez FG, Harte N, Kulikova T, Lin Q, Lombard V, Lopez R, Mancuso R, McHale M, Nardone F, Silventoinen V, Sobhany S, Stoehr P, Tuli MA, Tzouvara K, Vaughan R, Wu D, Zhu W, Apweiler R (2005) The EMBL Nucleotide Sequence Database. Nucleic Acids Res 33: D29–D33
Kazius J, Wurdinger K, van Iterson M, Kok J, Bäck T, IJzerman AP (2007) GPCR NaVa database: natural variants in human G-protein-coupled receptors. Hum Mutat
Kufareva I, Budagyan L, Raush E, Totrov M, Abagyan R (2007) PIER: protein interface recognition for structural proteomics. Proteins 67: 400–417
Kuipers W, Link R, Standaar PJ, Stoit AR, Van Wijngaarden I, Leurs R, IJzerman AP (1997a) Study of the interaction between aryloxypropanolamines and Asn386 in helix VII of the human 5-hydroxytryptamine1A receptor. Mol Pharmacol 51: 889–896
Kuipers W, Oliveira L, Vriend G, IJzerman AP (1997b) Identification of class-determining residues in G protein-coupled receptors by sequence analysis. Recept Channels 5: 159–174
Kuntz ID, Blaney JM, Oatley SJ, Langridge R, Ferrin TE (1982) A geometric approach to macromolecule-ligand interactions. J Mol Biol 161: 269–288
Lamb ML, Jorgensen WL (1997) Computational approaches to molecular recognition. Curr Opin Chem Biol 1: 449–457
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, et al. (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921
Lichtarge O, Bourne HR, Cohen FE (1996a) Evolutionarily conserved Galphabetagamma binding surfaces support a model of the G protein-receptor complex. Proc Natl Acad Sci USA 93: 7507–7511
Lichtarge O, Bourne HR, Cohen FE (1996b) An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 257: 342–358
Lichtarge O, Yamamoto KR, Cohen FE (1997) Identification of functional surfaces of the zinc binding domains of intracellular receptors. J Mol Biol 274: 325–337
Madabushi S, Gross AK, Philippi A, Meng EC, Wensel TG, Lichtarge O (2004) Evolutionary trace of G protein-coupled receptors reveals clusters of residues that determine global and class-specific functions. J Biol Chem 279: 8126–8132
Madabushi S, Yao H, Marsh M, Kristensen DM, Philippi A, Sowa ME, Lichtarge O (2002) Structural clusters of evolutionary trace residues are statistically significant and common in proteins. J Mol Biol 316: 139–154
Miranker A, Karplus M (1991) Functionality maps of binding sites: a multiple copy simultaneous search method. Proteins 11: 29–34
Mirny L, Shakhnovich E (2001) Evolutionary conservation of the folding nucleus. J Mol Biol 308: 123–129
Mirny LA, Shakhnovich EI (1999) Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function. J Mol Biol 291: 177–196
Möller S, Vilo J, Croning MD (2001) Prediction of the coupling specificity of G protein coupled receptors to their G proteins. Bioinformatics 17(Suppl 1): S174–S181
Oliveira L, Paiva ACM, Vriend G (1993) A common motif in G protein-coupled seven transmembrane helix receptors. J Comput Aided Mol Des 7: 649–658
Oliveira L, Paiva ACM, Vriend G (1999) A low resolution model for the interaction of G-proteins with G-protein-coupled receptors. Protein Eng 12: 1087–1095
Oliveira L, Paiva ACM, Vriend G (2002) Correlated mutation analyses on very large sequence families. Chembiochem 3: 1010–1017
Oliveira L, Paiva PB, Paiva ACM, Vriend G (2003a) Identification of functionally conserved residues with the use of entropy-variability plots. Proteins 52: 544–552
Oliveira L, Paiva PB, Paiva ACM, Vriend G (2003b) Sequence analysis reveals how G-protein-coupled receptors transduce the signal to the G-protein. Proteins 52: 553–560
Pazos F, Helmer-Citterich M, Ausiello G, Valencia A (1997) Correlated mutations contain information about protein-protein interaction. J Mol Biol 271: 511–523
Pei J, Grishin NV (2001) AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics 17: 700–712
Pietrokovski S, Henikoff JG, Henikoff S (1996) The Blocks database — a system for protein classification. Nucleic Acids Res 24: 197–200
Pirovano W, Feenstra KA, Heringa J (2006) Sequence comparison by sequence harmony identifies subtype-specific functional sites. Nucleic Acids Res 34: 6540–6548
Russell RB, Sasieni PD, Sternberg MJ (1998) Supersites within superfolds. Binding site similarity in the absence of homology. J Mol Biol 282: 903–918
Sander C, Schneider R (1991) Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins 9: 56–68
Shatsky M, Nussinov R, Wolfson HJ (2002) Flexible protein alignment and hinge detection. Proteins 48: 242–256
Shenkin PS, Erman B, Mastrandrea LD (1991) Information-theoretical entropy as a measure of sequence variability. Proteins 11: 297–313
Shulman AI, Larson C, Mangelsdorf DJ, Ranganathan R (2004) Structural determinants of allosteric ligand activation in RXR heterodimers. Cell 116: 417–429
Singer MS, Vriend G, Bywater RP (2002) Prediction of protein residue contacts with a PDB-derived likelihood matrix. Protein Eng 15: 721–725
Slep KC, Kercher MA, He W, Cowan CW, Wensel TG, Sigler PB (2001) Structural determinants for regulation of phosphodiesterase by a G-protein at 2.0 Å. Nature 409: 1071–1077
Sowa ME, He W, Slep KC, Kercher MA, Lichtarge O, Wensel TG (2001) Prediction and confirmation of a site critical for effector regulation of RGS domain activity. Nat Struct Biol 8: 234–237
Valencia A, Pazos F (2002) Computational methods for the prediction of protein interactions. Curr Opin Struct Biol 12: 368–373
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, et al. (2001) The sequence of the human genome. Science 291: 1304–1351
Wang W, Donini O, Reyes CM, Kollman PA (2001) Biomolecular simulations: recent developments in force fields, simulations of enzyme catalysis, protein-ligand, protein-protein, and protein-nucleic acid noncovalent interactions. Annu Rev Bioph Biom 30: 211–243
Ye K, Feenstra KA, Heringa J, IJzerman AP, Marchiori E (2008) Multi-RELIEF: a method to recognize specificity determining residues from multiple sequence alignments using a machine-learning approach for feature weighting. Bioinformatics 24: 18–25
Zuckerkandl E (1965) Evolutionary divergence and convergence in proteins. In: Zuckerkandl E, Pauling L (eds) Evolving genes and proteins. Academic Press
Copyright information
© 2008 Springer-Verlag/Wien
About this chapter
Cite this chapter
Vroling, B., Vriend, G. (2008). Harvesting the information from a family of proteins. In: Frishman, D., Valencia, A. (eds) Modern Genome Annotation. Springer, Vienna. https://doi.org/10.1007/978-3-211-75123-7_13
DOI: https://doi.org/10.1007/978-3-211-75123-7_13
Publisher Name: Springer, Vienna
Print ISBN: 978-3-211-75122-0
Online ISBN: 978-3-211-75123-7
eBook Packages: Biomedical and Life Sciences (R0)