Abstract
The sequences of biomolecules such as proteins and RNA genes contain information about their three-dimensional states and functions. For over 40 years biologists have used the evolutionary conservation of this information to detect homology and predict important subsets of residues. Recent work has substantially extended this view of conservation by including the detection of evolutionary couplings, that is, interactions between residues, resulting in a paradigm shift in our ability to compute three-dimensional structures from sequences alone. In addition to the three-dimensional structure of single proteins and RNAs, this statistical analysis of evolutionary constraints can identify functional residues involved in ligand binding, biomolecular interactions, alternative ensembles of conformations and “invisible” tertiary states of disordered proteins, and it allows quantitative prediction of the effects of mutations. In this chapter we present an overview of the statistical inference methodologies, a survey of the resulting applications and the challenges facing the field.
Parts of this chapter have been adapted from (Hopf 2016).
References
Balakrishnan S, Kamisetty H, Carbonell JG, Lee SI, Langmead CJ (2011) Learning generative models for protein fold families. Proteins 79(4):1061–1078. doi:10.1002/prot.22934
Baradaran R, Berrisford JM, Minhas GS, Sazanov LA (2013) Crystal structure of the entire respiratory complex I. Nature 494(7438):443–448. doi:10.1038/nature11871
Ben-Naim E, Lapedes AS (1999) Genetic correlations in mutation processes. Phys Rev E Stat Phys Plasmas Fluids 59(6):7000–7007
Besag J (1975) Statistical analysis of non-lattice data. Statistician 179–195
Bitbol AF, Dwyer RS, Colwell LJ, Wingreen NS (2016) Inferring interaction partners from protein sequences. Proc Natl Acad Sci USA 113(43):12180–12185. doi:10.1073/pnas.1606762113
Boyd JS, Cheng RR, Paddock ML, Sancar C, Morcos F, Golden SS (2016) A combined computational and genetic approach uncovers network interactions of the cyanobacterial circadian clock. J Bacteriol 198(18):2439–2447. doi:10.1128/JB.00235-16
Burger L, van Nimwegen E (2008) Accurate prediction of protein-protein interactions from sequence alignments using a Bayesian method. Mol Syst Biol 4:165. doi:10.1038/msb4100203
Burger L, van Nimwegen E (2010) Disentangling direct from indirect co-evolution of residues in protein alignments. PLoS Comput Biol 6(1):e1000633. doi:10.1371/journal.pcbi.1000633
Cheng RR, Nordesjo O, Hayes RL, Levine H, Flores SC, Onuchic JN, Morcos F (2016) Connecting the sequence-space of bacterial signaling proteins to phenotypes using coevolutionary landscapes. Mol Biol Evol. doi:10.1093/molbev/msw188
Deng Z, Huang W, Bakkalbasi E, Brown NG, Adamski CJ, Rice K, Muzny D, Gibbs RA, Palzkill T (2012) Deep sequencing of systematic combinatorial libraries reveals beta-lactamase sequence constraints at high resolution. J Mol Biol 424(3–4):150–167. doi:10.1016/j.jmb.2012.09.014
dos Santos RN, Morcos F, Jana B, Andricopulo AD, Onuchic JN (2015) Dimeric interactions and complex formation using direct coevolutionary couplings. Sci Rep 5:13652. doi:10.1038/srep13652
Ekeberg M, Lovkvist C, Lan Y, Weigt M, Aurell E (2013) Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys Rev E Stat Nonlin Soft Matter Phys 87(1):012707
Feinauer C, Szurmant H, Weigt M, Pagnani A (2016) Inter-protein sequence co-evolution predicts known physical interactions in Bacterial Ribosomes and the Trp Operon. PLoS ONE 11(2):e0149166. doi:10.1371/journal.pone.0149166
Figliuzzi M, Jacquier H, Schug A, Tenaillon O, Weigt M (2016) Coevolutionary landscape inference and the context-dependence of mutations in Beta-Lactamase TEM-1. Mol Biol Evol 33(1):268–280. doi:10.1093/molbev/msv211
Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, Salazar GA, Tate J, Bateman A (2016) The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 44(D1):279–285. doi:10.1093/nar/gkv1344
Giraud BG, Heumann JM, Lapedes AS (1999) Superadditive correlation. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics 59(5 Pt A):4983–4991
Gobel U, Sander C, Schneider R, Valencia A (1994) Correlated mutations and residue contacts in proteins. Proteins 18(4):309–317. doi:10.1002/prot.340180402
Gueudre T, Baldassi C, Zamparo M, Weigt M, Pagnani A (2016) Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis. Proc Natl Acad Sci USA 113(43):12186–12191. doi:10.1073/pnas.1607570113
Gutell RR, Power A, Hertz GZ, Putz EJ, Stormo GD (1992) Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucleic Acids Res 20(21):5785–5795
Hopf T (2016) Phenotype prediction from evolutionary sequence covariation. München, Technische Universität München, Diss 2016
Hopf TA, Colwell LJ, Sheridan R, Rost B, Sander C, Marks DS (2012) Three-dimensional structures of membrane proteins from genomic sequencing. Cell 149(7):1607–1621. doi:10.1016/j.cell.2012.04.012
Hopf TA, Ingraham JB, Poelwijk FJ, Springer M, Sander C, Marks DS (2015a) Quantification of the effect of mutations using a global probability model of natural sequence variation. arXiv preprint arXiv:151004612
Hopf TA, Ingraham JB, Poelwijk FJ, Schärfe CPI, Springer M, Sander C, Marks DS (2017) Mutational effects captured by epistatic models of evolutionary sequence variation. Nat Biotechnol 35:128–135. doi:10.1038/nbt.3769
Hopf TA, Morinaga S, Ihara S, Touhara K, Marks DS, Benton R (2015b) Amino acid coevolution reveals three-dimensional structure and functional domains of insect odorant receptors. Nat Commun 6:6077. doi:10.1038/ncomms7077
Hopf TA, Schärfe CP, Rodrigues JP, Green AG, Kohlbacher O, Sander C, Bonvin AM, Marks DS (2014) Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3. doi:10.7554/eLife.03430
Jacquier H, Birgy A, Le Nagard H, Mechulam Y, Schmitt E, Glodt J, Bercot B, Petit E, Poulain J, Barnaud G, Gros PA, Tenaillon O (2013) Capturing the mutational landscape of the beta-lactamase TEM-1. Proc Natl Acad Sci USA 110(32):13067–13072. doi:10.1073/pnas.1215206110
Jones DT, Buchan DW, Cozzetto D, Pontil M (2012) PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28(2):184–190. doi:10.1093/bioinformatics/btr638
Jones DT, Singh T, Kosciolek T, Tetchner S (2015) MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins. Bioinformatics 31(7):999–1006
Kajan L, Hopf TA, Kalas M, Marks DS, Rost B (2014) FreeContact: fast and free software for protein contact prediction from residue co-evolution. BMC Bioinformatics 15:85. doi:10.1186/1471-2105-15-85
Kamisetty H, Ovchinnikov S, Baker D (2013) Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era. Proc Natl Acad Sci USA 110(39):15674–15679. doi:10.1073/pnas.1314045110
Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press
Kosciolek T, Jones DT (2014) De novo structure prediction of globular proteins aided by sequence variation-derived contacts. PLoS ONE 9(3):e92197. doi:10.1371/journal.pone.0092197
Lapedes A, Giraud B, Jarzynski C (2012) Using sequence alignments to predict protein structure and stability with high accuracy. arXiv preprint arXiv:12072484
Lapedes AS, Giraud BG, Liu LC, Stormo GD (1997) Correlated Mutations in Protein Sequences: Phylogenetic and Structural Effects. Santa Fe Institute
Li C, Qian W, Maclean CJ, Zhang J (2016) The fitness landscape of a tRNA gene. Science. doi:10.1126/science.aae0568
Lindorff-Larsen K, Piana S, Dror RO, Shaw DE (2011) How fast-folding proteins fold. Science 334(6055):517–520. doi:10.1126/science.1208351
Mann JK, Barton JP, Ferguson AL, Omarjee S, Walker BD, Chakraborty A, Ndung’u T (2014) The fitness landscape of HIV-1 gag: advanced modeling approaches and validation of model predictions by in vitro testing. PLoS Comput Biol 10(8):e1003776. doi:10.1371/journal.pcbi.1003776
Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, Sander C (2011) Protein 3D structure computed from evolutionary sequence variation. PLoS ONE 6(12):e28766. doi:10.1371/journal.pone.0028766
Marks DS, Hopf TA, Sander C (2012) Protein structure prediction from sequence variation. Nat Biotechnol 30(11):1072–1080. doi:10.1038/nbt.2419
Melamed D, Young DL, Gamble CE, Miller CR, Fields S (2013) Deep mutational scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein. RNA 19(11):1537–1551. doi:10.1261/rna.040709.113
Melamed D, Young DL, Miller CR, Fields S (2015) Combining natural sequence variation with high throughput mutational data to reveal protein interaction sites. PLoS Genet 11(2):e1004918. doi:10.1371/journal.pgen.1004918
Melnikov A, Rogov P, Wang L, Gnirke A, Mikkelsen TS (2014) Comprehensive mutational scanning of a kinase in vivo reveals substrate-dependent fitness landscapes. Nucleic Acids Res 42(14):e112. doi:10.1093/nar/gku511
Michel M, Hayat S, Skwark MJ, Sander C, Marks DS, Elofsson A (2014) PconsFold: improved contact predictions improve protein models. Bioinformatics 30(17):482–488. doi:10.1093/bioinformatics/btu458
Morcos F, Jana B, Hwa T, Onuchic JN (2013) Coevolutionary signals across protein lineages help capture multiple protein conformations. Proc Natl Acad Sci USA 110(51):20533–20538. doi:10.1073/pnas.1315625110
Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, Weigt M (2011) Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci USA 108(49):1293–1301. doi:10.1073/pnas.1111471108
Mosca R, Ceol A, Stein A, Olivella R, Aloy P (2014) 3did: a catalog of domain-based interactions of known three-dimensional structure. Nucleic Acids Res 42(Database issue):374–379. doi:10.1093/nar/gkt887
Neher E (1994) How frequent are correlated changes in families of protein sequences? Proc Natl Acad Sci USA 91(1):98–102
Oates ME, Romero P, Ishida T, Ghalwash M, Mizianty MJ, Xue B, Dosztanyi Z, Uversky VN, Obradovic Z, Kurgan L, Dunker AK, Gough J (2013) D²P²: database of disordered protein predictions. Nucleic Acids Res 41(Database issue):508–516. doi:10.1093/nar/gks1226
Ovchinnikov S, Kamisetty H, Baker D (2014) Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information. eLife 3:e02030. doi:10.7554/eLife.02030
Ovchinnikov S, Kinch L, Park H, Liao Y, Pei J, Kim DE, Kamisetty H, Grishin NV, Baker D (2015) Large-scale determination of previously unsolved protein structures using evolutionary information. eLife 4:e09248. doi:10.7554/eLife.09248
Pazos F, Helmer-Citterich M, Ausiello G, Valencia A (1997) Correlated mutations contain information about protein-protein interaction. J Mol Biol 271(4):511–523. doi:10.1006/jmbi.1997.1198
Perdigao N, Heinrich J, Stolte C, Sabir KS, Buckley MJ, Tabor B, Signal B, Gloss BS, Hammang CJ, Rost B, Schafferhans A, O’Donoghue SI (2015) Unexpected features of the dark proteome. Proc Natl Acad Sci USA 112(52):15898–15903. doi:10.1073/pnas.1508380112
Qian B, Raman S, Das R, Bradley P, McCoy AJ, Read RJ, Baker D (2007) High-resolution structure prediction and the crystallographic phase problem. Nature 450(7167):259–264. doi:10.1038/nature06249
Rajagopala SV, Sikorski P, Kumar A, Mosca R, Vlasblom J, Arnold R, Franca-Koh J, Pakala SB, Phanse S, Ceol A, Hauser R, Siszler G, Wuchty S, Emili A, Babu M, Aloy P, Pieper R, Uetz P (2014) The binary protein-protein interaction landscape of Escherichia coli. Nat Biotechnol 32(3):285–290. doi:10.1038/nbt.2831
Rockah-Shmuel L, Toth-Petroczy A, Tawfik DS (2015) Systematic mapping of protein mutational space by prolonged drift reveals the deleterious effects of seemingly neutral mutations. PLoS Comput Biol 11(8):e1004421. doi:10.1371/journal.pcbi.1004421
Roscoe BP, Bolon DN (2014) Systematic exploration of ubiquitin sequence, E1 activation efficiency, and experimental fitness in yeast. J Mol Biol 426(15):2854–2870. doi:10.1016/j.jmb.2014.05.019
Seemayer S, Gruber M, Soding J (2014) CCMpred–fast and precise prediction of protein residue-residue contacts from correlated mutations. Bioinformatics 30(21):3128–3130. doi:10.1093/bioinformatics/btu500
Shindyalov IN, Kolchanov NA, Sander C (1994) Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations? Protein Eng 7(3):349–358
Skerker JM, Perchuk BS, Siryaporn A, Lubin EA, Ashenberg O, Goulian M, Laub MT (2008) Rewiring the specificity of two-component signal transduction systems. Cell 133(6):1043–1054. doi:10.1016/j.cell.2008.04.040
Starita LM, Pruneda JN, Lo RS, Fowler DM, Kim HJ, Hiatt JB, Shendure J, Brzovic PS, Fields S, Klevit RE (2013) Activity-enhancing mutations in an E3 ubiquitin ligase identified by high-throughput mutagenesis. Proc Natl Acad Sci USA 110(14):1263–1272. doi:10.1073/pnas.1303309110
Starita LM, Young DL, Islam M, Kitzman JO, Gullingsrud J, Hause RJ, Fowler DM, Parvin JD, Shendure J, Fields S (2015) Massively Parallel Functional Analysis of BRCA1 RING Domain Variants. Genetics. doi:10.1534/genetics.115.175802
Stein RR, Marks DS, Sander C (2015) Inferring pairwise interactions from biological data using maximum-entropy probability models. PLoS Comput Biol 11(7):e1004182. doi:10.1371/journal.pcbi.1004182
Stiffler MA, Hekstra DR, Ranganathan R (2015) Evolvability as a function of purifying selection in TEM-1 beta-Lactamase. Cell 160(5):882–892. doi:10.1016/j.cell.2015.01.035
Sulkowska JI, Morcos F, Weigt M, Hwa T, Onuchic JN (2012) Genomics-aided structure prediction. Proc Natl Acad Sci USA 109(26):10340–10345. doi:10.1073/pnas.1207864109
Tanabe H, Fujii Y, Okada-Iwabu M, Iwabu M, Nakamura Y, Hosaka T, Motoyama K, Ikeda M, Wakiyama M, Terada T, Ohsawa N, Hato M, Ogasawara S, Hino T, Murata T, Iwata S, Hirata K, Kawano Y, Yamamoto M, Kimura-Someya T, Shirouzu M, Yamauchi T, Kadowaki T, Yokoyama S (2015) Crystal structures of the human adiponectin receptors. Nature 520(7547):312–316. doi:10.1038/nature14301
Tang Y, Huang YJ, Hopf TA, Sander C, Marks DS, Montelione GT (2015) Protein structure determination by combining sparse NMR data with evolutionary couplings. Nat Methods 12(8):751–754. doi:10.1038/nmeth.3455
Toth-Petroczy A, Palmedo P, Ingraham J, Hopf TA, Berger B, Sander C, Marks DS (2016) Structured states of disordered proteins from genomic sequences. Cell 167(1):158–170 e112. doi:10.1016/j.cell.2016.09.010
van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M, Gough J, Gsponer J, Jones DT, Kim PM, Kriwacki RW, Oldfield CJ, Pappu RV, Tompa P, Uversky VN, Wright PE, Babu MM (2014) Classification of intrinsically disordered regions and proteins. Chem Rev 114(13):6589–6631. doi:10.1021/cr400525m
Webb B, Sali A (2014) Comparative Protein Structure Modeling Using MODELLER. Curr Protoc Bioinformatics 47:5.6.1–5.6.32. doi:10.1002/0471250953.bi0506s47
Weigt M, White RA, Szurmant H, Hoch JA, Hwa T (2009) Identification of direct residue contacts in protein-protein interaction by message passing. Proc Natl Acad Sci USA 106(1):67–72. doi:10.1073/pnas.0805923106
Weinreb C, Riesselman AJ, Ingraham JB, Gross T, Sander C, Marks DS (2016) 3D RNA and Functional Interactions from Evolutionary Couplings. Cell 165(4):963–975. doi:10.1016/j.cell.2016.03.030
Copyright information
© 2017 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Hopf, T.A., Marks, D.S. (2017). Protein Structures, Interactions and Function from Evolutionary Couplings. In: Rigden, D.J. (ed.) From Protein Structure to Function with Bioinformatics. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-1069-3_2
DOI: https://doi.org/10.1007/978-94-024-1069-3_2
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-1067-9
Online ISBN: 978-94-024-1069-3
eBook Packages: Biomedical and Life Sciences (R0)