Computational Prediction of Short Linear Motifs from Protein Sequences

  • Protocol
Computational Peptidology

Part of the book series: Methods in Molecular Biology (MIMB, volume 1268)

Abstract

Short Linear Motifs (SLiMs) are functional protein microdomains that typically mediate interactions between a short linear region in one protein and a globular domain in another. SLiMs usually occur in structurally disordered regions and mediate low-affinity interactions. Most SLiMs are 3–15 amino acids in length and have only 2–5 defined positions, so matches to a motif pattern are highly likely to occur by chance and functional instances are extremely difficult to identify. Nevertheless, our knowledge of SLiMs and our capacity to predict them from protein sequence data using computational methods have advanced dramatically over the past decade. By considering the biological, structural, and evolutionary context of SLiM occurrences, it is possible in many cases to differentiate functional instances from chance matches and to identify new regions of proteins that have features consistent with a SLiM-mediated interaction. Their simplicity also makes SLiMs evolutionarily labile and prone to independent origins on different sequence backgrounds through convergent evolution, which can be exploited for predicting novel SLiMs in proteins that share a function or interaction partner.
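The claim that short motifs with few defined positions are likely to occur by chance can be made concrete with a rough calculation. The following is a minimal sketch, not a method taken from the chapter: it assumes uniform amino acid frequencies and independent sequence windows (both simplifications), and the function names and the PxxP example pattern are chosen here purely for illustration.

```python
# Rough chance-occurrence estimate for a short motif.
# ASSUMPTIONS (not from the source): uniform residue frequencies unless
# `freqs` is supplied, and independence between overlapping windows.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_probability(motif, freqs=None):
    """Probability that one random window matches `motif`.

    `motif` is a list with one entry per position: a string of the
    residues allowed at that position, or "." for a wildcard.
    """
    if freqs is None:
        freqs = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    p = 1.0
    for allowed in motif:
        if allowed != ".":
            # A defined position contributes the summed frequency of
            # its allowed residues; wildcards contribute a factor of 1.
            p *= sum(freqs[aa] for aa in set(allowed))
    return p


def expected_matches(motif, seq_length, freqs=None):
    """Expected number of chance matches in a sequence of `seq_length`."""
    windows = max(seq_length - len(motif) + 1, 0)
    return windows * site_probability(motif, freqs)


def prob_at_least_one(motif, seq_length, freqs=None):
    """Probability of one or more chance matches (independence assumed)."""
    windows = max(seq_length - len(motif) + 1, 0)
    return 1.0 - (1.0 - site_probability(motif, freqs)) ** windows


if __name__ == "__main__":
    pxxp = ["P", ".", ".", "P"]  # illustrative pattern, two defined positions
    print(site_probability(pxxp))
    print(expected_matches(pxxp, 500))
    print(prob_at_least_one(pxxp, 500))
```

Under these assumptions, a pattern with two fixed positions is expected to match a 500-residue sequence more than once by chance alone, which illustrates why sequence context is needed to separate functional instances from random matches.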

In this review, we explore our current knowledge of SLiMs and how it can be applied to the task of predicting them computationally from protein sequences. Rather than focusing on specific SLiM prediction tools, we provide an overview of the methods available and concentrate on principles that should remain paramount even in the light of future developments. We consider the relative merits of using regular expressions or profiles for SLiM discovery and discuss the main considerations for both the prediction of new instances of known SLiMs and the de novo prediction of novel SLiMs. In particular, we highlight the importance of correctly modelling evolutionary relationships and the probability of false-positive predictions.
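The contrast between regular expressions and profiles can be sketched in a few lines. This is an illustrative example under stated assumptions, not the approach of any tool discussed in the chapter: the aligned instances, the test sequence, the pseudocount, the uniform background, and the score threshold are all hypothetical, and the helper names are invented here.

```python
import math
import re

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def regex_hits(pattern, sequence):
    """0-based start positions of all matches, including overlapping ones.

    The pattern is wrapped in a lookahead so that a match does not
    consume residues that could start another match.
    """
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence)]


def build_pssm(instances, pseudocount=1.0):
    """Log-odds matrix (base 2) from equal-length aligned instances.

    ASSUMPTION: uniform background frequencies and a flat pseudocount.
    """
    background = 1.0 / len(ALPHABET)
    pssm = []
    for i in range(len(instances[0])):
        column = [inst[i] for inst in instances]
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def pssm_hits(pssm, sequence, threshold):
    """(start, score) for every window scoring at or above `threshold`."""
    k = len(pssm)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        # Residues outside the alphabet get the column's lowest score.
        score = sum(col.get(aa, min(col.values()))
                    for col, aa in zip(pssm, window))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits


if __name__ == "__main__":
    # Hypothetical aligned instances of a made-up motif.
    pssm = build_pssm(["PAAP", "PLSP", "PTVP"])
    print(regex_hits("P..P", "GGPAAPGG"))
    print(pssm_hits(pssm, "GGPAAPGG", 3.0))
```

A regular expression gives a yes/no answer: a window such as PGGP satisfies `P..P` exactly as PAAP does. The toy profile instead returns a graded score, and rates PGGP lower than PAAP because G was never observed at the middle positions of the training instances. The cost of that extra resolution is that a score threshold has to be chosen.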

Abbreviations

  • DMI: Domain-motif interaction
  • ELM: Eukaryotic linear motif
  • FPR: False positive rate
  • GO: Gene ontology
  • HMM: Hidden Markov model
  • IDP: Intrinsically disordered protein
  • IDR: Intrinsically disordered region
  • LDMS: (l, d) motif search
  • MnM: Minimotif miner
  • MoRF: Molecular recognition feature
  • MST: Minimum spanning tree
  • PPI: Protein-protein interaction
  • PSSM: Position-specific scoring matrix
  • PTM: Posttranslational modification
  • Regex: Regular expression
  • SLiM: Short linear motif
  • TPR: True positive rate

Author information

Corresponding author

Correspondence to Richard J. Edwards.

Copyright information

© 2015 Springer Science+Business Media New York

About this protocol

Cite this protocol

Edwards, R.J., Palopoli, N. (2015). Computational Prediction of Short Linear Motifs from Protein Sequences. In: Zhou, P., Huang, J. (eds) Computational Peptidology. Methods in Molecular Biology, vol 1268. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-2285-7_6

  • DOI: https://doi.org/10.1007/978-1-4939-2285-7_6

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-2284-0

  • Online ISBN: 978-1-4939-2285-7

  • eBook Packages: Springer Protocols
