A Structure Based Algorithm for Improving Motifs Prediction

  • Sudipta Pathak
  • Vamsi Krishna Kundeti
  • Martin R. Schiller
  • Sanguthevar Rajasekaran
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7986)


Minimotifs are short contiguous peptide sequences in proteins that are known to have functions. There are many repositories of experimentally validated minimotifs, and MnM is one of them. Predicting minimotifs in unknown sequences is a challenging and interesting problem in biology. Minimotifs stored in the MnM database range in length from 5 to 15. Because the motifs sought are so short, any algorithm for predicting minimotifs in an unknown query sequence is likely to produce many false positives. Our team has previously developed a series of algorithms (called filters) to reduce false positives and improve prediction accuracy. All of these algorithms are based on sequence information. In a recent paper we demonstrated the power of structural information in characterizing motifs. In this paper we present an algorithm that exploits structural information to reduce false positives in motif prediction.
We test the validity of our algorithm using the minimotifs stored in the MnM database. MnM is a web system for minimotif search that our team has built; it houses more than 300,000 minimotifs. Our new algorithm is a learning algorithm: it is trained in a first phase, and its accuracy is measured in a second phase. For any input query protein sequence, MnM identifies a list of putative minimotifs in the query sequence. We currently employ a series of sequence-based algorithms to reduce the false positives in MnM's predictions. For every minimotif stored in MnM, we also store a number of attributes pertinent to the motif. One such attribute is the source of the minimotif, that is, the protein in which the minimotif is present. For the analysis of our new algorithm we use only those minimotifs that have multiple sources as positive controls. Random data is used as negative data.
The basic idea of our algorithm is the hypothesis that a putative minimotif is likely to be valid if its structure in the query sequence is very similar to its structure in its source protein. Another important feature of our algorithm is that it is specific to individual minimotifs: a unique set of parameters is learnt for every minimotif. We believe this is a better approach than learning a common set of parameters for all minimotifs together. Our findings reveal that in most cases the occurrences of a minimotif in its source proteins are structurally similar, whereas the occurrences of a minimotif in its source protein and in a random protein are typically dissimilar. Our experimental results show that the parameters learnt by our algorithm can significantly reduce false positives.
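The hypothesis above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say which structural measure or which parameters are learnt. The sketch assumes each occurrence of a minimotif is given as a list of 3D atom coordinates, uses distance RMSD (dRMSD) as a stand-in similarity measure, and learns a single cutoff per minimotif. The function names `drmsd`, `learn_threshold`, and `is_plausible`, and all of the toy coordinates, are hypothetical.

```python
import math
from itertools import combinations


def drmsd(a, b):
    """Distance RMSD between two equal-length lists of 3D points.

    Compares intra-motif pairwise distances, so no superposition is needed.
    (Assumed measure; the paper may use a different one.)
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    pairs = list(combinations(range(len(a)), 2))
    total = sum(
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs
    )
    return math.sqrt(total / len(pairs))


def learn_threshold(positives, negatives):
    """Learn one dRMSD cutoff for a single minimotif (hypothetical parameter).

    positives: occurrences of the motif in its source proteins.
    negatives: occurrences of the same sequence in random proteins.
    """
    if len(positives) < 2 or not negatives:
        raise ValueError("need at least 2 positives and 1 negative")
    pos_scores = [drmsd(p, q) for p, q in combinations(positives, 2)]
    neg_scores = [drmsd(p, n) for p in positives for n in negatives]
    scores = sorted(set(pos_scores + neg_scores))
    # Candidate cutoffs are midpoints between adjacent observed scores.
    candidates = [(lo + hi) / 2 for lo, hi in zip(scores, scores[1:])] or scores
    best_cutoff, best_correct = None, -1
    for cutoff in candidates:
        correct = sum(s <= cutoff for s in pos_scores)
        correct += sum(s > cutoff for s in neg_scores)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def is_plausible(candidate, positives, cutoff):
    """Accept a putative occurrence if it resembles any known occurrence."""
    return min(drmsd(candidate, p) for p in positives) <= cutoff


# Toy data: three extended "source" occurrences and one compact "random" one.
extended = [(3.8 * i, 0.0, 0.0) for i in range(5)]
rotated = [(1.0, 1.0 + 3.8 * i, 1.0) for i in range(5)]
wobbly = [(3.8 * i, 0.2 * (i % 2), 0.0) for i in range(5)]
compact_a = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0), (0, 3.8, 3.8)]
compact_b = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (3.8, 3.8, 3.8), (0, 3.8, 3.8)]

positives = [extended, rotated, wobbly]
cutoff = learn_threshold(positives, [compact_a])
shifted = [(5.0 + 3.8 * i, 0.1 * (i % 2), 2.0) for i in range(5)]
print(is_plausible(shifted, positives, cutoff))    # extended-like candidate
print(is_plausible(compact_b, positives, cutoff))  # compact candidate
```

Learning the cutoff separately for each minimotif mirrors the per-motif parameters described above; a shared cutoff would be the alternative the authors argue against.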


Keywords: Protein Data Bank; Source Protein; Nucleic Acid Research; Positive Occurrence; Eukaryotic Linear Motif



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sudipta Pathak (1)
  • Vamsi Krishna Kundeti (2)
  • Martin R. Schiller (3)
  • Sanguthevar Rajasekaran (1)

  1. Department of Computer Science, University of Connecticut, USA
  2. Intel Corporation, USA
  3. School of Life Sciences, University of Nevada Las Vegas, USA
