Fast Significant Matches of Position Weight Matrices Based on Diamond Sampling

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10362)

Abstract

Position weight matrices are an important method for modeling signals or motifs in biological sequences, in both DNA and protein contexts. In this paper, we present techniques for increasing the speed of sequence analysis using position weight matrices. Our techniques also permit the user to specify a p threshold that sets the desired trade-off between sensitivity and speed for a particular sequence analysis. The resulting increase in speed should allow our algorithm to be used more widely for searching in large-scale sequence and annotation projects.
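For orientation, the kind of search being accelerated can be sketched as a plain exhaustive scan: score every window of a sequence against a position weight matrix and keep the windows that reach a threshold. This is only a baseline illustration, not the diamond-sampling method of the paper. The motif counts, the pseudocount, the uniform background, and the score threshold are all hypothetical values chosen for the example, and the step that converts a p threshold into a score threshold is not shown.

```python
import math

# Hypothetical 4-position DNA motif, given as per-position base counts.
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
# Assumed uniform background distribution over the four bases.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def build_pwm(counts, background, pseudocount=1.0):
    """Turn count columns into log2-odds scores against the background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background[base])
            for base in column
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Score every window of the sequence; return (start, window, score) hits."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # The window score is the sum of the per-position log-odds scores.
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


pwm = build_pwm(COUNTS, BACKGROUND)
# The threshold of 4.0 is an arbitrary illustrative cutoff.
hits = scan("TTACGTGGACGTA", pwm, threshold=4.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

The scan costs time proportional to the sequence length times the matrix width, which is the cost that faster matching techniques aim to reduce. The sketch assumes the sequence contains only the letters A, C, G, and T.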

Acknowledgments

This work was supported by grants from the National Science Foundation of China, Nos. 61672203, 61402334, 61472282, 61520106006, 31571364, U1611265, 61472280, 61532008, 61472173, 61572447, 61373098 and 61672382, and by China Postdoctoral Science Foundation Grant No. 2016M601646. De-Shuang Huang is the corresponding author of this paper.

Author information

Corresponding author

Correspondence to Liang-xin Gao.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Gao, Lx., Zhang, Hb., Zhu, L. (2017). Fast Significant Matches of Position Weight Matrices Based on Diamond Sampling. In: Huang, DS., Jo, KH., Figueroa-García, J. (eds) Intelligent Computing Theories and Application. ICIC 2017. Lecture Notes in Computer Science, vol 10362. Springer, Cham. https://doi.org/10.1007/978-3-319-63312-1_19

  • DOI: https://doi.org/10.1007/978-3-319-63312-1_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63311-4

  • Online ISBN: 978-3-319-63312-1

  • eBook Packages: Computer Science, Computer Science (R0)
