Abstract
Position weight matrices are an important method for modeling signals or motifs in biological sequences, in both DNA and protein contexts. In this paper, we present techniques for increasing the speed of sequence analysis using position weight matrices. Our techniques also let the user specify a p threshold that sets the desired trade-off between sensitivity and speed for a particular sequence analysis. The resulting increase in speed should allow our algorithm to be used more widely for searching in large-scale sequence and annotation projects.
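To make the role of the p threshold concrete, the following Python sketch shows a plain, unaccelerated baseline: it converts a p threshold into a score cutoff by computing the exact score distribution of a matrix under an i.i.d. background, then scans a sequence window by window. It is not the accelerated technique of the paper. The names `score_threshold` and `scan`, the integer-valued toy matrix, the uniform background, and the restriction to the DNA alphabet are all assumptions made for illustration.

```python
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: DNA only; symbols such as N are not handled


def score_threshold(pwm, p, background=None):
    """Return the smallest score t with P(score >= t) <= p, or None.

    pwm is a list of columns, each a dict mapping a base to an integer score.
    The background is assumed i.i.d.; it defaults to uniform.
    """
    if background is None:
        background = {c: 1.0 / len(ALPHABET) for c in ALPHABET}
    # Dynamic programming over columns: dist maps score -> probability.
    dist = {0: 1.0}
    for column in pwm:
        nxt = defaultdict(float)
        for s, prob in dist.items():
            for c in ALPHABET:
                nxt[s + column[c]] += prob * background[c]
        dist = nxt
    # Accumulate the upper tail until it would exceed p.
    tail = 0.0
    threshold = None
    for s in sorted(dist, reverse=True):
        tail += dist[s]
        if tail > p:
            break
        threshold = s
    return threshold


def scan(sequence, pwm, threshold):
    """Naive scan: return (position, score) for windows scoring >= threshold."""
    m = len(pwm)
    hits = []
    for i in range(len(sequence) - m + 1):
        s = sum(pwm[j][sequence[i + j]] for j in range(m))
        if s >= threshold:
            hits.append((i, s))
    return hits


# Hypothetical toy matrix of length 3 that favours the base A in every column.
toy_column = {"A": 2, "C": -1, "G": -1, "T": -1}
toy_pwm = [toy_column, toy_column, toy_column]
```

A smaller p gives a higher cutoff and fewer reported matches; a larger p admits weaker matches. The naive scan costs time proportional to the sequence length times the matrix length, which is the cost that faster matching techniques aim to reduce.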
Acknowledgments
This work was supported by grants from the National Science Foundation of China (Nos. 61672203, 61402334, 61472282, 61520106006, 31571364, U1611265, 61472280, 61532008, 61472173, 61572447, 61373098 and 61672382) and by the China Postdoctoral Science Foundation (Grant No. 2016M601646). De-Shuang Huang is the corresponding author of this paper.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Gao, Lx., Zhang, Hb., Zhu, L. (2017). Fast Significant Matches of Position Weight Matrices Based on Diamond Sampling. In: Huang, DS., Jo, KH., Figueroa-García, J. (eds) Intelligent Computing Theories and Application. ICIC 2017. Lecture Notes in Computer Science, vol 10362. Springer, Cham. https://doi.org/10.1007/978-3-319-63312-1_19
DOI: https://doi.org/10.1007/978-3-319-63312-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63311-4
Online ISBN: 978-3-319-63312-1
eBook Packages: Computer Science, Computer Science (R0)