Fast Significant Matches of Position Weight Matrices Based on Diamond Sampling

Gao, Liang-xin; Zhang, Hong-bo; Zhu, Lin

doi:10.1007/978-3-319-63312-1_19

Conference paper
First Online: 20 July 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10362)

Abstract

Position weight matrices are an important method for modeling signals or motifs in biological sequences, in both DNA and protein contexts. In this paper, we present techniques for increasing the speed of sequence analysis using position weight matrices. Our techniques also permit the user to specify a p-value threshold to indicate the desired trade-off between sensitivity and speed for a particular sequence analysis. The resulting increase in speed should allow our algorithm to be used more widely for searching in large-scale sequence and annotation projects.
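To make the terms in the abstract concrete, the following is a minimal Python sketch of what matching a position weight matrix against a sequence under a p-value threshold can look like. It is an exhaustive baseline written for illustration only: it is not the diamond-sampling technique of the paper, whose details are not given on this page. The function names, the toy matrix, the uniform background, and the brute-force threshold computation (exponential in motif length) are all assumptions.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: DNA alphabet


def score_window(pwm, window):
    """Sum the per-position weights of `window` under `pwm`.

    `pwm` is a list with one dict per motif position, mapping base -> weight.
    """
    return sum(column[base] for column, base in zip(pwm, window))


def score_threshold(pwm, p, background):
    """Return the smallest score t with P(score >= t) <= p.

    The score distribution is computed by enumerating every word of motif
    length under an i.i.d. background model, so this is only practical for
    short motifs. Returns infinity if no score satisfies the bound.
    """
    distribution = {}
    for word in product(ALPHABET, repeat=len(pwm)):
        prob = 1.0
        for base in word:
            prob *= background[base]
        s = score_window(pwm, word)
        distribution[s] = distribution.get(s, 0.0) + prob

    threshold = float("inf")
    tail = 0.0
    # Walk scores from highest to lowest, accumulating tail probability.
    for s in sorted(distribution, reverse=True):
        tail += distribution[s]
        if tail > p:
            break
        threshold = s
    return threshold


def find_matches(pwm, sequence, threshold):
    """Return (position, score) for every window scoring at least `threshold`."""
    m = len(pwm)
    hits = []
    for i in range(len(sequence) - m + 1):
        s = score_window(pwm, sequence[i:i + m])
        if s >= threshold:
            hits.append((i, s))
    return hits


if __name__ == "__main__":
    # Hypothetical 3-position matrix favouring the word "ACG".
    toy_pwm = [
        {"A": 2, "C": -1, "G": -1, "T": -1},
        {"A": -1, "C": 2, "G": -1, "T": -1},
        {"A": -1, "C": -1, "G": 2, "T": -1},
    ]
    uniform = {base: 0.25 for base in ALPHABET}
    t = score_threshold(toy_pwm, 0.02, uniform)
    print(t, find_matches(toy_pwm, "TTACGACT", t))
```

A smaller p gives a higher score threshold and therefore fewer, more significant matches; a larger p admits weaker matches. The cost of this baseline grows with sequence length times motif length, which is the kind of cost that faster matching techniques aim to reduce.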

Acknowledgments

This work was supported by grants from the National Science Foundation of China, Nos. 61672203, 61402334, 61472282, 61520106006, 31571364, U1611265, 61472280, 61532008, 61472173, 61572447, 61373098 and 61672382, and by China Postdoctoral Science Foundation Grant No. 2016M601646. De-Shuang Huang is the corresponding author of this paper.

Author information

Authors and Affiliations

Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Caoan Road 4800, Shanghai, 201804, China
Liang-xin Gao, Hong-bo Zhang & Lin Zhu

Corresponding author

Correspondence to Liang-xin Gao.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
Juan Carlos Figueroa-García

Cite this paper

Gao, Lx., Zhang, Hb., Zhu, L. (2017). Fast Significant Matches of Position Weight Matrices Based on Diamond Sampling. In: Huang, DS., Jo, KH., Figueroa-García, J. (eds) Intelligent Computing Theories and Application. ICIC 2017. Lecture Notes in Computer Science, vol 10362. Springer, Cham. https://doi.org/10.1007/978-3-319-63312-1_19

DOI: https://doi.org/10.1007/978-3-319-63312-1_19
Published: 20 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63311-4
Online ISBN: 978-3-319-63312-1
eBook Packages: Computer Science, Computer Science (R0)