Abstract
This paper addresses the problem of multiple pattern matching for motifs encoded by Position Weight Matrices. We first present an algorithm that uses a multi-index table to preprocess the set of motifs, allowing a dramatic decrease in computation time. We then show how to take advantage of similar motifs to avoid useless computations.
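To make the problem statement concrete, here is a minimal sketch of the naive baseline: score every window of a text against every matrix and report the windows that reach a threshold. It is not the paper's algorithm; the multi-index table and the use of similarity between motifs are not reproduced. The names `score_window` and `naive_multi_match`, the dictionary-based matrix layout, the toy matrices and the thresholds are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed layout: a PWM is a list of columns, each mapping a letter to a score.
PWM = List[Dict[str, float]]


def score_window(pwm: PWM, text: str, start: int) -> float:
    """Sum the column scores of the len(pwm)-letter window starting at `start`."""
    return sum(column[text[start + i]] for i, column in enumerate(pwm))


def naive_multi_match(
    pwms: Dict[str, PWM], thresholds: Dict[str, float], text: str
) -> List[Tuple[str, int, float]]:
    """Return (motif name, position, score) for every window reaching its threshold."""
    hits = []
    for name, pwm in pwms.items():
        # Every matrix rescans the whole text: this is the cost that
        # preprocessing the motif set is meant to reduce.
        for start in range(len(text) - len(pwm) + 1):
            s = score_window(pwm, text, start)
            if s >= thresholds[name]:
                hits.append((name, start, s))
    return hits


# Toy matrices of length 2 with made-up scores (illustrative only).
toy_pwms = {
    "m1": [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
           {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0}],
    "m2": [{"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
           {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0}],
}
toy_thresholds = {"m1": 2.0, "m2": 2.0}
hits = naive_multi_match(toy_pwms, toy_thresholds, "ACGTAC")
```

In this sketch the running time grows with the text length multiplied by the total length of all matrices, which is why sharing work across a large motif set matters.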
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liefooghe, A., Touzet, H., Varré, JS. (2006). Large Scale Matching for Position Weight Matrices. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_36
DOI: https://doi.org/10.1007/11780441_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35455-0
Online ISBN: 978-3-540-35461-1
eBook Packages: Computer Science, Computer Science (R0)