Abstract
Statistics on ranked lists are important for analyzing molecular biology measurement data, such as ChIP-seq data, which yield ranked lists of genomic sequences. State-of-the-art methods study fixed motifs in ranked lists; more flexible models, such as position weight matrix (PWM) motifs, have not been addressed in this context. To assess the enrichment of a PWM motif in a ranked list, we use a second, PWM-induced ranking on the same set of elements. The possible orders of one ranked list relative to the other are modeled as permutations. Because of the complexity of the sample space, tail distributions in the group of permutations are difficult to characterize. In this paper we develop tight upper bounds on the tail distribution of the size of the intersection of the top parts of two uniformly and independently drawn permutations. We demonstrate the advantages of this approach by using our software implementation, mmHG-Finder, to study PWMs in several datasets.
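To make the central quantity concrete, the following is a minimal sketch, not the paper's bounds and not the mmHG-Finder implementation. It assumes two rankings of the same N elements and two fixed cutoffs n1 and n2, and it relies on the standard fact that, for fixed cutoffs, the size of the intersection of the tops of two uniformly and independently drawn permutations follows a hypergeometric distribution. The function names `top_intersection` and `hypergeom_tail` are hypothetical and chosen only for illustration.

```python
from math import comb


def top_intersection(rank_a, rank_b, n1, n2):
    """Size of the overlap between the top n1 of rank_a and the top n2 of rank_b.

    rank_a and rank_b are assumed to be two orderings of the same elements,
    e.g. a measurement-based ranking and a PWM-score-based ranking.
    """
    return len(set(rank_a[:n1]) & set(rank_b[:n2]))


def hypergeom_tail(N, n1, n2, b):
    """P(X >= b) for X ~ Hypergeometric(N, n1, n2), with b >= 0.

    For FIXED cutoffs n1 and n2 this is the tail probability of the
    top-intersection size under two independent uniform permutations.
    """
    total = comb(N, n2)
    upper = min(n1, n2)
    favorable = sum(
        comb(n1, k) * comb(N - n1, n2 - k) for k in range(b, upper + 1)
    )
    return favorable / total


# Illustrative toy rankings (made-up labels, not data from the paper).
measured = ["s1", "s2", "s3", "s4", "s5", "s6"]
pwm_ranked = ["s2", "s1", "s6", "s3", "s5", "s4"]
overlap = top_intersection(measured, pwm_ranked, 2, 2)
p_fixed_cutoffs = hypergeom_tail(len(measured), 2, 2, overlap)
```

This tail is valid only when the cutoffs are chosen in advance. If the cutoffs are instead optimized over all pairs (n1, n2), the resulting minimum is no longer a p-value and needs a correction; bounding the tail in that setting is the kind of problem the abstract refers to.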
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Leibovich, L., Yakhini, Z. (2013). Mutual Enrichment in Ranked Lists and the Statistical Assessment of Position Weight Matrix Motifs. In: Darling, A., Stoye, J. (eds) Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science, vol 8126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40453-5_21
DOI: https://doi.org/10.1007/978-3-642-40453-5_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40452-8
Online ISBN: 978-3-642-40453-5
eBook Packages: Computer Science, Computer Science (R0)