Mutual Enrichment in Ranked Lists and the Statistical Assessment of Position Weight Matrix Motifs

  • Conference paper
Algorithms in Bioinformatics (WABI 2013)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8126)

Abstract

Statistics in ranked lists is important in analyzing molecular biology measurement data, such as ChIP-seq, which yields ranked lists of genomic sequences. State-of-the-art methods study fixed motifs in ranked lists. More flexible models, such as position weight matrix (PWM) motifs, are not addressed in this context. To assess the enrichment of a PWM motif in a ranked list, we use a PWM-induced second ranking on the same set of elements. Possible orders of one ranked list relative to the other are modeled by permutations. Due to sample space complexity, it is difficult to characterize tail distributions in the group of permutations. In this paper we develop tight upper bounds on tail distributions of the size of the intersection of the top of two uniformly and independently drawn permutations, and we demonstrate advantages of this approach using our software implementation, mmHG-Finder, to study PWMs in several datasets.
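
As a rough illustration of the quantity the abstract refers to, the sketch below computes the size of the intersection of the top of two rankings and its tail probability for one fixed pair of cutoffs. It relies on a standard fact: if the two permutations are uniform and independent, the overlap of the top n1 of one with the top n2 of the other follows a hypergeometric distribution. The function names and the cutoffs `n1`, `n2` are assumptions made for this example. This is not the mmHG-Finder implementation, and it does not reproduce the paper's bounds, which are not given in the abstract.

```python
from math import comb


def top_intersection_size(perm_a, perm_b, n1, n2):
    """Count elements that are in both the top n1 of perm_a and the top n2 of perm_b."""
    return len(set(perm_a[:n1]) & set(perm_b[:n2]))


def hypergeom_tail(N, n1, n2, b):
    """P(X >= b), where X is the overlap of the top n1 and the top n2 of two
    independent, uniformly drawn permutations of N elements.

    X is hypergeometric: P(X = k) = C(n1, k) * C(N - n1, n2 - k) / C(N, n2).
    """
    lower = max(b, 0)
    upper = min(n1, n2)
    favourable = sum(
        comb(n1, k) * comb(N - n1, n2 - k) for k in range(lower, upper + 1)
    )
    return favourable / comb(N, n2)


# Hypothetical toy data: two rankings of the same five elements.
ranking_a = [0, 1, 2, 3, 4]
ranking_b = [4, 3, 2, 1, 0]
overlap = top_intersection_size(ranking_a, ranking_b, 3, 3)
p_value = hypergeom_tail(len(ranking_a), 3, 3, overlap)
```

A single pair of cutoffs gives an exact tail probability. Choosing the cutoffs after looking at the data introduces a multiple-testing effect that this sketch does not correct for.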

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Leibovich, L., Yakhini, Z. (2013). Mutual Enrichment in Ranked Lists and the Statistical Assessment of Position Weight Matrix Motifs. In: Darling, A., Stoye, J. (eds) Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science, vol 8126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40453-5_21

  • DOI: https://doi.org/10.1007/978-3-642-40453-5_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40452-8

  • Online ISBN: 978-3-642-40453-5

  • eBook Packages: Computer Science, Computer Science (R0)
