MADMX: A Novel Strategy for Maximal Dense Motif Extraction

Grossi, Roberto; Pietracaprina, Andrea; Pisanti, Nadia; Pucci, Geppino; Upfal, Eli; Vandin, Fabio

doi:10.1007/978-3-642-04241-6_30

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5724)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

We develop, analyze, and experiment with a new tool, called MADMX, which extracts frequent motifs, possibly including don’t care characters, from biological sequences. We introduce density, a simple and flexible measure for bounding the number of don’t cares in a motif, defined as the ratio of solid (i.e., different from don’t care) characters to the total length of the motif. By extracting only maximal dense motifs, MADMX reduces the output size and improves performance, while enhancing the quality of the discoveries. The efficiency of our approach relies on a newly defined combining operation, dubbed fusion, which allows for the construction of maximal dense motifs in a bottom-up fashion, while avoiding the generation of nonmaximal ones. We provide experimental evidence of the efficiency of MADMX and of the quality of the motifs it returns.
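
The density measure defined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the wildcard symbol `.`, the function names `density` and `occurrences`, and the naive matching loop are all assumptions made for illustration. The sketch covers only the density ratio and what it means for a motif with don't cares to occur in a sequence. It does not attempt the fusion operation or the maximality test, which the abstract names but does not specify.

```python
DONT_CARE = "."  # assumed wildcard symbol; the abstract does not fix a notation


def density(motif: str, dont_care: str = DONT_CARE) -> float:
    """Ratio of solid (non-don't-care) characters to the total motif length."""
    if not motif:
        raise ValueError("motif must be non-empty")
    solid = sum(1 for ch in motif if ch != dont_care)
    return solid / len(motif)


def occurrences(motif: str, sequence: str, dont_care: str = DONT_CARE) -> list[int]:
    """Start positions where the motif matches the sequence.

    A don't care matches any character; a solid character must match exactly.
    This is a naive scan for illustration, not an efficient extraction method.
    """
    m = len(motif)
    return [
        i
        for i in range(len(sequence) - m + 1)
        if all(p == dont_care or p == c for p, c in zip(motif, sequence[i:i + m]))
    ]


if __name__ == "__main__":
    print(density("A.C."))                  # 2 solid characters out of 4
    print(occurrences("A.C", "AGCATCAAC"))  # start positions of matches
```

Under these assumptions, a density threshold would filter out motifs such as `A...C` (density 0.4) while keeping `A.C` (density about 0.67), which is one way to read the abstract's claim that density bounds the number of don't cares.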

Author information

Authors and Affiliations

Dipartimento di Informatica, Università di Pisa, Pisa, Italy
Roberto Grossi & Nadia Pisanti
Dipartimento di Ingegneria dell’Informazione, Università di Padova, Padova, Italy
Andrea Pietracaprina, Geppino Pucci & Fabio Vandin
Department of Computer Science, Brown University, Providence RI, USA
Eli Upfal

Editor information

Editors and Affiliations

Center for Bioinformatics and Computational Biology, and Department of Computer Science, University of Maryland, College Park, MD, USA
Steven L. Salzberg
Department of Computer Sciences, The University of Texas at Austin, TX, USA
Tandy Warnow

About this paper

Cite this paper

Grossi, R., Pietracaprina, A., Pisanti, N., Pucci, G., Upfal, E., Vandin, F. (2009). MADMX: A Novel Strategy for Maximal Dense Motif Extraction. In: Salzberg, S.L., Warnow, T. (eds) Algorithms in Bioinformatics. WABI 2009. Lecture Notes in Computer Science, vol 5724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04241-6_30

DOI: https://doi.org/10.1007/978-3-642-04241-6_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04240-9
Online ISBN: 978-3-642-04241-6
eBook Packages: Computer Science, Computer Science (R0)
