Removing Artifacts of Approximated Motifs

Federico, Maria; Pisanti, Nadia

doi:10.1007/978-3-642-23208-4_14

Maria Federico^20,21 &
Nadia Pisanti^21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6865)

Included in the following conference series:

International Conference on Information Technology in Bio- and Medical Informatics


Abstract

Frequent patterns (motifs) in biological sequences are good candidates for structurally or functionally important elements. The typical output of existing tools for the exhaustive detection of approximated motifs is a long list that contains some real motifs (i.e., patterns representing functional elements) along with a large number of random variations of them, called artifacts. Artifacts inflate the output size, often making the results redundant and hard for biologists to use. In this paper, we provide a new solution to the problem of separating real motifs from artifacts. We define a notion of motif maximality, called maximality in conservation, which, when applied to the output of existing motif finding tools, allows us to identify and remove artifacts. Their detection relies on the fact that the variations of a motif share a large subset of the real motif's occurrences, but the real motif is more conserved than any of its artifacts. Experiments show that the tool we implemented according to this definition substantially reduces the output size by removing artifacts, at a negligible time cost.
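The abstract only states the intuition behind artifact detection (shared occurrences plus higher conservation of the real motif). The Python sketch below illustrates that intuition under assumptions made here, not taken from the paper: all motifs have the same length, an occurrence is any window within Hamming distance d of the motif, "more conserved" means a smaller total Hamming distance over the shared occurrences, and the overlap threshold min_shared (50%) is an arbitrary illustrative parameter. The functions are hypothetical; this is neither the authors' formal definition of maximality in conservation nor their tool.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurrences(motif: str, seq: str, d: int) -> dict[int, int]:
    """Map each start position of an approximate occurrence of `motif`
    in `seq` (Hamming distance <= d) to its distance from the motif."""
    k = len(motif)
    occ = {}
    for i in range(len(seq) - k + 1):
        dist = hamming(motif, seq[i:i + k])
        if dist <= d:
            occ[i] = dist
    return occ


def remove_artifacts(motifs: list[str], seq: str, d: int,
                     min_shared: float = 0.5) -> list[str]:
    """Hypothetical filter: drop motif m when another motif shares at least
    `min_shared` of m's occurrences and is more conserved (smaller total
    Hamming distance) on those shared occurrences. Assumes equal-length motifs."""
    occ = {m: occurrences(m, seq, d) for m in motifs}
    occ = {m: o for m, o in occ.items() if o}  # ignore motifs that never occur
    kept = []
    for m, om in occ.items():
        is_artifact = False
        for other, oo in occ.items():
            if other == m:
                continue
            shared = om.keys() & oo.keys()
            if not shared or len(shared) / len(om) < min_shared:
                continue
            # Compare conservation only on the occurrences both motifs share.
            if sum(oo[p] for p in shared) < sum(om[p] for p in shared):
                is_artifact = True
                break
        if not is_artifact:
            kept.append(m)
    return kept


if __name__ == "__main__":
    seq = "ACGTGGACGTGGACGTGG"
    print(remove_artifacts(["ACGT", "ACGA", "TCGT", "GGAC"], seq, d=1))
    # prints ['ACGT', 'GGAC']
```

In this toy input, ACGA and TCGT occur at exactly the same positions as ACGT but each with one mismatch, so they are flagged as its artifacts, while GGAC shares no occurrences with any other motif and is kept.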

This work was supported in part by MIUR of Italy under project AlgoDEEP prot. 2008TFBWL4.



Author information

Authors and Affiliations

Dipartimento di Ingegneria dell’Informazione, Università di Modena e Reggio Emilia, Italy
Maria Federico
Dipartimento di Informatica, Università di Pisa, Italy
Maria Federico & Nadia Pisanti


Editor information

Editors and Affiliations

Department of Computer Science, Ludwig-Maximilians-Universität, Oettingenstrasse 67, 80538, München, Germany
Christian Böhm
Department of Computer Science, San José State University, One Washington Square, 95192-0249, San José, CA, U.S.A.
Sami Khuri
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University, Technicka 2, 166 27, Prague 6, Czech Republic
Lenka Lhotská
Dipartimento di Informatica, Università di Pisa, Italy
Nadia Pisanti


About this paper

Cite this paper

Federico, M., Pisanti, N. (2011). Removing Artifacts of Approximated Motifs. In: Böhm, C., Khuri, S., Lhotská, L., Pisanti, N. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2011. Lecture Notes in Computer Science, vol 6865. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23208-4_14


DOI: https://doi.org/10.1007/978-3-642-23208-4_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23207-7
Online ISBN: 978-3-642-23208-4
eBook Packages: Computer Science, Computer Science (R0)
