Removing Artifacts of Approximated Motifs

  • Conference paper
Information Technology in Bio- and Medical Informatics (ITBAM 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6865)

Abstract

Frequent patterns (motifs) in biological sequences are good candidates for structurally or functionally important elements. The typical output of existing tools for the exhaustive detection of approximated motifs is a long list that contains some real motifs (i.e., patterns representing functional elements) along with a large number of random variations of them, called artifacts. Artifacts inflate the output, often making the results redundant and hard for biologists to use. In this paper, we provide a new solution to the problem of separating real motifs from artifacts. We define a notion of motif maximality, called maximality in conservation, which, when applied to the output of existing motif-finding tools, allows us to identify and remove artifacts. Detection relies on the fact that variations of a motif share a large subset of the occurrences of the real motif, but the real motif is more conserved than any of its artifacts. Experiments show that the tool we implemented according to this definition considerably reduces the output size by removing artifacts, at a negligible time cost.
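
The filtering idea described in the abstract can be illustrated with a small sketch. This is not the authors' tool and not their formal definition of maximality in conservation, which the abstract does not state. It is a toy filter under explicit assumptions: motifs have equal length, occurrences are matches within a fixed number of substitutions, conservation is measured as the mean Hamming distance between a motif and its occurrences (lower means more conserved), and "a large subset of occurrences" is modelled by a hypothetical threshold `min_overlap`. All function names are illustrative.

```python
from typing import Dict, List, Set


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def occurrences(motif: str, seq: str, max_mismatches: int) -> Set[int]:
    """Start positions where motif matches seq with at most max_mismatches substitutions."""
    k = len(motif)
    return {
        i
        for i in range(len(seq) - k + 1)
        if hamming(motif, seq[i:i + k]) <= max_mismatches
    }


def mean_distance(motif: str, seq: str, occ: Set[int]) -> float:
    """Assumed conservation score: mean Hamming distance to the occurrences (lower is better)."""
    k = len(motif)
    return sum(hamming(motif, seq[i:i + k]) for i in occ) / len(occ)


def remove_artifacts(
    motifs: List[str], seq: str, max_mismatches: int, min_overlap: float = 0.5
) -> List[str]:
    """Drop every motif that shares at least min_overlap of its occurrences
    with a strictly more conserved motif (an assumed, simplified criterion)."""
    occ: Dict[str, Set[int]] = {}
    for m in motifs:
        found = occurrences(m, seq, max_mismatches)
        if found:  # motifs without occurrences are ignored
            occ[m] = found
    score = {m: mean_distance(m, seq, o) for m, o in occ.items()}

    kept = []
    for m in occ:
        is_artifact = any(
            other != m
            and len(occ[m] & occ[other]) / len(occ[m]) >= min_overlap
            and score[other] < score[m]
            for other in occ
        )
        if not is_artifact:
            kept.append(m)
    return kept


# Toy input: "ACGT" occurs exactly three times; "ACGA" and "TCGT" match the
# same three positions with one mismatch each, so they are filtered out.
seq = "ACGTTTACGTGGACGT"
print(remove_artifacts(["ACGT", "ACGA", "TCGT", "GTGG"], seq, max_mismatches=1))
# → ['ACGT', 'GTGG']
```

In this toy run the two variants are discarded because they cover the same positions as "ACGT" but with a worse conservation score, while "GTGG" survives because its single occurrence is not shared with any other motif.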

This work was supported in part by MIUR of Italy under project AlgoDEEP prot. 2008TFBWL4.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Federico, M., Pisanti, N. (2011). Removing Artifacts of Approximated Motifs. In: Böhm, C., Khuri, S., Lhotská, L., Pisanti, N. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2011. Lecture Notes in Computer Science, vol 6865. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23208-4_14

  • DOI: https://doi.org/10.1007/978-3-642-23208-4_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23207-7

  • Online ISBN: 978-3-642-23208-4

  • eBook Packages: Computer Science, Computer Science (R0)
