Building Phylogeny with Minimal Absent Words

  • Conference paper
Implementation and Application of Automata (CIAA 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6807)

Abstract

An absent word in a sequence is a segment that does not occur in the given sequence. It is a minimal absent word if all its proper factors occur in the given sequence.
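The definition can be checked directly on a small input. The following is a minimal brute-force sketch, not the method of the paper: the function names, the `max_len` bound, and the default DNA alphabet are assumptions made for illustration. It relies on the fact that every proper factor of a word `w` is a factor of `w[1:]` or of `w[:-1]`, so testing those two is enough.

```python
from itertools import product


def factors(seq, max_len):
    """Return all factors (substrings) of seq of length <= max_len,
    including the empty word."""
    found = {""}
    for i in range(len(seq)):
        for j in range(i + 1, min(i + max_len, len(seq)) + 1):
            found.add(seq[i:j])
    return found


def minimal_absent_words_naive(seq, max_len, alphabet="ACGT"):
    """Enumerate every word of length <= max_len over the alphabet and
    keep those that are absent from seq while both their longest proper
    prefix and longest proper suffix occur in seq.

    Exponential in max_len; intended only to illustrate the definition.
    """
    occurring = factors(seq, max_len)
    maws = set()
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            if w not in occurring and w[1:] in occurring and w[:-1] in occurring:
                maws.add(w)
    return maws


if __name__ == "__main__":
    print(sorted(minimal_absent_words_naive("ACGA", 3)))
```

For the sequence `ACGA` with `max_len = 3`, this yields T, AA, AG, CA, CC, GC, GG and GAC. For instance, GAC is absent while GA and AC both occur, whereas AT is absent but not minimal because its factor T is itself absent.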

In this paper, we review the concept of minimal absent words, which includes the notion of shortest absent words but is much stronger. We present an efficient method for computing the minimal absent words of bounded length for a DNA sequence, using a suffix trie of bounded depth that represents the factors of bounded length. This method outputs the whole set of minimal absent words; furthermore, our technique provides a linear-time algorithm that uses less memory than previous solutions.
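The idea of a bounded-depth suffix trie can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the nested-dictionary trie, the function names, and the `contains` lookup are hypothetical, and because each lookup walks from the root, the sketch does not achieve the linear running time claimed in the abstract (which would need something like suffix links, a detail the abstract does not give).

```python
def build_bounded_trie(seq, depth):
    """Insert every window seq[i:i+depth] into a trie of nested dicts,
    so that the nodes are exactly the factors of seq of length <= depth."""
    root = {}
    for i in range(len(seq)):
        node = root
        for ch in seq[i:i + depth]:
            node = node.setdefault(ch, {})
    return root


def contains(root, word):
    """Return True if word labels a path from the root of the trie."""
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return True


def bounded_maws(seq, max_len, alphabet="ACGT"):
    """Return the sorted minimal absent words of seq of length <= max_len.

    For each node (an occurring factor v) and each letter b with no
    child edge, the word w = v + b is absent and its prefix v occurs;
    w is minimal exactly when its suffix w[1:] also occurs.
    """
    root = build_bounded_trie(seq, max_len)
    maws = []
    stack = [("", root)]
    while stack:
        prefix, node = stack.pop()
        if len(prefix) >= max_len:
            continue  # extending further would exceed the length bound
        for b in alphabet:
            child = node.get(b)
            if child is not None:
                stack.append((prefix + b, child))
            else:
                w = prefix + b
                if contains(root, w[1:]):
                    maws.append(w)
    return sorted(maws)


if __name__ == "__main__":
    print(bounded_maws("ACGA", 3))
```

Letters outside the assumed alphabet (for example N in real genomic data) are stored in the trie but never explored, so words containing them are not reported. The trie holds at most one node per distinct factor of length at most `max_len`, which is what keeps memory bounded independently of longer repeats.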

We also present an approach to distinguish sequences of different organisms using their minimal absent words. Our solution applies a length-weighted index to discriminate sequences, and the results show that a phylogenetic tree can be built from the collected information.
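The abstract does not define the length-weighted index. As a purely illustrative assumption, the sketch below scores two sequences by summing 1/|w|² over the words w that are minimal absent words of exactly one of them, so that short differing words weigh more than long ones. The function names and the weighting are hypothetical. The resulting pairwise values could then be passed to any standard distance-based tree-building method.

```python
from itertools import combinations


def length_weighted_index(maws_x, maws_y):
    """Assumed index: sum of 1/len(w)**2 over the symmetric difference
    of the two sets of minimal absent words (0.0 for identical sets)."""
    differing = set(maws_x) ^ set(maws_y)
    return sum(1.0 / len(w) ** 2 for w in differing)


def pairwise_index(maw_sets):
    """Given a dict mapping a sequence name to its set of minimal absent
    words, return a dict mapping each unordered pair of names (sorted)
    to its index value."""
    return {
        (a, b): length_weighted_index(maw_sets[a], maw_sets[b])
        for a, b in combinations(sorted(maw_sets), 2)
    }


if __name__ == "__main__":
    # Toy, made-up sets of minimal absent words for three sequences.
    example = {
        "s1": {"AA", "GAC"},
        "s2": {"AA", "TT"},
        "s3": {"AA", "GAC", "CCCC"},
    }
    for pair, value in pairwise_index(example).items():
        print(pair, round(value, 4))
```

In the toy data, s1 and s3 differ only in one word of length 4 and therefore score lower (closer) than s1 and s2, which differ in words of length 2 and 3.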

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chairungsee, S., Crochemore, M. (2011). Building Phylogeny with Minimal Absent Words. In: Bouchou-Markhoff, B., Caron, P., Champarnaud, JM., Maurel, D. (eds) Implementation and Application of Automata. CIAA 2011. Lecture Notes in Computer Science, vol 6807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22256-6_10

  • DOI: https://doi.org/10.1007/978-3-642-22256-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22255-9

  • Online ISBN: 978-3-642-22256-6

  • eBook Packages: Computer Science, Computer Science (R0)