Parallelising the Computation of Minimal Absent Words

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9574)

Abstract

An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation also provides a fast alternative for measuring approximation in sequence comparison. There exists an \(\mathcal {O}(n)\)-time and \(\mathcal {O}(n)\)-space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of suffix arrays (Barton et al., 2014). An implementation of this algorithm was also provided by the authors and is currently the fastest available. In this article, we present a new \(\mathcal {O}(n)\)-time and \(\mathcal {O}(n)\)-space algorithm for computing all minimal absent words; it has the desirable property that, given the indexing data structure at hand, the computation of minimal absent words can be executed in parallel. Experimental results show that a multiprocessing implementation of this algorithm can accelerate the overall computation by more than a factor of two compared to state-of-the-art approaches. By excluding the indexing data structure construction time, we show that the implementation achieves near-optimal speed-ups.
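To make the definition concrete, the naive sketch below computes minimal absent words directly from it. It is not the authors' \(\mathcal {O}(n)\) suffix-array-based algorithm: it enumerates every distinct factor of y (quadratically many) and is intended only as an illustration or as a reference oracle for small inputs. The function names and the default DNA alphabet are illustrative assumptions. It relies on one observation: every proper factor of a word w is a factor of its longest proper prefix or its longest proper suffix, so w = a·u·b (with letters a, b) is minimal absent exactly when a·u and u·b occur in y but a·u·b does not. Letters of the alphabet that do not occur in y are minimal absent words of length 1.

```python
from itertools import product


def factors(y: str) -> set[str]:
    """All distinct factors (substrings) of y, including the empty word."""
    return {y[i:j] for i in range(len(y) + 1) for j in range(i, len(y) + 1)}


def minimal_absent_words(y: str, alphabet: str = "ACGT") -> list[str]:
    """Naive minimal absent word computation, straight from the definition.

    Illustrative sketch only (hypothetical helper, not the paper's method):
    it materialises all factors of y, so it is at least quadratic in len(y).
    """
    fac = factors(y)
    # Length-1 minimal absent words: letters of the alphabet absent from y
    # (their only proper factor, the empty word, trivially occurs).
    maws = [c for c in alphabet if c not in fac]
    # Longer ones: w = a + u + b, where u is a factor of y (possibly empty).
    # Checking the longest proper prefix (a + u) and suffix (u + b) suffices.
    for u in fac:
        for a, b in product(alphabet, repeat=2):
            if a + u in fac and u + b in fac and a + u + b not in fac:
                maws.append(a + u + b)
    # Each word a + u + b is produced once, so no deduplication is needed.
    return sorted(maws, key=lambda w: (len(w), w))


if __name__ == "__main__":
    print(minimal_absent_words("ABAAB", "AB"))  # ['BB', 'AAA', 'BAB', 'AABA']
```

The iterations over different factors u are independent of one another, which is the kind of property that lets work be split across processes once an index of y is available; the paper achieves this in linear time with its own algorithm over an indexing data structure rather than by explicit factor enumeration.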

References

  1. Abboud, A., Williams, V.V., Weimann, O.: Consequences of faster alignment of sequences. In: Esparza, J., Fraigniaud, P., Husfeldt, T., Koutsoupias, E. (eds.) ICALP 2014. LNCS, vol. 8572, pp. 39–51. Springer, Heidelberg (2014)

  2. Acquisti, C., Poste, G., Curtiss, D., Kumar, S.: Nullomers: really a matter of natural selection? PLoS One 2(10), e1022 (2007)

  3. Barton, C., Heliou, A., Mouchard, L., Pissis, S.P.: Linear-time computation of minimal absent words using suffix array. BMC Bioinform. 15, 388 (2014)

  4. Belazzougui, D., Cunial, F., Kärkkäinen, J., Mäkinen, V.: Versatile succinct representations of the bidirectional Burrows-Wheeler transform. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 133–144. Springer, Heidelberg (2013)

  5. Chairungsee, S., Crochemore, M.: Using minimal absent words to build phylogeny. Theoret. Comput. Sci. 450, 109–116 (2012)

  6. Crochemore, M., Mignosi, F., Restivo, A.: Automata and forbidden words. Inf. Process. Lett. 67, 111–117 (1998)

  7. Fischer, J.: Inducing the LCP-array. In: Dehne, F., Iacono, J., Sack, J.-R. (eds.) WADS 2011. LNCS, vol. 6844, pp. 374–385. Springer, Heidelberg (2011)

  8. Garcia, S.P., Pinho, A.J.: Minimal absent words in four human genome assemblies. PLoS One 6(12), e29344 (2011)

  9. Garcia, S.P., Pinho, A.J., Rodrigues, J., Bastos, C.A.C., Ferreira, P.J.S.G.: Minimal absent words in prokaryotic and eukaryotic genomes. PLoS One 6, e16065 (2011)

  10. Haubold, B., Pierstorff, N., Möller, F., Wiehe, T.: Genome comparison without alignment using shortest unique substrings. BMC Bioinform. 6, 123 (2005)

  11. Jacobson, G.: Space-efficient static trees and graphs. In: 30th SFCS 1989, pp. 549–554. IEEE Computer Society (1989)

  12. Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)

  13. Mignosi, F., Restivo, A., Sciortino, M.: Words and forbidden factors. Theoret. Comput. Sci. 273(1–2), 99–117 (2002)

  14. Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: DCC 2009, pp. 193–202. IEEE Computer Society (2009)

  15. Pinho, A.J., Ferreira, P.J.S.G., Garcia, S.P., Rodrigues, J.M.: On finding minimal absent words. BMC Bioinform. 10 (2009)

  16. Shun, J.: Fast parallel computation of longest common prefixes. In: SC 2014, pp. 387–398. IEEE Computer Society (2014)

  17. Silva, R.M., Pratas, D., Castro, L., Pinho, A.J., Ferreira, P.J.S.G.: Three minimal sequences found in Ebola virus genomes and absent from human DNA. Bioinformatics 31(15), 2421–2425 (2015)

  18. Wu, Z.D., Jiang, T., Su, W.J.: Efficient computation of shortest absent words in a genomic sequence. Inf. Process. Lett. 110(14–15), 596–601 (2010)

Author information

Corresponding author

Correspondence to Solon P. Pissis.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Barton, C., Heliou, A., Mouchard, L., Pissis, S.P. (2016). Parallelising the Computation of Minimal Absent Words. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science, vol 9574. Springer, Cham. https://doi.org/10.1007/978-3-319-32152-3_23

  • DOI: https://doi.org/10.1007/978-3-319-32152-3_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32151-6

  • Online ISBN: 978-3-319-32152-3

  • eBook Packages: Computer Science, Computer Science (R0)
