An SIMD Algorithm for Wraparound Tandem Alignment

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10330)

Abstract

DNA tandem repeats (TRs), and in particular variable number of tandem repeat (VNTR) loci, can have functional effects on gene regulation and disease mechanisms and are useful in forensic studies. The need to quickly analyze high volumes of sequencing data for TRs and VNTRs has motivated the search for a more efficient sequence alignment algorithm for tandem repeats. Alignment of a pattern to a sequence that may contain zero or more tandem copies of the pattern can be accomplished using wraparound dynamic programming (WDP). This paper presents the use of Single Instruction, Multiple Data (SIMD) computer instructions, together with a parallel scan, to accelerate WDP, extending earlier SIMD algorithms for global alignment. The SIMD data types and intrinsics store data in 128-bit computer words partitioned into sixteen 1-byte blocks. Operations are performed on the bytes separately and simultaneously. We allow either single values for match and mismatch, or a substitution scoring scheme that assigns a potentially different substitution weight to every pair of alphabet characters. Additionally, for indels, we allow either a simple linear gap penalty or an affine gap penalty. Benchmarking demonstrated that SIMD tandem alignment runs over 3 times faster than standard wraparound dynamic programming.
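For orientation, the scalar (non-SIMD) baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of wraparound dynamic programming under stated assumptions, not the authors' implementation: the function name `wraparound_score`, the default scores, the single match/mismatch values, the linear gap penalty, and the choice to score the whole sequence against a whole number of pattern copies are all assumptions made for the example. Each row is computed in two passes, because the first pattern column depends on the last column of the same row.

```python
def wraparound_score(text, pattern, match=2, mismatch=-1, gap=-2):
    """Score `text` against zero or more tandem copies of `pattern`.

    Hypothetical helper for illustration only. Uses single match/mismatch
    values and a linear gap penalty (both assumed here).
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    if gap >= 0:
        raise ValueError("gap penalty must be negative for two passes to suffice")

    m = len(pattern)
    # prev[0] is the "copy boundary" state (a whole number of copies used);
    # prev[j] is the best score ending at pattern position j (1-based).
    prev = [j * gap for j in range(m + 1)]

    for ch in text:
        cur = [0] * (m + 1)

        # Pass 1: ordinary recurrence; the boundary state is tentative.
        cur[0] = prev[0] + gap
        for j in range(1, m + 1):
            sub = match if ch == pattern[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,   # diagonal (wraps via prev[0])
                         prev[j] + gap,       # text character against a gap
                         cur[j - 1] + gap)    # pattern character against a gap

        # Wraparound: the end of one copy feeds the start of the next.
        cur[0] = max(cur[0], cur[m])

        # Pass 2: propagate the wrapped value along the row with gaps only.
        for j in range(1, m + 1):
            cur[j] = max(cur[j], cur[j - 1] + gap)

        prev = cur

    return prev[0]


if __name__ == "__main__":
    print(wraparound_score("ACGACGACG", "ACG"))  # 18: three exact copies
    print(wraparound_score("ACGAG", "ACG"))      # 8: second copy lacks its C
```

A second pass is enough in this sketch because every gap carries a negative score, so travelling around the full pattern a second time using gaps alone can never raise a value. The SIMD approach described in the abstract instead packs sixteen 1-byte scores into each 128-bit word and handles the within-row dependency with a parallel scan, which this scalar sketch does not attempt.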

Author information

Corresponding author

Correspondence to Joshua Loving.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Loving, J., Scaduto, J.P., Benson, G. (2017). An SIMD Algorithm for Wraparound Tandem Alignment. In: Cai, Z., Daescu, O., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2017. Lecture Notes in Computer Science, vol 10330. Springer, Cham. https://doi.org/10.1007/978-3-319-59575-7_13

  • DOI: https://doi.org/10.1007/978-3-319-59575-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59574-0

  • Online ISBN: 978-3-319-59575-7

  • eBook Packages: Computer Science, Computer Science (R0)
