GPU Accelerated Computation of the Longest Common Subsequence

  • Katsuya Kawanami
  • Noriyuki Fujimoto
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7174)


The longest common subsequence (LCS for short) of two given strings has various applications, e.g., the comparison of DNA sequences. In this paper, we propose a GPU algorithm that accelerates Hirschberg's CPU LCS algorithm improved with Crochemore et al.'s bit-parallel CPU algorithm. Crochemore's algorithm consists mainly of bitwise logical operators, which can be computed in an embarrassingly parallel fashion. However, it also includes an operator with less inherent parallelism, namely an arithmetic sum. In this paper, we focus on how to implement these operators efficiently in parallel. Our experiments with a 2.93 GHz Intel Core i3 530 CPU and GeForce 8800 GTX, GTX 285, and GTX 480 GPUs show that the proposed algorithm runs up to 12.77 times faster than the bit-parallel CPU algorithm and up to 76.5 times faster than Hirschberg's CPU LCS algorithm. Furthermore, the proposed algorithm runs 10.9 to 18.1 times faster than Kloetzli et al.'s existing GPU algorithm.
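The bit-parallel recurrence of Crochemore et al. (ref. 1) that the abstract refers to can be sketched sequentially as follows. This is a reference implementation for clarity, not the authors' GPU kernel; the variable names are our own. Note that the single `+` is the carry-propagating arithmetic sum the paper singles out as hard to parallelize, while all other operators are bitwise.

```python
def bit_parallel_lcs_length(a: str, b: str) -> int:
    """LCS length of a and b via the bit-parallel recurrence of
    Crochemore et al. Bit j of V ends up 0 exactly when column j
    contributes one unit to the LCS length."""
    m = len(a)
    mask = (1 << m) - 1
    # Precompute, per alphabet symbol, a match bit-vector over a.
    match = {}
    for j, c in enumerate(a):
        match[c] = match.get(c, 0) | (1 << j)
    V = mask  # all ones
    for c in b:
        M = match.get(c, 0)
        U = V & M
        # '+' is the only operator with a carry chain (low
        # parallelism); '|', '&', '~' are embarrassingly parallel.
        V = ((V + U) | (V & ~M)) & mask
    return m - bin(V).count("1")  # number of zero bits in V
```

With Python's arbitrary-precision integers one machine word suffices for any `m`; the GPU setting instead splits `V` across many words, which is exactly where the parallel-sum problem the paper addresses arises.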


Keywords: Shared Memory · Global Memory · Recursive Call · Full Adder · Speedup Ratio
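The "Full Adder" keyword points at the carry chain of the arithmetic sum in the bit-parallel recurrence. Conditional-sum addition (Sklansky, ref. 9) attacks exactly this: each block's sum is computed for both possible carry-in values, and the correct one is selected once the lower block's carry is known. The following recursive sketch is our own illustration of that idea, not the authors' implementation; in hardware (or on a GPU), the two speculative evaluations of the upper half proceed in parallel.

```python
def cond_sum_add(a: int, b: int, n: int, cin: int = 0):
    """Conditional-sum addition of two n-bit operands.
    Returns (sum mod 2**n, carry_out)."""
    if n == 1:
        s = a ^ b ^ cin                     # single-bit full adder
        cout = (a & b) | (cin & (a ^ b))
        return s, cout
    h = n // 2
    lo_mask = (1 << h) - 1
    # Lower half uses the actual carry-in.
    lo_sum, lo_carry = cond_sum_add(a & lo_mask, b & lo_mask, h, cin)
    # Upper half is computed speculatively for BOTH carry-ins;
    # these two recursive calls are independent of each other.
    hi0 = cond_sum_add(a >> h, b >> h, n - h, 0)
    hi1 = cond_sum_add(a >> h, b >> h, n - h, 1)
    # Select the correct upper result once lo_carry is known.
    hi_sum, hi_carry = hi1 if lo_carry else hi0
    return (hi_sum << h) | lo_sum, hi_carry
```

The selection step is a multiplexer rather than a ripple of carries, which reduces the addition's critical path from linear to logarithmic in the word length.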




References

  1. Crochemore, M., Iliopoulos, C.S., Pinzon, Y.J., Reid, J.F.: A Fast and Practical Bit-Vector Algorithm for the Longest Common Subsequence Problem. Information Processing Letters 80(6), 279–285 (2001)
  2. Garland, M., Kirk, D.B.: Understanding Throughput-Oriented Architectures. Communications of the ACM 53(11), 58–66 (2010)
  3. Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press (1997)
  4. Hirschberg, D.S.: A Linear Space Algorithm for Computing Maximal Common Subsequences. Communications of the ACM 18(6), 341–343 (1975)
  5. Kirk, D.B., Hwu, W.W.: Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann (2010)
  6. Kloetzli, J., Strege, B., Decker, J., Olano, M.: Parallel Longest Common Subsequence Using Graphics Hardware. In: Proc. of the 8th Eurographics Symposium on Parallel Graphics and Visualization (EGPGV) (2008)
  7. Lindholm, E., Nickolls, J., Oberman, S., Montrym, J.: NVIDIA Tesla: A Unified Graphics and Computing Architecture. IEEE Micro 28(2), 39–55 (2008)
  8. Sanders, J., Kandrot, E.: CUDA by Example: An Introduction to General-Purpose GPU Programming. Addison-Wesley Professional (2010)
  9. Sklansky, J.: Conditional-Sum Addition Logic. IRE Transactions on Electronic Computers EC-9, 226–231 (1960)
  10. Vai, M.: VLSI Design. CRC Press (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Katsuya Kawanami (1)
  • Noriyuki Fujimoto (1)

  1. Sakai-shi, Japan
