Searching for the longest common subsequence (LCS) of biosequences is one of the most important tasks in bioinformatics. This paper presents a fast algorithm for the LCS problem, named FAST_LCS. The algorithm first follows a successor table from the initial identical character pairs to their successors, obtaining all identical character pairs and their levels. It then traces back from the identical character pair at the largest level to obtain the LCS. For two sequences X and Y of lengths n and m, the memory required by FAST_LCS is max{8*(n+1)*8*(m+1), L}, where L is the number of identical character pairs, and the time complexity of the parallel implementation is O(|LCS(X,Y)|), where |LCS(X,Y)| is the length of the LCS of X and Y. Experimental results on gene sequences from the TIGR database, run on the Shenteng 1800 MPP parallel computer, show that the algorithm obtains exact results and is faster and more efficient than other LCS algorithms.
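The successor-table idea summarized in the abstract can be illustrated with a small sequential sketch. This is not the authors' FAST_LCS implementation: it omits any pruning and all parallelism, and the names `successor_table` and `lcs_by_levels`, the 1-based position convention, and the per-level dictionaries are assumptions made for illustration only. It builds one successor table per sequence, expands identical character pairs level by level starting from the virtual pair (0, 0), and traces back from a pair at the largest level.

```python
def successor_table(seq, alphabet):
    """table[c][i] is the smallest 1-based position p > i with seq[p-1] == c,
    or None if c does not occur after position i (hypothetical layout)."""
    n = len(seq)
    table = {c: [None] * (n + 1) for c in alphabet}
    for c in alphabet:
        nxt = None
        for i in range(n, 0, -1):
            if seq[i - 1] == c:
                nxt = i
            table[c][i - 1] = nxt
    return table


def lcs_by_levels(x, y):
    """Return one longest common subsequence of x and y (illustrative sketch)."""
    alphabet = sorted(set(x) & set(y))
    tx = successor_table(x, alphabet)
    ty = successor_table(y, alphabet)

    def successors(i, j):
        # For each character, the nearest identical pair strictly after (i, j).
        for c in alphabet:
            p, q = tx[c][i], ty[c][j]
            if p is not None and q is not None:
                yield (p, q)

    # levels[k] maps every pair reachable at level k + 1 to one predecessor.
    levels = []
    frontier = {pair: None for pair in successors(0, 0)}
    while frontier:
        levels.append(frontier)
        following = {}
        for pair in frontier:
            for succ in successors(*pair):
                following.setdefault(succ, pair)
        frontier = following

    if not levels:
        return ""

    # Trace back from any pair at the largest level.
    pair = next(iter(levels[-1]))
    chars = []
    for k in range(len(levels) - 1, -1, -1):
        chars.append(x[pair[0] - 1])
        pair = levels[k][pair]
    return "".join(reversed(chars))
```

For example, `lcs_by_levels("AGGTAB", "GXTXAYB")` returns a common subsequence of length 4. In this sketch the number of levels equals the LCS length, which is the quantity that appears in the parallel time bound quoted above; each level here is processed sequentially.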
© 2008 IFIP International Federation for Information Processing
About this paper
Cite this paper
Liu, W., Chen, L. (2008). A Fast Longest Common Subsequence Algorithm for Biosequences Alignment. In: Li, D. (eds) Computer And Computing Technologies In Agriculture, Volume I. CCTA 2007. The International Federation for Information Processing, vol 258. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-77251-6_8
DOI: https://doi.org/10.1007/978-0-387-77251-6_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-77250-9
Online ISBN: 978-0-387-77251-6
eBook Packages: Computer Science (R0)