Handbook of Formal Languages, pp. 361–398

# String Editing and Longest Common Subsequences

## Summary

The string editing problem for input strings *x* and *y* consists of transforming *x* into *y* by performing a series of weighted edit operations on *x* of minimum overall cost. An edit operation on *x* can be the deletion of a symbol from *x*, the insertion of a symbol into *x*, or the substitution of a symbol of *x* with another symbol. String editing models a variety of problems arising in such diverse areas as text and speech processing, geology and, last but not least, molecular biology. Special cases of string editing include the longest common subsequence problem, local alignment and similarity searching in DNA and protein sequences, and approximate string searching. We describe serial and parallel algorithmic solutions for the problem and some of its basic variants.
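To make the definition concrete, the following is a minimal sketch of the classical quadratic-time dynamic program for string editing, not necessarily the formulation used in the chapter. It assumes position-independent weights; the names `edit_distance`, `lcs_length`, `delete_cost`, `insert_cost` and `substitute_cost` are illustrative choices, not taken from the text. The second function shows one way the longest common subsequence problem arises as a special case: when a substitution costs as much as a deletion plus an insertion, the edit distance *d* and the LCS length *l* satisfy *d* = |*x*| + |*y*| − 2*l*.

```python
def edit_distance(x, y, delete_cost=1, insert_cost=1, substitute_cost=1):
    """Minimum total cost of transforming x into y.

    Assumption: each operation has a fixed weight that does not depend on
    the symbols involved; matching symbols are copied at cost 0.
    Only two rows of the table are kept, so space is linear in len(y).
    """
    m, n = len(x), len(y)
    # Row 0: build each prefix of y from the empty string by insertions.
    prev = [j * insert_cost for j in range(n + 1)]
    for i in range(1, m + 1):
        # Column 0: erase the first i symbols of x by deletions.
        curr = [i * delete_cost] + [0] * n
        for j in range(1, n + 1):
            sub = 0 if x[i - 1] == y[j - 1] else substitute_cost
            curr[j] = min(
                prev[j] + delete_cost,      # delete x[i-1]
                curr[j - 1] + insert_cost,  # insert y[j-1]
                prev[j - 1] + sub,          # substitute or copy
            )
        prev = curr
    return prev[n]


def lcs_length(x, y):
    """Length of a longest common subsequence via the edit distance.

    With unit deletions and insertions and substitution cost 2, an optimal
    script deletes or inserts exactly the symbols outside a longest common
    subsequence, so d = len(x) + len(y) - 2 * lcs.
    """
    d = edit_distance(x, y, delete_cost=1, insert_cost=1, substitute_cost=2)
    return (len(x) + len(y) - d) // 2


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(lcs_length("ABCBDAB", "BDCABA"))
```

The sketch returns only the optimal cost. Recovering an actual edit script would require either the full table or a divide-and-conquer strategy in linear space.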

## Keywords

Linear Space · Edit Distance · Input String · Edit Operation · Longest Common Subsequence
