Sequences II pp 225-244

Efficient Algorithms for Sequence Analysis

  • David Eppstein
  • Zvi Galil
  • Raffaele Giancarlo
  • Giuseppe F. Italiano

Abstract

We consider new algorithms for the solution of many dynamic programming recurrences for sequence comparison and for RNA secondary structure prediction. The techniques upon which the algorithms are based effectively exploit the physical constraints of the problem to derive more efficient methods for sequence analysis.

Keywords

Polypeptide Macromolecule Hunt Adenine Sorting 

Copyright information

© Springer-Verlag New York, Inc. 1993

Authors and Affiliations

  • David Eppstein (1)
  • Zvi Galil (2, 3)
  • Raffaele Giancarlo (4)
  • Giuseppe F. Italiano (2, 5)

  1. Department of Information and Computer Science, University of California, Irvine, USA
  2. Department of Computer Science, Columbia University, New York, USA
  3. Department of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
  4. AT&T Bell Laboratories, Murray Hill, USA
  5. Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”, Rome, Italy
