Abstract
We present two algorithmic results pertinent to matching patterns of interest in macromolecular sequences. The first result is an output-sensitive algorithm for approximately matching network expressions, i.e., regular expressions without Kleene closure. This result generalizes the O(kn) expected-time algorithm of Ukkonen for approximately matching keywords [Ukk85]. The second result concerns the problem of matching a pattern that is a network expression whose elements are approximate matches to network expressions, interspersed with specifiable distance ranges. For this class of patterns, we show how to determine a backtracking procedure whose order of evaluation is optimal in the sense that its expected time is minimal over all such procedures.
This work was supported in part by the National Institutes of Health under Grant R01 LM04960 and the Aspen Center for Physics.
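As a point of reference for the keyword case that the first result generalizes, the following is a minimal sketch of column-wise dynamic programming with a cutoff, in the spirit of the approach the abstract attributes to [Ukk85]. It is not the network-expression algorithm of this paper. The function name `approx_keyword_matches`, the unit edit costs, and the 1-based end positions are assumptions made for illustration.

```python
def approx_keyword_matches(pattern, text, k):
    """Return 1-based end positions j such that some substring of `text`
    ending at j is within edit distance k of `pattern` (unit costs assumed)."""
    m = len(pattern)
    # col[i] = best edit distance between pattern[:i] and a substring of the
    # text ending at the current position; before any text is read, col[i] = i.
    col = list(range(m + 1))
    # `last` is the largest row whose value is <= k (the last "active" row).
    last = min(k, m)
    ends = []
    for j, ch in enumerate(text, start=1):
        # Only rows up to last + 1 can reach a value <= k in this column.
        top = min(last + 1, m)
        if top > last:
            # Row `top` was inactive in the previous column, so its true
            # value exceeded k; any stand-in greater than k is sufficient.
            col[top] = k + 1
        diag = col[0]          # previous column, row i - 1
        col[0] = 0             # a match may start at any text position
        for i in range(1, top + 1):
            up = col[i]        # previous column, row i
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(diag + cost,    # match or substitute
                         up + 1,         # extra text character
                         col[i - 1] + 1) # missing pattern character
            diag = up
        # Re-establish the last active row for the next column.
        last = top
        while col[last] > k:
            last -= 1
        if last == m:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_keyword_matches("abc", "xxabxx", 1))
```

Only the rows up to the last active one are updated in each column, which is what makes the work per text character depend on k rather than on the pattern length in the expected case.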
References
Fickett, J.W., “Fast optimal alignment,” Nucleic Acids Research 12 (1984), 175–179.
Hopcroft, J.E. and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation (Addison-Wesley, 1979), 13–76.
Levenshtein, V.I., “Binary codes capable of correcting deletions, insertions, and reversals,” Cybernetics and Control Theory 10 (1966), 707–710.
Manber, U. and R. Baeza-Yates, “An algorithm for string matching with a sequence of don't cares,” Information Processing Letters 37 (1991), 133–136.
Mehldau, G. and E.W. Myers, “A system for pattern matching applications on biosequences,” Technical Report TR91-31, Dept. of Computer Science, U. of Arizona, Tucson, AZ 85721.
Miller, J., A.D. McLachlan and A. Klug, “Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes,” EMBO Journal 4 (1985), 1609–1614.
Myers, E.W. and W. Miller, “Approximate matching of regular expressions,” Bull. of Math. Biol. 51 (1989), 5–37.
Needleman, S.B. and C.D. Wunsch, “A general method applicable to the search for similarities in the amino-acid sequence of two proteins,” J. Molecular Biology 48 (1970), 443–453.
Posfai, J., A.S. Bhagwat, G. Posfai, and R.J. Roberts, “Predictive motifs derived from cytosine methyltransferases,” Nucleic Acids Research 17 (1989), 2421–2435.
Sankoff, D. and J.B. Kruskal, Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison (Addison-Wesley, 1983), 265–310.
Sellers, P.H., “The theory and computation of evolutionary distances: pattern recognition,” J. Algorithms 1 (1980), 359–373.
Ukkonen, E., “Finding approximate patterns in strings,” J. Algorithms 6 (1985), 132–137.
Wagner, R.A. and M.J. Fischer, “The string-to-string correction problem,” Journal of the ACM 21 (1974), 168–173.
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Myers, G. (1992). Approximate matching of network expressions with spacers. In: Simon, I. (eds) LATIN '92. LATIN 1992. Lecture Notes in Computer Science, vol 583. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0023842
DOI: https://doi.org/10.1007/BFb0023842
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-55284-0
Online ISBN: 978-3-540-47012-0
eBook Packages: Springer Book Archive