Abstract
Given two sets of strings, consider the problem of finding a subsequence that is common to one set but never appears in the other. The problem is known to be NP-complete. We generalize it to an optimization problem and give a practical algorithm that solves it exactly. Our algorithm uses a pruning heuristic and subsequence automata, and finds the best subsequence. We report experiments that convinced us the approach is quite promising.
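To make the problem statement concrete, the following is a minimal brute-force sketch, not the algorithm of the paper: it uses neither the pruning heuristic nor subsequence automata. The function names, the `max_len` bound, and the scoring rule (number of positive strings matched plus number of negative strings not matched) are assumptions chosen for illustration only.

```python
from itertools import product


def is_subsequence(pattern: str, text: str) -> bool:
    """Return True if pattern can be obtained from text by deleting characters."""
    it = iter(text)
    # `ch in it` advances the iterator, so characters must appear in order.
    return all(ch in it for ch in pattern)


def count_matches(pattern: str, strings: list[str]) -> int:
    """Count how many strings contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in strings)


def best_subsequence_bruteforce(pos: list[str], neg: list[str], max_len: int = 3):
    """Exhaustively score every pattern of length <= max_len (illustrative only).

    Assumed score: positives matched + negatives NOT matched.
    The search is exponential in max_len, which is why a practical
    method needs pruning.
    """
    alphabet = sorted(set("".join(pos + neg)))
    # The empty pattern matches every string, so it scores len(pos).
    best_pattern, best_score = "", len(pos)
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            p = "".join(chars)
            score = count_matches(p, pos) + len(neg) - count_matches(p, neg)
            if score > best_score:  # strict: keep the first best pattern found
                best_pattern, best_score = p, score
    return best_pattern, best_score


if __name__ == "__main__":
    positives = ["abcd", "acbd", "xaybz"]
    negatives = ["ba", "dcb", "bbb"]
    print(best_subsequence_bruteforce(positives, negatives, max_len=2))
```

On this toy input the pattern `ab` is a subsequence of every positive string and of no negative string, so it separates the two sets perfectly under the assumed score.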
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hirao, M., Hoshino, H., Shinohara, A., Takeda, M., Arikawa, S. (2000). A Practical Algorithm to Find the Best Subsequence Patterns. In: Arikawa, S., Morishita, S. (eds) Discovery Science. DS 2000. Lecture Notes in Computer Science, vol 1967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44418-1_12
DOI: https://doi.org/10.1007/3-540-44418-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41352-3
Online ISBN: 978-3-540-44418-3
eBook Packages: Springer Book Archive