Abstract
Most daily and scientific data are sequential in nature. Discovering important patterns from such data can benefit the user and scientist by predicting coming activities, interpreting recurring phenomena, extracting outstanding similarities and differences for close attention, compressing data, and detecting intrusion. We consider the following incremental discovery problem for large and dynamic sequential data. Suppose that patterns were previously discovered and materialized. An update is made to the sequential database. An incremental discovery will take advantage of discovered patterns and compute only the change by accessing the affected part of the database and data structures. In addition to patterns, the statistics and position information of patterns need to be updated to allow further analysis and processing on patterns. We present an efficient algorithm for the incremental discovery problem. The algorithm is applied to sequential data that honors several sequential patterns modeling weather changes in Singapore. The algorithm finds what it is supposed to find. Experiments show that for small updates and large databases, the incremental discovery algorithm runs in time independent of the data size.
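The incremental idea described above can be illustrated with a small sketch. This is not the algorithm presented in the article, whose details are not given in the abstract. It is a simplified stand-in that assumes patterns are contiguous subsequences of bounded length and that updates are appends to a single sequence. The class name `IncrementalPatternIndex` and the parameters `max_len` and `min_count` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class IncrementalPatternIndex:
    """Illustrative only: tracks every contiguous pattern of length
    1..max_len together with the start positions of its occurrences."""

    def __init__(self, max_len):
        self.max_len = max_len              # assumed bound on pattern length
        self.seq = []                       # the sequential database
        self.positions = defaultdict(list)  # pattern (tuple) -> start offsets

    def append(self, items):
        """Append new items and update only the affected patterns.

        Only windows that end inside the appended part are visited, so the
        work depends on the update size and max_len, not on len(self.seq).
        Returns the set of patterns whose statistics changed.
        """
        old_len = len(self.seq)
        self.seq.extend(items)
        touched = set()
        for end in range(old_len + 1, len(self.seq) + 1):
            for length in range(1, min(self.max_len, end) + 1):
                start = end - length
                pattern = tuple(self.seq[start:end])
                self.positions[pattern].append(start)
                touched.add(pattern)
        return touched

    def frequent(self, min_count):
        """Patterns occurring at least min_count times, with positions."""
        return {p: pos for p, pos in self.positions.items()
                if len(pos) >= min_count}


index = IncrementalPatternIndex(max_len=2)
index.append("abab")          # initial discovery
changed = index.append("a")   # small update: only windows ending at the new item
```

Each appended item touches at most `max_len` windows, which mirrors the abstract's point that a small update need not rescan the whole database. The occurrence counts and positions are kept alongside the patterns so that later analysis can reuse them.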
Cite this article
Wang, K. Discovering Patterns from Large and Dynamic Sequential Data. Journal of Intelligent Information Systems 9, 33–56 (1997). https://doi.org/10.1023/A:1008689103430