Discovering Patterns from Large and Dynamic Sequential Data

Journal of Intelligent Information Systems

Abstract

Most daily and scientific data are sequential in nature. Discovering important patterns from such data can benefit the user and scientist by predicting coming activities, interpreting recurring phenomena, extracting outstanding similarities and differences for close attention, compressing data, and detecting intrusion. We consider the following incremental discovery problem for large and dynamic sequential data. Suppose that patterns were previously discovered and materialized. An update is made to the sequential database. An incremental discovery will take advantage of discovered patterns and compute only the change by accessing the affected part of the database and data structures. In addition to patterns, the statistics and position information of patterns need to be updated to allow further analysis and processing on patterns. We present an efficient algorithm for the incremental discovery problem. The algorithm is applied to sequential data that honors several sequential patterns modeling weather changes in Singapore. The algorithm finds what it is supposed to find. Experiments show that for small updates and large databases, the incremental discovery algorithm runs in time independent of the data size.
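The incremental idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm; it is a toy stand-in that assumes updates are appends to a single sequence and that a pattern is any contiguous run of at most `max_len` symbols. The class name `IncrementalPatternIndex` and the parameters `max_len` and `min_support` are hypothetical names chosen for illustration. On each append, only substrings that end inside the new part are visited, so the work depends on the update size and `max_len`, not on the total data size.

```python
from collections import defaultdict


class IncrementalPatternIndex:
    """Toy index: every substring of length <= max_len, with its start positions.

    Illustrative only; assumes append-only updates to one sequence.
    """

    def __init__(self, max_len, min_support):
        self.max_len = max_len
        self.min_support = min_support
        self.data = []                      # the sequence seen so far
        self.positions = defaultdict(list)  # pattern (tuple) -> start positions

    def append(self, symbols):
        """Append symbols and record only the substrings that are new."""
        old_n = len(self.data)
        self.data.extend(symbols)
        new_n = len(self.data)
        # A substring is new exactly when it ends inside the appended part,
        # so no start earlier than old_n - max_len + 1 can be affected.
        first_start = max(0, old_n - self.max_len + 1)
        for start in range(first_start, new_n):
            # Shortest length that reaches past the old end of the data.
            min_length = max(1, old_n - start + 1)
            max_length = min(self.max_len, new_n - start)
            for length in range(min_length, max_length + 1):
                pattern = tuple(self.data[start:start + length])
                self.positions[pattern].append(start)

    def frequent(self):
        """Patterns whose occurrence count meets min_support, with positions."""
        return {p: pos for p, pos in self.positions.items()
                if len(pos) >= self.min_support}


if __name__ == "__main__":
    index = IncrementalPatternIndex(max_len=3, min_support=2)
    index.append("abcab")   # initial discovery
    index.append("cabc")    # small update: only the tail region is rescanned
    for pattern, starts in sorted(index.frequent().items()):
        print("".join(pattern), starts)
```

Keeping start positions alongside counts mirrors the abstract's requirement that statistics and position information stay up to date. A real implementation for large data would likely use a more compact structure than a dictionary of all short substrings.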




Cite this article

Wang, K. Discovering Patterns from Large and Dynamic Sequential Data. Journal of Intelligent Information Systems 9, 33–56 (1997). https://doi.org/10.1023/A:1008689103430


  • DOI: https://doi.org/10.1023/A:1008689103430
