Processing Sequential Patterns in Relational Databases

  • Conference paper
Journal on Data Semantics VIII

Part of the book series: Lecture Notes in Computer Science (JODS, volume 4380)

Abstract

Integrating data mining techniques into database systems has gained popularity, and its significance is well recognized. However, the performance of SQL-based data mining is known to fall behind that of specialized implementations. The reasons include the prohibitive cost of extracting knowledge and the lack of suitable declarative query language support. Recent studies have found that, with carefully tuned SQL formulations, association rule mining and sequential pattern mining can achieve performance comparable to systems that cache the data in files outside the DBMS. However, most previous pattern mining methods follow the Apriori approach, which still encounters problems when the sequential database is large and/or when the sequential patterns to be mined are numerous and long.

In this paper, we present a novel SQL-based approach that we recently proposed, called Prospad (PROjection Sequential PAttern Discovery). Prospad differs fundamentally from the Apriori-like candidate generation-and-test approach: it is a pattern-growth approach without candidate generation. It grows longer patterns from shorter ones by successively projecting the sequential table into subsequential tables. Because the projected table for a sequential pattern i contains all and only the information needed to mine the sequential patterns that can grow from i, the size of the projected table usually shrinks quickly as mining proceeds to longer patterns. Moreover, a depth-first approach is used to facilitate the projection process and to avoid the cost of creating and dropping temporary tables.
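The projection idea described above can be illustrated with a minimal sketch. It is not the Prospad implementation: the table layout `(sid, pos, item)`, the function name `mine`, the `proj_<depth>` table names, and the choice of SQLite are all assumptions made for illustration, and each sequence element is simplified to a single item. Reusing one temporary table per depth level is one possible reading of how a depth-first order can avoid repeated table creation and dropping.

```python
import sqlite3

def mine(conn, table, min_support, prefix=(), depth=0, out=None):
    """Depth-first pattern growth over rows of (sid, pos, item).

    Hypothetical sketch: grows `prefix` by one item at a time and
    projects `table` onto the rows that follow that item.
    """
    if out is None:
        out = {}
    # Items occurring in at least min_support distinct sequences of this table.
    frequent = conn.execute(
        f"SELECT item, COUNT(DISTINCT sid) FROM {table} "
        "GROUP BY item HAVING COUNT(DISTINCT sid) >= ? ORDER BY item",
        (min_support,),
    ).fetchall()
    # Assumption: one projection table per depth, emptied and refilled
    # for each sibling pattern instead of being dropped and recreated.
    proj = f"proj_{depth}"
    conn.execute(f"CREATE TEMP TABLE IF NOT EXISTS {proj} (sid, pos, item)")
    for item, support in frequent:
        pattern = prefix + (item,)
        out[pattern] = support
        conn.execute(f"DELETE FROM {proj}")
        # Keep, per sequence, only the rows after the first occurrence of item.
        conn.execute(
            f"INSERT INTO {proj} "
            f"SELECT t.sid, t.pos, t.item FROM {table} t "
            f"JOIN (SELECT sid, MIN(pos) AS first_pos FROM {table} "
            f"      WHERE item = ? GROUP BY sid) f ON f.sid = t.sid "
            f"WHERE t.pos > f.first_pos",
            (item,),
        )
        mine(conn, proj, min_support, pattern, depth + 1, out)
    return out

# Usage on a toy sequential table: three sequences <a b c>, <a c>, <b a c>.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (sid, pos, item)")
conn.executemany(
    "INSERT INTO seq VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 2, "b"), (1, 3, "c"),
     (2, 1, "a"), (2, 2, "c"),
     (3, 1, "b"), (3, 2, "a"), (3, 3, "c")],
)
patterns = mine(conn, "seq", min_support=2)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

Each projected table holds only the suffixes that can still extend the current pattern, so it shrinks as the pattern grows, and no candidate patterns are generated and then tested against the full table.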

Editor information

Stefano Spaccapietra, Paolo Atzeni, François Fages, Mohand-Saïd Hacid, Michael Kifer, John Mylopoulos, Barbara Pernici, Pavel Shvaiko, Juan Trujillo, Ilya Zaihrayeu

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Shang, X., Sattler, KU. (2007). Processing Sequential Patterns in Relational Databases. In: Spaccapietra, S., et al. Journal on Data Semantics VIII. Lecture Notes in Computer Science, vol 4380. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70664-9_8

  • DOI: https://doi.org/10.1007/978-3-540-70664-9_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70663-2

  • Online ISBN: 978-3-540-70664-9

  • eBook Packages: Computer Science, Computer Science (R0)
