CAMLS: A Constraint-Based Apriori Algorithm for Mining Long Sequences

Gonen, Yaron; Gal-Oz, Nurit; Yahalom, Ran; Gudes, Ehud

doi:10.1007/978-3-642-12026-8_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5981)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Mining sequential patterns is a key objective in data mining because of its wide range of applications. Given a database of sequences, the challenge is to identify patterns that appear frequently across different sequences. Well-known algorithms have proved efficient; however, they do not perform well when mining databases that contain long frequent sequences. We present CAMLS (Constraint-based Apriori Mining of Long Sequences), an efficient algorithm for mining long sequential patterns under constraints. CAMLS is based on the Apriori property and consists of two phases, event-wise and sequence-wise, each of which employs an iterative process of candidate generation followed by frequency testing. Separating the process into these two phases allows us to (i) introduce a novel candidate-pruning strategy that increases the efficiency of the mining process and (ii) easily incorporate intra-event and inter-event constraints. Experiments on both synthetic and real datasets show that CAMLS outperforms previous algorithms when mining long sequences.
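To make the generate-and-test idea concrete, the following is a minimal Python sketch of support counting over a sequence database and of a level-wise, Apriori-style search for frequent events. It is not the CAMLS algorithm itself: the function names (`contains`, `support`, `frequent_events`), the representation of a sequence as a list of `frozenset` events, and the absolute `min_support` threshold are all assumptions made for illustration, and no constraints or sequence-wise phase are modeled.

```python
from itertools import combinations

def contains(sequence, pattern):
    """True if the pattern's events occur, in order, as subsets of the sequence's events."""
    pos = 0
    for event in sequence:
        # Greedy left-to-right matching is sufficient when no gap constraints apply.
        if pos < len(pattern) and pattern[pos] <= event:
            pos += 1
    return pos == len(pattern)

def support(database, pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database)

def frequent_events(database, min_support):
    """Level-wise search for frequent single-event patterns (illustrative only)."""
    items = sorted({item for seq in database for event in seq for item in event})
    level = [frozenset([i]) for i in items]
    frequent = []
    while level:
        # Frequency testing: keep candidates that meet the support threshold.
        survivors = [c for c in level if support(database, [c]) >= min_support]
        frequent.extend(survivors)
        # Candidate generation: join survivors into sets one item larger, then
        # prune any candidate with an infrequent subset (the Apriori property).
        known = set(survivors)
        size = len(level[0]) + 1
        joined = {a | b for a, b in combinations(survivors, 2) if len(a | b) == size}
        level = sorted(
            (c for c in joined
             if all(frozenset(s) in known for s in combinations(c, size - 1))),
            key=sorted,
        )
    return frequent
```

For example, with three sequences `[{a,b},{c}]`, `[{a},{b,c}]`, and `[{a,b},{b,c}]` and `min_support=2`, the candidate event `{a,b,c}` is never frequency-tested, because its subset `{a,c}` is already known to be infrequent.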

Supported by the IMG4 consortium under the MAGNET program of the Israel Ministry of Trade and Industry, and by the Lynn and William Frankel Center for Computer Science.

Author information

Authors and Affiliations

Department of Computer Science, Ben Gurion University of the Negev, Israel
Yaron Gonen, Nurit Gal-Oz, Ran Yahalom & Ehud Gudes

Editor information

Editors and Affiliations

Graduate School of Systems and Information Engineering, University of Tsukuba, Tennodai, Tsukuba, 305–8573, Ibaraki, Japan
Hiroyuki Kitagawa
Information Technology Center, Nagoya University, Furo-cho, Chikusa-ku, 464-8601, Nagoya, Japan
Yoshiharu Ishikawa
City University of Hong Kong, Department of Computer Science, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
Qing Li
Department of Information Science, Ochanomizu University, 2-1-1, Otsuka, Bunkyo-ku, 112-8610, Tokyo, Japan
Chiemi Watanabe

Cite this paper

Gonen, Y., Gal-Oz, N., Yahalom, R., Gudes, E. (2010). CAMLS: A Constraint-Based Apriori Algorithm for Mining Long Sequences. In: Kitagawa, H., Ishikawa, Y., Li, Q., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 5981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12026-8_7

DOI: https://doi.org/10.1007/978-3-642-12026-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12025-1
Online ISBN: 978-3-642-12026-8
eBook Packages: Computer Science, Computer Science (R0)
