Abstract
Previous studies have argued convincingly that a frequent sequence mining algorithm should mine only the closed frequent sequences rather than all frequent sequences, because the closed set is more compact yet still complete and can be mined more efficiently. However, mining frequent closed sequences remains challenging on a stand-alone machine because of the large size and high dimensionality of the data. This paper presents PFCSeq, an algorithm for mining frequent closed sequences on distributed-memory parallel machines. Each processor mines its local set of frequent closed sequences independently, using an approach that combines task parallelism with data parallelism, and only two communications are needed unless a load imbalance is detected. The time spent in communication is therefore significantly reduced. To ensure good load balance among processors, a dynamic workload balancing strategy is proposed. Experiments show that the algorithm scales linearly with database size and with the number of processors.
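To make the notion of a closed sequence concrete, the following is a minimal brute-force sketch in Python. It is not PFCSeq and says nothing about its parallel design. It assumes each sequence is an ordered tuple of single items and that `min_support` is at least 1. The function names and the toy database are hypothetical, chosen only for illustration. A frequent sequence counts as closed when no proper supersequence has the same support.

```python
from typing import Dict, List, Tuple

Sequence = Tuple[str, ...]


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """True if pattern occurs in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the match,
    # which enforces the left-to-right order of the pattern.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence, database: List[Sequence]) -> int:
    """Number of database sequences that contain pattern."""
    return sum(is_subsequence(pattern, s) for s in database)


def frequent_sequences(database: List[Sequence],
                       min_support: int) -> Dict[Sequence, int]:
    """Enumerate all frequent sequences by naive prefix growth.

    Assumes min_support >= 1, so growth stops once support drops to 0.
    """
    items = sorted({item for s in database for item in s})
    result: Dict[Sequence, int] = {}

    def grow(prefix: Sequence) -> None:
        for item in items:
            candidate = prefix + (item,)
            count = support(candidate, database)
            if count >= min_support:
                result[candidate] = count
                grow(candidate)

    grow(())
    return result


def closed_sequences(frequent: Dict[Sequence, int]) -> Dict[Sequence, int]:
    """Keep only sequences with no proper supersequence of equal support."""
    return {
        p: c
        for p, c in frequent.items()
        if not any(
            len(q) > len(p) and cq == c and is_subsequence(p, q)
            for q, cq in frequent.items()
        )
    }
```

On the toy database `[("a","b","c"), ("a","c"), ("a","b","c"), ("b","c")]` with `min_support=2`, this sketch finds seven frequent sequences but only four closed ones, which illustrates why the closed set is the more compact result. The supports of the dropped sequences can still be recovered from the closed ones, which is the sense in which the result stays complete.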
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ma, C., Li, Q. (2005). Parallel Algorithm for Mining Frequent Closed Sequences. In: Gorodetsky, V., Liu, J., Skormin, V.A. (eds) Autonomous Intelligent Systems: Agents and Data Mining. AIS-ADM 2005. Lecture Notes in Computer Science, vol 3505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11492870_15
DOI: https://doi.org/10.1007/11492870_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26164-3
Online ISBN: 978-3-540-31932-0