Parallel Algorithm for Mining Frequent Closed Sequences

  • Conference paper
Autonomous Intelligent Systems: Agents and Data Mining (AIS-ADM 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3505)

Abstract

Previous studies have argued convincingly that a frequent sequence mining algorithm should mine only the closed frequent sequences rather than all frequent sequences, because doing so yields not only a more compact yet complete result set but also better efficiency. However, frequent closed sequence mining remains challenging on a stand-alone machine because of the large size and high dimensionality involved. This paper presents PFCSeq, an algorithm for mining frequent closed sequences on a distributed-memory parallel machine. Each processor mines its local set of frequent closed sequences independently, combining task parallelism with data parallelism, and only two communications are needed unless a load imbalance is detected. Time spent in communication is therefore significantly reduced. To ensure good load balance among processors, a dynamic workload balancing strategy is proposed. Experiments show that PFCSeq scales linearly with both database size and the number of processors.
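As an illustration of why the closed set is "more compact yet complete", the following is a minimal brute-force sketch of the standard definition: a frequent sequence is closed if no proper supersequence has the same support. It is not PFCSeq and is not parallel. The function names, the toy database, and the simplification that each sequence element is a single item (rather than an itemset) are assumptions made for this example only.

```python
from itertools import combinations

def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def support(pattern, database):
    """Number of database sequences that contain pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)

def frequent_sequences(database, min_support):
    """Brute force: test every distinct subsequence of every input sequence."""
    candidates = set()
    for seq in database:
        for length in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                candidates.add(tuple(seq[i] for i in idx))
    result = {}
    for pattern in candidates:
        count = support(pattern, database)
        if count >= min_support:
            result[pattern] = count
    return result

def closed_sequences(frequent):
    """Keep patterns with no longer supersequence of equal support."""
    closed = {}
    for p, s in frequent.items():
        absorbed = any(
            len(q) > len(p) and t == s and is_subsequence(p, q)
            for q, t in frequent.items()
        )
        if not absorbed:
            closed[p] = s
    return closed

# Hypothetical toy database of four sequences.
database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
frequent = frequent_sequences(database, min_support=2)
closed = closed_sequences(frequent)
print(len(frequent), "frequent,", len(closed), "closed")
```

On this toy input, seven sequences are frequent but only four are closed. The support of every dropped sequence can still be recovered from a closed supersequence, for example ("a",) from ("a", "c"), which is the sense in which the closed set is complete.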




© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ma, C., Li, Q. (2005). Parallel Algorithm for Mining Frequent Closed Sequences. In: Gorodetsky, V., Liu, J., Skormin, V.A. (eds) Autonomous Intelligent Systems: Agents and Data Mining. AIS-ADM 2005. Lecture Notes in Computer Science, vol 3505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11492870_15

  • DOI: https://doi.org/10.1007/11492870_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26164-3

  • Online ISBN: 978-3-540-31932-0

  • eBook Packages: Computer Science, Computer Science (R0)
