Multi-relational pattern mining over data streams

Data Mining and Knowledge Discovery

Abstract

The data storage paradigm has changed in the last decade, from operational databases to data repositories that make it easier to analyze data and mine information. Among those, the primary multidimensional model represents data through star schemas, where each relation denotes an event involving a set of dimensions or business perspectives. Mining data modeled as a star schema presents two major challenges: mining extremely large amounts of data and dealing with several data tables at the same time. In this paper, we describe in detail an algorithm, Star FP Stream. This algorithm aims to find the set of frequent patterns in a large star schema by mining the data directly, in its original structure, and by drawing on the most efficient techniques for mining data streams. Experiments were conducted over two star schemas, in the healthcare and sales domains.
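To make the setting concrete, the following is a minimal sketch of frequent pattern counting over a toy star schema. It is not Star FP Stream: it enumerates itemsets by brute force, and every table, attribute, and function name in it (`patients`, `exams`, `facts`, `items_of`, `frequent_itemsets`) is a hypothetical chosen for illustration. What it does show is the idea of mining the data in its original structure: each fact row is expanded through its foreign keys at counting time, so no joined table is ever materialized.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy star schema: two dimension tables and one fact table.
patients = {1: {"sex": "F", "age": "60+"}, 2: {"sex": "M", "age": "60+"}}
exams = {10: {"type": "blood"}, 11: {"type": "biopsy"}}
facts = [(1, 10), (2, 10), (1, 11), (2, 10)]  # (patient_id, exam_id)


def items_of(fact):
    """Expand one fact row into a set of (dimension.attribute, value) items."""
    patient_id, exam_id = fact
    items = {("patient." + k, v) for k, v in patients[patient_id].items()}
    items |= {("exam." + k, v) for k, v in exams[exam_id].items()}
    return items


def frequent_itemsets(facts, min_support, max_size=2):
    """Return itemsets whose relative support over the facts is >= min_support.

    Brute-force enumeration, used here only to keep the sketch short.
    """
    counts = Counter()
    for fact in facts:
        items = sorted(items_of(fact))  # canonical order for itemset keys
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    threshold = min_support * len(facts)
    return {itemset: n for itemset, n in counts.items() if n >= threshold}


result = frequent_itemsets(facts, min_support=0.75)
```

With a support threshold of 75%, `result` contains the pattern combining an exam attribute with a patient attribute, `(("exam.type", "blood"), ("patient.age", "60+"))`, which is the kind of cross-dimension pattern that a single-table miner would only find after an explicit join.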



Notes

  1. Dimensions can also be streams. However, a foreign key can appear in the fact table only after the corresponding row has been created and populated in its dimension, so only the fact table needs to be treated as a stream.

  2. AdventureWorks Sample Data Warehouse is available at http://sqlserversamples.codeplex.com/.

  3. The Hepatitis dataset was made available as part of the ECML/PKDD 2005 Discovery Challenge: http://lisp.vse.cz/challenge/CURRENT/.
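The reasoning in note 1 can be illustrated with a small sketch, under stated assumptions. The names `mine_fact_stream`, `products`, and `sales_batches` are hypothetical, and the code keeps exact counts with no pruning, so it is not the paper's algorithm. It only shows that when dimension rows always exist before the facts that reference them, the fact table can be consumed batch by batch while the dimension is used as a plain lookup table.

```python
from collections import Counter


def mine_fact_stream(batches, dimension, min_support):
    """Yield, after each batch, the items frequent over all facts seen so far.

    `batches` is an iterable of lists of foreign keys. `dimension` maps each
    key to a dict of attributes and is assumed to be populated before any
    fact that references it arrives.
    """
    counts = Counter()
    seen = 0
    for batch in batches:
        for key in batch:
            row = dimension[key]  # a KeyError here would mean the assumption failed
            counts.update(row.items())
        seen += len(batch)
        threshold = min_support * seen
        yield {item: n for item, n in counts.items() if n >= threshold}


# Hypothetical sales example: the dimension grows, but always ahead of the facts.
products = {1: {"color": "red"}}


def sales_batches():
    yield [1, 1]
    products[2] = {"color": "blue"}  # the dimension row is created first
    yield [2, 2, 2]                  # and only then referenced by fact rows


snapshots = list(mine_fact_stream(sales_batches(), products, min_support=0.5))
```

Each entry of `snapshots` is the set of frequent items after one batch. A practical stream miner would also bound memory, for example by discarding infrequent items between batches, which this sketch deliberately leaves out.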

Acknowledgments

This work was partially supported by FCT – Fundação para a Ciência e a Tecnologia, under project educare (PTDC/EIA-EIA/110058/2009) and PhD Grant SFRH/BD/64108/2009.

Author information

Corresponding author

Correspondence to Andreia Silva.

Additional information

Responsible editor: Bart Goethals.

About this article

Cite this article

Silva, A., Antunes, C. Multi-relational pattern mining over data streams. Data Min Knowl Disc 29, 1783–1814 (2015). https://doi.org/10.1007/s10618-014-0394-6

  • DOI: https://doi.org/10.1007/s10618-014-0394-6
