Abstract
The data storage paradigm has changed in the last decade, from operational databases to data repositories that make it easier to analyze data and mine information. Among these, the predominant approach is the multidimensional model, which represents data through star schemas, where each relation denotes an event involving a set of dimensions or business perspectives. Mining data modeled as a star schema presents two major challenges: mining extremely large amounts of data and dealing with several data tables at the same time. In this paper, we describe in detail an algorithm, Star FP Stream, which aims to find the set of frequent patterns in a large star schema by mining the data directly in their original structure and by exploiting the most efficient techniques for mining data streams. Experiments were conducted over two star schemas, in the healthcare and sales domains.
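A rough illustration of what mining a star schema "directly, in its original structure" with a stream technique can look like: each fact row is expanded on arrival into attribute=value items by following its foreign keys into in-memory dimension tables, so the joined table is never materialized, and itemset frequencies are approximated with the Lossy Counting algorithm of Manku and Motwani. This is a minimal sketch, not Star FP Stream itself, whose internals the abstract does not describe. The names `fact_to_items` and `LossyItemsetCounter`, the `dimension.attribute=value` item encoding, the cap on itemset size (`max_size`) and the toy sales data are all assumptions made for illustration.

```python
from itertools import combinations
from math import ceil


def fact_to_items(fact, dimensions):
    """Expand one fact row into 'dimension.attribute=value' items via its foreign keys.

    Hypothetical helper: fact maps dimension name -> foreign key value.
    """
    items = set()
    for dim_name, key in fact.items():
        row = dimensions[dim_name][key]  # the dimension row must already exist
        for attr, value in row.items():
            items.add(f"{dim_name}.{attr}={value}")
    return items


class LossyItemsetCounter:
    """Lossy Counting over itemsets of up to max_size items, one transaction per fact.

    Hypothetical class; a sketch of Manku-Motwani style counting, not Star FP Stream.
    """

    def __init__(self, epsilon, max_size=2):
        self.epsilon = epsilon
        self.width = ceil(1 / epsilon)  # bucket width, measured in facts
        self.max_size = max_size
        self.n = 0  # number of facts seen so far
        self.entries = {}  # itemset -> [count, maximum undercount]

    def add(self, items):
        self.n += 1
        bucket = (self.n + self.width - 1) // self.width  # current bucket id
        for size in range(1, self.max_size + 1):
            for itemset in combinations(sorted(items), size):
                if itemset in self.entries:
                    self.entries[itemset][0] += 1
                else:
                    # A new entry may have been missed in earlier buckets.
                    self.entries[itemset] = [1, bucket - 1]
        if self.n % self.width == 0:  # bucket boundary: prune rare itemsets
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }

    def frequent(self, support):
        """Itemsets whose estimated count reaches (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}


# Toy sales star (assumed data): two dimensions and a stream of four fact rows.
dimensions = {
    "product": {1: {"category": "bikes"}, 2: {"category": "helmets"}},
    "customer": {10: {"country": "PT"}, 11: {"country": "ES"}},
}
facts = [
    {"product": 1, "customer": 10},
    {"product": 1, "customer": 10},
    {"product": 2, "customer": 11},
    {"product": 1, "customer": 11},
]

counter = LossyItemsetCounter(epsilon=0.25)
for fact in facts:
    counter.add(fact_to_items(fact, dimensions))
result = counter.frequent(support=0.5)
print(result)
```

With `epsilon = 0.25` the bucket width is four facts, so the pruning step runs once and discards the itemsets seen only once. Lossy Counting reports every itemset (up to `max_size`) whose true support is at least `support`, with counts that underestimate the true ones by at most `epsilon` times the number of facts seen; enumerating every subset per fact is only practical for small `max_size`.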
Notes
Dimensions can also be streams. However, since a foreign key can only appear in the fact table after the corresponding row has been created and populated in the respective dimension, only the fact table needs to be treated as a stream.
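The argument in this note can be made concrete with a small sketch, assuming a single ordered event log that interleaves dimension inserts and fact inserts; the tuple event format and the name `denormalized_facts` are hypothetical. Dimension events only update in-memory lookup tables, while fact events are the only ones that produce output. Because referential integrity guarantees that every referenced dimension row arrived earlier, the lookup performed when a fact arrives succeeds.

```python
def denormalized_facts(events):
    """Yield each fact joined with its dimension attributes, in arrival order.

    Hypothetical event format: ("dim", dim_name, key, row) or ("fact", foreign_keys).
    """
    dims = {}  # dim_name -> {key -> row}, filled as dimension rows arrive
    for event in events:
        if event[0] == "dim":
            _, name, key, row = event
            dims.setdefault(name, {})[key] = row
            continue
        _, foreign_keys = event
        joined = {}
        for name, key in foreign_keys.items():
            row = dims.get(name, {}).get(key)
            if row is None:  # only possible if referential integrity is violated
                raise ValueError(f"fact references missing {name} row {key!r}")
            for attr, value in row.items():
                joined[f"{name}.{attr}"] = value
        yield joined


events = [
    ("dim", "product", 1, {"category": "bikes"}),
    ("dim", "customer", 10, {"country": "PT"}),
    ("fact", {"product": 1, "customer": 10}),
    ("dim", "product", 2, {"category": "helmets"}),  # dimension grows mid-stream
    ("fact", {"product": 2, "customer": 10}),
]
rows = list(denormalized_facts(events))
```

In this sketch only fact events reach whatever mining step consumes `rows`; dimension events merely maintain the lookup tables, even when new dimension rows arrive in the middle of the stream.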
The AdventureWorks Sample Data Warehouse is available at http://sqlserversamples.codeplex.com/.
The Hepatitis dataset was made available as part of the ECML/PKDD 2005 Discovery Challenge: http://lisp.vse.cz/challenge/CURRENT/.
Acknowledgments
This work was partially supported by FCT – Fundação para a Ciência e a Tecnologia, under project educare (PTDC/EIA-EIA/110058/2009) and PhD Grant SFRH/BD/64108/2009.
Additional information
Responsible editor: Bart Goethals.
About this article
Cite this article
Silva, A., Antunes, C. Multi-relational pattern mining over data streams. Data Min Knowl Disc 29, 1783–1814 (2015). https://doi.org/10.1007/s10618-014-0394-6