A formal series-based unification of the frequent itemset mining approaches

Oulad-Naoui, Slimane; Cherroun, Hadda; Ziadi, Djelloul

doi:10.1007/s10115-017-1048-y

Regular Paper
Published: 03 April 2017

Volume 53, pages 439–477, (2017)
Knowledge and Information Systems

Abstract

Over the last two decades, a great deal of work has been devoted to the algorithmic aspects of the frequent itemset (FI) mining problem, leading to a phenomenal number of algorithms and associated implementations, each of which claims supremacy. Meanwhile, it is widely agreed that developing a unifying theory is one of the most important issues in data mining research. Hence, our primary motivation for this work is to introduce a high-level formalism for this basic problem, which induces a unified vision of the algorithmic approaches presented so far. The key distinctive feature of the introduced model is that it combines, in a single formalism, both the qualitative and the quantitative aspects of this basic problem. In this paper, we propose a new model for the FI-mining task based on formal series. More precisely, we encode the itemsets as words over a sorted alphabet and express this problem by a formal series over the counting semiring $(\mathbb N,+,\times ,0,1)$, whose range represents the itemsets and whose coefficients are their supports. The aim is threefold: first, to define a clear, unified and extensible theoretical framework through which we can state the main FI-approaches; second, to prove a convenient connection between the determinization of the acyclic weighted automaton that represents a transaction dataset and the computation of the associated collection of FI; finally, to devise a first algorithmic transcription of our model, named Wafi, by means of weighted automata, which we evaluate against representative leading algorithms. The obtained results show the suitability of our formalism.

Notes

In the counting semiring and by application of the $\otimes $ operation in general.
In our examples throughout the paper, we assume for simplicity that items are sorted in lexicographic order.
In our model, an accessible frequent state is a state that is reachable from the initial state, with or without $\epsilon $-moves, and for which the coefficient of the associated path from the initial state is also greater than the support threshold.
The direction of the derivation does not matter and usually yields the same final coefficient. However, the number of steps needed may differ; it depends on the chosen ordering and the given dataset.
To be precise: $|E| = |Q|-1$.

Acknowledgements

We would like to sincerely thank the anonymous reviewers for the time they devoted to thoughtfully reading our manuscript, and for their various insightful remarks and comments, which helped us to improve the quality of the paper. This work was supported by the Algeria/South Africa joint project under code A/AS-2013-002.

Author information

Authors and Affiliations

Laboratoire d’Informatique et de Mathématiques, Université Amar Telidji, Laghouat, Algeria
Slimane Oulad-Naoui & Hadda Cherroun
Laboratoire LITIS, EA 4108, Normandie Université, Rouen, France
Djelloul Ziadi

Corresponding author

Correspondence to Slimane Oulad-Naoui.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 728 KB)

Appendix

1.1 Proof of Proposition 3

Proof

We construct the automaton $\mathscr {C}$ by determinizing both automata $\mathscr {A}$ and $\mathscr {B}$ seen as one with two start states, that is, determinizing the automaton $\mathscr {A} \cup \mathscr {B}$.

Let $\mathscr {A}=(Q_A,A_1,\mu _A,\lambda _A,\gamma _A)$ be a PWA isomorphic, via $h_A$, to the automaton $\mathcal{P}_{X} = (Q_X,A_X,\mu _X,\lambda _X,\gamma _X)$ which realizes the polynomial $\mathop {\mathbb {P}_{X}}$ associated with the dataset X and $\mathscr {B}=(Q_B,A_2,\mu _B,\lambda _B,\gamma _B)$ the one isomorphic, via $h_B$, to the automaton $\mathcal{P}_{Y} =(Q_Y,A_Y,\mu _Y,\lambda _Y,\gamma _Y)$ which realizes the polynomial $\mathop {\mathbb {P}_{Y}}$ associated with the dataset Y.

We first give the construction of the automaton $\mathscr {C}$, and then a mapping h from the set of states $Q_C$ to the set of states of the automaton $\mathcal{P}_{X\cup Y}$, which is the range of the polynomial $\mathop {\mathbb {P}_{X\cup Y}}$.

We define $\mathscr {C}=(Q_C,A_1\cup A_2,\mu _C,\lambda _C,\gamma _C)$ the prefixial weighted automaton as follows ($q_A$ and $q_B$ are elements from $Q_A$ and $Q_B$, respectively):

1. $Q_C = T\cup W\cup Z$, where:
- $T=\{ \{q_A,q_B\} \mid h_A(q_A) = h_B(q_B)\}$,
- $W=\{\{q_A\} \mid h_A(q_A) \in h_A(Q_A)\setminus h_B(Q_B)\}$,
- $Z= \{\{q_B\} \mid h_B(q_B) \in h_B(Q_B)\setminus h_A(Q_A)\}$.
2. $\mu _C(\{q_A,q_B\})=\begin{cases} 1 & \text{for the start state } ({q_A}_0,{q_B}_0) \text{ obtained by pairing those of } \mathscr {A} \text{ and } \mathscr {B},\\ 0 & \text{otherwise.} \end{cases}$
3. $\lambda _C(q,a,q')$ is a binary matrix, for $q, q' \in Q_C$ and $a\in A_1\cup A_2$; that is, the weight of each transition is either 0 or 1. We have five cases according to the subsets T, W or Z to which q and $q'$ belong:
- $q \in T$ and $q' \in T$: $\lambda _C(\{q_A,q_B\},a,\{q'_A,q'_B\}) = 1$ if both $\lambda _A(q_A,a,q'_A)$ and $\lambda _B(q_B,a,q'_B)$ are defined,
- $q \in T$ and $q' \in W$: $\lambda _C(\{q_A,q_B\},a,\{q'_A\}) = 1$ if only $\lambda _A(q_A,a,q'_A)$ is defined,
- $q \in T$ and $q' \in Z$: $\lambda _C(\{q_A,q_B\},a,\{q'_B\}) = 1$ if only $\lambda _B(q_B,a,q'_B)$ is defined,
- $q\in W$: if defined, $q'$ can only belong to W (closure of prefixial sets): $\lambda _C(\{q_A\},a,\{q'_A\}) = 1$ if $\lambda _A(q_A,a,q'_A)$ exists,
- $q\in Z$: for the same reason, if defined, $q'$ can only belong to Z: $\lambda _C(\{q_B\},a,\{q'_B\}) = 1$ if $\lambda _B(q_B,a,q'_B)$ exists.

To simplify the proof, let $\Delta _C$ stand for the weight function $\lambda _C$, and let $\delta _A$, $\delta _B$, $\delta _X$ and $\delta _Y$ stand, respectively, for the functions $\lambda _A$, $\lambda _B$, $\lambda _X$ and $\lambda _Y$. The function $\lambda _C$ can then be summarized by the following cases:

$$\Delta _C(q,a)=\begin{cases}
\{q'_A,q'_B\} & \text{if } q=\{q_A,q_B\}\in T \text{ and } \delta _A(q_A,a)=q'_A \text{ and } \delta _B(q_B,a)=q'_B,\\[4pt]
\{q'_A\} & \text{if } \begin{cases} q=\{q_A,q_B\}\in T \text{ and } \delta _A(q_A,a)=q'_A \text{ and } \delta _B(q_B,a)=\emptyset ,\\ q=\{q_A\}\in W \text{ and } \delta _A(q_A,a)=q'_A, \end{cases}\\[4pt]
\{q'_B\} & \text{if } \begin{cases} q=\{q_A,q_B\}\in T \text{ and } \delta _B(q_B,a)=q'_B \text{ and } \delta _A(q_A,a)=\emptyset ,\\ q=\{q_B\}\in Z \text{ and } \delta _B(q_B,a)=q'_B. \end{cases}
\end{cases}$$
4. $\gamma _C(I)=\begin{cases} \gamma _A(q_A)+\gamma _B(q_B) & \text{if } I=\{q_A,q_B\} \in T,\\ \gamma _A(q_A) & \text{if } I=\{q_A\}\in W,\\ \gamma _B(q_B) & \text{if } I=\{q_B\} \in Z. \end{cases}$

Define also the mapping h from $Q_C$ to $\mathrm{range}(\mathop {\mathbb {P}_{X\cup Y}})$ as follows:

$$\begin{aligned} h(I)= \begin{cases} h_A(\{q_A\}) & \text{if } I=\{q_A,q_B\} \in T \text{ or } I=\{q_A\} \in W,\\ h_B(\{q_B\}) & \text{if } I =\{q_B\} \in Z. \end{cases} \end{aligned}$$

Now, we must verify that h is indeed an automata isomorphism.

Let I be a state of $Q_C$:

1. It is clear from the definition of the vector $\mu _C$ that the start state of $\mathscr {C}$ is mapped by h to the start state of $\mathrm{range}(\mathop {\mathbb {P}_{X\cup Y}})$, which is the pair $(\varepsilon ,\varepsilon )$.
2. $h(\Delta _C(q,a))=\begin{cases}
h(\{q'_A,q'_B\}) & \text{if } q=\{q_A,q_B\}\in T \text{ and } \delta _A(q_A,a)=q'_A \text{ and } \delta _B(q_B,a)=q'_B,\\[4pt]
h(\{q'_A\}) & \text{if } \begin{cases} q=\{q_A,q_B\}\in T \text{ and } \delta _A(q_A,a)=q'_A \text{ and } \delta _B(q_B,a)=\emptyset ,\\ q=\{q_A\}\in W \text{ and } \delta _A(q_A,a)=q'_A, \end{cases}\\[4pt]
h(\{q'_B\}) & \text{if } \begin{cases} q=\{q_A,q_B\}\in T \text{ and } \delta _B(q_B,a)=q'_B \text{ and } \delta _A(q_A,a)=\emptyset ,\\ q=\{q_B\}\in Z \text{ and } \delta _B(q_B,a)=q'_B. \end{cases}
\end{cases}$

According to the definition of the mapping h, we obtain:

$$\begin{aligned}
h(\Delta _C(q,a)) &= \begin{cases}
h_A(\{q'_A\}) & \text{if } q=\{q_A,q_B\}\in T \text{ and } \delta _A(q_A,a)=q'_A \text{ and } \delta _B(q_B,a)=q'_B,\\[4pt]
h_A(\{q'_A\}) & \text{if } \begin{cases} q=\{q_A,q_B\}\in T \text{ and } \delta _A(q_A,a)=q'_A \text{ and } \delta _B(q_B,a)=\emptyset ,\\ q=\{q_A\}\in W \text{ and } \delta _A(q_A,a)=q'_A, \end{cases}\\[4pt]
h_B(\{q'_B\}) & \text{if } \begin{cases} q=\{q_A,q_B\}\in T \text{ and } \delta _B(q_B,a)=q'_B \text{ and } \delta _A(q_A,a)=\emptyset ,\\ q=\{q_B\}\in Z \text{ and } \delta _B(q_B,a)=q'_B \end{cases}
\end{cases}\\[6pt]
&= \begin{cases}
h_A(\delta _A(q_A,a)) & \text{if } q=\{q_A,q_B\}\in T \text{ and } \delta _A(q_A,a)=q'_A \text{ and } \delta _B(q_B,a)=q'_B,\\[4pt]
h_A(\delta _A(q_A,a)) & \text{if } \begin{cases} q=\{q_A,q_B\}\in T \text{ and } \delta _A(q_A,a)=q'_A \text{ and } \delta _B(q_B,a)=\emptyset ,\\ q=\{q_A\}\in W \text{ and } \delta _A(q_A,a)=q'_A, \end{cases}\\[4pt]
h_B(\delta _B(q_B,a)) & \text{if } \begin{cases} q=\{q_A,q_B\}\in T \text{ and } \delta _B(q_B,a)=q'_B \text{ and } \delta _A(q_A,a)=\emptyset ,\\ q=\{q_B\}\in Z \text{ and } \delta _B(q_B,a)=q'_B. \end{cases}
\end{cases}
\end{aligned}$$

Since both $h_A$ and $h_B$ are weighted automata isomorphisms:

$$\begin{aligned}
h(\Delta _C(q,a)) &= \begin{cases}
\delta _X(h_A(\{q_A\}),a) & \text{if } q=\{q_A,q_B\}\in T \text{ and } \delta _A(q_A,a)=q'_A \text{ and } \delta _B(q_B,a)=q'_B,\\[4pt]
\delta _X(h_A(\{q_A\}),a) & \text{if } \begin{cases} q=\{q_A,q_B\}\in T \text{ and } \delta _A(q_A,a)=q'_A \text{ and } \delta _B(q_B,a)=\emptyset ,\\ q=\{q_A\}\in W \text{ and } \delta _A(q_A,a)=q'_A, \end{cases}\\[4pt]
\delta _Y(h_B(\{q_B\}),a) & \text{if } \begin{cases} q=\{q_A,q_B\}\in T \text{ and } \delta _B(q_B,a)=q'_B \text{ and } \delta _A(q_A,a)=\emptyset ,\\ q=\{q_B\}\in Z \text{ and } \delta _B(q_B,a)=q'_B. \end{cases}
\end{cases}
\end{aligned}$$

According to our defined mapping h, and since all our weight functions are binary, we obtain in each case: $h(\Delta _C(q,a))=\delta _{X\cup Y}(h(q),a)$.
3. Now, we consider the output weight property for the first case only (the other cases are simple to prove). Let $I\in Q_C$. When $I\in T$, we have:

$$\begin{aligned}
\gamma _C(I) &= \gamma _C(\{q_A,q_B\}) \\
&= \gamma _A(q_A)+\gamma _B(q_B) \\
&= \gamma _X(h_A(\{q_A\}))+\gamma _Y(h_B(\{q_B\})) && \text{since } h_A \text{ and } h_B \text{ are isomorphisms} \\
&= \langle \mathbb {P}_{X},h_A(\{q_A\}) \rangle + \langle \mathbb {P}_{Y},h_B(\{q_B\}) \rangle && \text{by the definition of the functions } \gamma _X \text{ and } \gamma _Y \\
&= \langle \mathbb {P}_{X},h_A(\{q_A\}) \rangle + \langle \mathbb {P}_{Y},h_A(\{q_A\}) \rangle && \text{because } I \in T \\
&= \langle \mathbb {P}_{X}+\mathbb {P}_{Y},h_A(\{q_A\}) \rangle \\
&= \langle \mathbb {P}_{X\cup Y},h_A(\{q_A\}) \rangle \\
&= \langle \mathbb {P}_{X\cup Y},h(\{q_A,q_B\}) \rangle \\
&= \langle \mathbb {P}_{X\cup Y},h(I) \rangle \\
&= \gamma _{X\cup Y}(h(I)).
\end{aligned}$$
4. Finally, it is not hard to see that h is bijective: its definition involves only $h_A$ and $h_B$, which are both bijective since they are weighted automata isomorphisms.

$\square $
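The construction can be checked on a small instance. The following is a minimal Python sketch, not the authors' implementation: it assumes that a prefixial weighted automaton can be represented as a trie whose output weight at a state is the number of transactions sharing that prefix, and it encodes a state of $\mathscr {C}$ as a pair in which `None` marks a missing component (so full pairs play the role of T, and half-empty pairs the roles of W and Z). The function names, the pair encoding and the two toy datasets are all hypothetical.

```python
from collections import Counter


def prefix_automaton(dataset):
    """Trie-shaped automaton of a list of transactions (state 0 is the start).

    delta[(state, item)] -> state, gamma[state] = number of transactions
    having that prefix, word[state] = prefix spelled by the state.
    """
    delta, gamma, word = {}, Counter(), {0: ()}
    for transaction in dataset:
        state = 0
        gamma[state] += 1
        for item in sorted(set(transaction)):  # items in sorted order
            nxt = delta.get((state, item))
            if nxt is None:
                nxt = len(word)
                delta[(state, item)] = nxt
                word[nxt] = word[state] + (item,)
            gamma[nxt] += 1
            state = nxt
    return delta, gamma, word


def weight(gamma, state):
    """Output weight of a component, 0 when the component is missing."""
    return gamma[state] if state is not None else 0


def merge(aut_a, aut_b):
    """Joint traversal of two tries, starting from the paired start states."""
    (delta_a, gamma_a, _), (delta_b, gamma_b, _) = aut_a, aut_b
    alphabet = sorted({a for _, a in delta_a} | {a for _, a in delta_b})
    start = (0, 0)
    delta_c = {}
    gamma_c = {start: gamma_a[0] + gamma_b[0]}
    todo = [start]
    while todo:
        q = todo.pop()
        for a in alphabet:
            nxt = (delta_a.get((q[0], a)), delta_b.get((q[1], a)))
            if nxt == (None, None):
                continue  # no transition on a from q
            delta_c[(q, a)] = nxt
            gamma_c[nxt] = weight(gamma_a, nxt[0]) + weight(gamma_b, nxt[1])
            todo.append(nxt)
    return delta_c, gamma_c


def h(state, word_a, word_b):
    """Map a merged state to the word it spells (the role played by h)."""
    q_a, q_b = state
    return word_a[q_a] if q_a is not None else word_b[q_b]


# Toy datasets (assumed for illustration only).
X = ["abc", "ab", "ac"]
Y = ["ab", "bc"]
A, B = prefix_automaton(X), prefix_automaton(Y)
delta_c, gamma_c = merge(A, B)
merged = {h(q, A[2], B[2]): w for q, w in gamma_c.items()}

# Automaton built directly from the concatenated dataset.
_, direct_gamma, direct_word = prefix_automaton(X + Y)
direct = {direct_word[s]: direct_gamma[s] for s in direct_word}
print(merged == direct)
```

On this instance the merged weights coincide with those of the automaton built directly from the concatenation of X and Y, which is what the proposition predicts; a single example is of course only a sanity check, not a substitute for the proof.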

1.2 Proof of Proposition 5

Proof

Let us start by checking that Proposition 5 is true for one transaction $t_i$ taken from the dataset D of n transactions. So, let $t_i=a_{i_1}a_{i_2}\ldots a_{i_k}$ be a k-itemset. According to the definitions in Sects. 3 and 4, and the convention $\overline{a_i}=a_i+1$, we have:

$$\begin{aligned}
\mathbb {P}_{t_i} &= 1+a_{i_1}+a_{i_1}a_{i_2}+\ldots +a_{i_1}a_{i_2}a_{i_3} \ldots a_{i_k},\\
\text{so}\quad \overline{\mathbb {P}_{t_i}} &= 1+a_{i_1}+\overline{a_{i_1}}a_{i_2}+\ldots +\overline{a_{i_1}a_{i_2} \ldots a_{i_{k-1}}}a_{i_k}.
\end{aligned}$$

Since $\overline{a_i}=1+a_i$, we can write:

$$\begin{aligned}
\overline{\mathbb {P}_{t_i}} &= \overline{a_{i_1}}+\overline{a_{i_1}}a_{i_2}+\ldots +\overline{a_{i_1}a_{i_2}a_{i_3} \ldots a_{i_{k-1}}}a_{i_k} \\
&= \overline{a_{i_1}}(1+a_{i_2})+\ldots +\overline{a_{i_1}a_{i_2}a_{i_3} \ldots a_{i_{k-1}}}a_{i_k} \\
&= \overline{a_{i_1}a_{i_2}}+\ldots +\overline{a_{i_1}a_{i_2}a_{i_3} \ldots a_{i_{k-1}}}a_{i_k} \\
&= \overline{a_{i_1}a_{i_2}}(1+a_{i_3})+\ldots +\overline{a_{i_1}a_{i_2}a_{i_3} \ldots a_{i_{k-1}}}a_{i_k} \\
&\;\;\vdots \\
&= \overline{a_{i_1}a_{i_2}a_{i_3} \ldots a_{i_{k-1}}a_{i_k}}\\
&= \mathbb {S}_{t_i}.
\end{aligned}$$
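As an illustration, consider a hypothetical transaction $t=abc$ (so $k=3$). This worked instance is not part of the original proof; it assumes, as the factorization steps above suggest, that the bar of a word is the product of the bars of its letters, i.e. $\overline{ab}=\overline{a}\,\overline{b}=(1+a)(1+b)$.

$$\begin{aligned}
\mathbb {P}_{t} &= 1+a+ab+abc \\
\overline{\mathbb {P}_{t}} &= 1+a+\overline{a}\,b+\overline{ab}\,c \\
&= (1+a)+(1+a)b+(1+a)(1+b)c \\
&= (1+a)(1+b)+(1+a)(1+b)c \\
&= (1+a)(1+b)(1+c) \\
&= 1+a+b+c+ab+ac+bc+abc = \mathbb {S}_{t}
\end{aligned}$$

Each of the eight sub-itemsets of $abc$ appears exactly once, which is the expected content of $\mathbb {S}_{t}$ for a single transaction.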

Now let us also verify the equality between the sum of the prefixial-bar polynomials of the transactions and the prefixial-bar polynomial of the whole dataset D.

$$\begin{aligned}
\overline{\mathbb {P}_{t_i}} &= \langle \mathbb {P}_{t_i}, \varepsilon \rangle + \sum _{\substack{u \in A^* \\ a \in A}} \langle \mathbb {P}_{t_i}, ua \rangle \,\overline{u}a\\
\sum _{i = 1}^{n}\overline{\mathbb {P}_{t_i}} &= \sum _{i = 1}^{n} \Big(\langle \mathbb {P}_{t_i}, \varepsilon \rangle + \sum _{\substack{u \in A^* \\ a \in A}} \langle \mathbb {P}_{t_i}, ua \rangle \,\overline{u}a\Big)\\
&= \sum _{i = 1}^{n} \langle \mathbb {P}_{t_i}, \varepsilon \rangle +\sum _{i=1}^{n} \sum _{\substack{u \in A^* \\ a \in A}}\langle \mathbb {P}_{t_i}, ua \rangle \,\overline{u}a\\
&= \sum _{i = 1}^{n} \langle \mathbb {P}_{t_i}, \varepsilon \rangle +\sum _{\substack{u \in A^* \\ a \in A}} \sum _{i=1}^{n} \langle \mathbb {P}_{t_i}, ua \rangle \,\overline{u}a\\
&= \langle \mathbb {P}_{D}, \varepsilon \rangle +\sum _{\substack{u \in A^* \\ a \in A}} \langle \mathbb {P}_{D}, ua \rangle \,\overline{u}a\\
&= \overline{\mathbb {P}_{D}}
\end{aligned}$$

We have found that $\overline{\mathbb {P}_{t_i}} = \mathbb {S}_{t_i}$, so $\displaystyle \sum _{i=1}^{n}\overline{\mathbb {P}_{t_i}} = \sum _{i = 1}^{n}\mathbb {S}_{t_i}$, which leads to $\overline{\mathbb {P}_{D}} = \mathbb {S}_{D}$. $\square $
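The identity $\overline{\mathbb {P}_{D}} = \mathbb {S}_{D}$ can also be checked numerically on a toy dataset. The sketch below is only an illustration under stated assumptions: polynomials are stored as `Counter` objects mapping words (tuples of items) to coefficients, the bar of a word $u$ is expanded as the sum of all its sub-words (the expansion of $\prod (1+a_j)$), and $\mathbb {S}_{D}$ is taken to be the polynomial whose coefficients are the supports of all sub-itemsets. The function names and the dataset are hypothetical.

```python
from collections import Counter
from itertools import combinations


def subwords(word):
    """All sub-words of a word, i.e. the terms of prod(1 + a) over its letters."""
    return [c for r in range(len(word) + 1) for c in combinations(word, r)]


def prefixial(dataset):
    """Prefixial polynomial: each prefix of each sorted transaction counts once."""
    poly = Counter()
    for transaction in dataset:
        w = tuple(sorted(set(transaction)))
        for k in range(len(w) + 1):
            poly[w[:k]] += 1
    return poly


def bar(poly):
    """Bar operation: a term c * (u a) becomes c * bar(u) * a; the constant is kept."""
    out = Counter()
    for word, coeff in poly.items():
        if not word:
            out[()] += coeff
            continue
        u, a = word[:-1], word[-1]
        for s in subwords(u):
            out[s + (a,)] += coeff
    return out


def subset_polynomial(dataset):
    """Polynomial whose coefficient on each itemset is its support in the dataset."""
    poly = Counter()
    for transaction in dataset:
        for s in subwords(tuple(sorted(set(transaction)))):
            poly[s] += 1
    return poly


# Toy dataset (assumed for illustration only).
D = ["abc", "ab", "ac", "bc"]
S = subset_polynomial(D)
print(bar(prefixial(D)) == S)
```

Keeping the last letter of each prefix outside the bar is what prevents double counting: every sub-itemset is generated exactly once per transaction, from the prefix that ends at its largest item.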

About this article

Cite this article

Oulad-Naoui, S., Cherroun, H. & Ziadi, D. A formal series-based unification of the frequent itemset mining approaches. Knowl Inf Syst 53, 439–477 (2017). https://doi.org/10.1007/s10115-017-1048-y

Received: 04 March 2016
Revised: 10 March 2017
Accepted: 21 March 2017
Published: 03 April 2017
Issue Date: November 2017
DOI: https://doi.org/10.1007/s10115-017-1048-y
