Discovering episodes with compact minimal windows

Tatti, Nikolaj

doi:10.1007/s10618-013-0327-9


Published: 28 June 2013

Volume 28, pages 1046–1077, (2014)

Data Mining and Knowledge Discovery

Nikolaj Tatti^1,2,3


Abstract

Discovering the most interesting patterns is the key problem in the field of pattern mining. While ranking or selecting patterns is well-studied for itemsets, it is surprisingly under-researched for other, more complex, pattern types. In this paper we propose a new quality measure for episodes. An episode is essentially a set of events with possible restrictions on the order of events. We say that an episode is significant if its occurrence is abnormally compact, that is, only a few gap events occur between the actual episode events, when compared to the expected length according to the independence model. We can apply this measure as a post-pruning step by first discovering frequent episodes and then ranking them according to this measure. In order to compute the score we need to compute the mean and the variance according to the independence model. As a main technical contribution we introduce a technique that allows us to compute these values. Such a task is surprisingly complex, and in order to solve it we develop intricate finite state machines that allow us to compute the needed statistics. We also show that asymptotically our score can be interpreted as a $P$ value. In our experiments we demonstrate that despite its intricacy our ranking is fast: we can rank tens of thousands of episodes in seconds. Our experiments with text data demonstrate that our measure ranks interpretable episodes high.


Notes

The book was obtained from http://www.gutenberg.org/etext/15.
The abstracts were obtained from http://kdd.ics.uci.edu/databases/nsfabs/nsfawards.html.
The addresses were obtained from http://www.bartleby.com/124/pres68.
The abstracts were obtained from http://jmlr.csail.mit.edu/.
An episode is closed if there is no superepisode with the same support.

References

Achar A, Laxman S, Viswanathan R, Sastry PS (2012) Discovering injective episodes with general partial orders. Data Min Knowl Discov 25(1):67–108
Billingsley P (1995) Probability and measure, 3rd edn. Wiley, New York
Calders T, Dexters N, Goethals B (2007) Mining frequent itemsets in a stream. In: Proceedings of the 7th IEEE international conference on data mining (ICDM 2007), pp 83–92
Casas-Garriga G (2003) Discovering unbounded episodes in sequential data. In: Knowledge discovery in databases: PKDD 2003, 7th European conference on principles and practice of knowledge discovery in databases, pp 83–94
Cule B, Goethals B, Robardet C (2009) A new constraint for mining sets in sequences. In: Proceedings of the SIAM international conference on data mining (SDM 2009), pp 317–328
Gwadera R, Atallah MJ, Szpankowski W (2005a) Markov models for identification of significant episodes. In: Proceedings of the SIAM international conference on data mining (SDM 2005), pp 404–414
Gwadera R, Atallah MJ, Szpankowski W (2005b) Reliable detection of episodes in event sequences. Knowl Inf Syst 7(4):415–437
Hirao M, Inenaga S, Shinohara A, Takeda M, Arikawa S (2001) A practical algorithm to find the best episode patterns. In: Discovery science, pp 435–440
Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3):259–289. doi:10.1023/A:1009748302351
Méger N, Rigotti C (2004) Constraint-based mining of episode rules and optimal window sizes. In: Knowledge discovery in databases: PKDD 2004, 8th European conference on principles and practice of knowledge discovery in databases, pp 313–324
Pei J, Wang H, Liu J, Wang K, Wang J, Yu PS (2006) Discovering frequent closed partial orders from strings. IEEE Trans Knowl Data Eng 18(11):1467–1481
Tatti N (2009) Significance of episodes based on minimal windows. In: Proceedings of the 9th IEEE international conference on data mining (ICDM 2009), pp 513–522
Tatti N, Cule B (2011) Mining closed episodes with simultaneous events. In: Proceedings of the 17th ACM SIGKDD conference on knowledge discovery and data mining (KDD 2011), pp 1172–1180
Tatti N, Cule B (2012) Mining closed strict episodes. Data Min Knowl Discov 25(1):34–66
Tatti N, Vreeken J (2012) The long and the short of it: summarising event sequences with serial episodes. In: The 18th ACM SIGKDD international conference on knowledge discovery and data mining, 2012, pp 462–470
Tronícek Z (2001) Episode matching. In: Combinatorial pattern matching, pp 143–146
van der Vaart AW (1998) Asymptotic statistics. Cambridge series in statistical and probabilistic mathematics. Cambridge University Press, Cambridge
Webb GI (2007) Discovering significant patterns. Mach Learn 68(1):1–33
Webb GI (2010) Self-sufficient itemsets: an approach to screening potentially interesting associations between items. TKDD 4(1):1–20


Acknowledgments

Nikolaj Tatti was partly supported by a Post-Doctoral Fellowship of the Research Foundation – Flanders (FWO).

Author information

Authors and Affiliations

ADReM, Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
Nikolaj Tatti
DTAI, Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
Nikolaj Tatti
Helsinki Institute of Information Technology (HIIT), Department of Information and Computer Science, Aalto University, Helsinki, Finland
Nikolaj Tatti

Authors

Nikolaj Tatti

Corresponding author

Correspondence to Nikolaj Tatti.

Additional information

Responsible editor: Eamonn Keogh.

Appendix: Proofs

Proof

(Proof of Proposition 2) We will prove this by induction. Let $i$ be the source state of $M$. The proposition holds trivially when $X = \left\{ i\right\} $, a source state. Assume now that the proposition holds for all parent states of $X$.

Assume that $s$ covers $X$. Let $t$ be a subsequence of $s$ that leads ${{sm}}\mathopen {}\left( M\right) $ from the source state $\left\{ i\right\} $ to $X$. Let $s_e$ be the last symbol of $s$ occurring in $t$. Then a parent state $Y = \left\{ y_1 ,\ldots , y_L\right\} = {{par}}\mathopen {}\left( X; s_e\right) $ is covered by $s[1, e - 1]$. By the induction assumption at least one $y_k$ is covered by $s[1, e - 1]$. If there is $x_j \in X$ such that $x_j = y_k$, then $x_j$ is covered by $s$, otherwise there is $x_j$ that has $y_k$ as a parent state. The edge connecting $x_j$ and $y_k$ is labelled with $s_e$. Hence $s$ covers $x_j$ also.

To prove the other direction, assume that $s$ covers $x_j$. Let $t$ be a subsequence that leads $M$ from $i$ to $x_j$. Let $s_e$ be the last symbol occurring in $t$. Let $y$ be the parent state of $x_j$ connected by an edge labelled with $s_e$. Since $s_e \in {{in}}\mathopen {}\left( X\right) $, we must have $Y$ as a parent state of $X$ such that $y \in Y$. By the induction assumption, $s[1, e - 1]$ covers $Y$. Hence $s$ covers $X$. $\square $

In order to prove Proposition 3 we need the following lemma.

Lemma 2

Let $G$ be an episode and assume a sequence $s = \left( s_1 ,\ldots , s_L\right) $ that covers $G$. Let $\mathcal{H } = \left\{ G - v; v \in {{sinks}}\mathopen {}\left( G\right) , {{lab}}\mathopen {}\left( v\right) = s_L\right\} $. If $\mathcal{H }$ is empty, then $s[1, L - 1]$ covers $G$. Otherwise, there is an episode $H \in \mathcal{H }$ that is covered by $s[1, L - 1]$.

Proof

Let $f$ be a valid mapping of $V(G)$ to indices of $s$ corresponding to the coverage. If $\mathcal{H }$ is empty, then $L$ is not in the range of $f$, and so $s[1, L - 1]$ covers $G$. If $\mathcal{H }$ is not empty but $L$ is not in the range of $f$, then $s[1, L - 1]$ covers $G$ and hence any episode in $\mathcal{H }$.

Assume now that $L$ is in the range of $f$, that is, there is a sink $v$ with label $s_L$ such that $f(v) = L$. Episode $G - v$ is in $\mathcal{H }$. Moreover, $f$ restricted to $G - v$ provides the mapping needed for $s[1, L - 1]$ to cover $G - v$. $\square $
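Lemma 2 can be read as a recursive coverage test that peels off the last symbol of the sequence. The following is a minimal sketch of that reading, not the paper's implementation. It assumes that the mapping $f$ is injective and that an episode is given as a node-to-label dictionary plus a set of edges; the function name `covers` and this representation are hypothetical.

```python
from functools import lru_cache


def covers(labels, edges, seq):
    """Return True if seq covers the episode (labels, edges).

    labels: dict mapping node -> symbol.
    edges:  set of (u, v) pairs meaning u must occur before v.
    Assumption: each sequence index is used by at most one node.
    """

    @lru_cache(maxsize=None)
    def rec(nodes, length):
        if not nodes:            # the empty episode is covered by anything
            return True
        if length == 0:          # nodes remain but no symbols are left
            return False
        last = seq[length - 1]
        # sinks of the remaining episode whose label equals the last symbol
        sinks = [
            v for v in nodes
            if labels[v] == last
            and not any(u == v and w in nodes for (u, w) in edges)
        ]
        if not sinks:
            # the set H of Lemma 2 is empty: the shorter sequence must cover G
            return rec(nodes, length - 1)
        # otherwise some G - v must be covered by the shorter sequence
        return any(rec(nodes - {v}, length - 1) for v in sinks)

    return rec(frozenset(labels), len(seq))
```

Because only sinks are removed, the remaining node set always keeps every predecessor of its members, so the order constraints stay intact throughout the recursion.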

Proof

(Proof of Proposition 3) If ${{g}}\mathopen {}\left( X, s\right) = \left\{ i\right\} $, then it is trivial to see that $s$ covers $X$.

Assume that $s$ covers $X$. We will prove this direction by induction over $L$, the length of $s$. The proposition holds for $L = 0$. Assume that $L > 0$ and that the proposition holds for all sequences of length $L - 1$.

Let $Y = {{g}}\mathopen {}\left( X, s_L\right) $. Note that ${{g}}\mathopen {}\left( X, s\right) = {{g}}\mathopen {}\left( Y, s[1, L - 1]\right) $. Hence, to prove the proposition we need to show that $s[1, L - 1]$ covers $Y$.

If $Y = \left\{ i\right\} $, then $s[1, L - 1]$ covers $Y$. Hence, we can assume that $Y \ne \left\{ i\right\} $, that is, $Y = {{sub}}\mathopen {}\left( X; s_L\right) \cup {{stay}}\mathopen {}\left( X; s_L\right) $.

Proposition 2 implies that one of the states of $M_G$, say $x \in X$, is covered by $s$. Proposition 1 states that the corresponding episode, say $H$, is covered by $s$.

Assume that $x \in Y$. This is possible only if $x \in {{stay}}\mathopen {}\left( X; s_L\right) $, that is, there is no sink node in $H$ labelled with $s_L$. Lemma 2 implies that $s[1, L - 1]$ covers $H$, and Propositions 1 and 2 imply that $s[1, L - 1]$ covers $Y$.

Assume that $x \notin Y$. Then ${{sub}}\mathopen {}\left( X; s_L\right) \subseteq Y$ contains all states of $M_G$ corresponding to the episodes of the form $H - v$, where $v$ is a sink node of $H$ with label $s_L$. According to Lemma 2, $s[1, L - 1]$ covers one of these episodes, and Propositions 1 and 2 imply that $s[1, L - 1]$ covers $Y$. $\square $

Proof

(Proof of Proposition 4) We will prove the proposition by induction over $L$, the length of $s$. The proposition holds when $L = 0$. Assume that $L > 0$ and that the proposition holds for sequences of length $L - 1$.

Let $\beta = (y_1, y_2) = {{g}}\mathopen {}\left( \alpha , s_L\right) $. Then, by definition of $M^*$, $y_i = {{g}}\mathopen {}\left( x_i, s_L\right) $. Write $t = s[1, L - 1]$. Since

$$\begin{aligned} {{g}}\mathopen {}\left( \beta , t\right) = {{g}}\mathopen {}\left( \alpha , s\right) , \quad {{g}}\mathopen {}\left( y_1, t\right) = {{g}}\mathopen {}\left( x_1, s\right) , \quad {{g}}\mathopen {}\left( y_2, t\right) = {{g}}\mathopen {}\left( x_2, s\right) , \end{aligned}$$

and, because of induction assumption, ${{g}}\mathopen {}\left( \beta , t\right) = ({{g}}\mathopen {}\left( y_1, t\right) , {{g}}\mathopen {}\left( y_2, t\right) )$, we have ${{g}}\mathopen {}\left( \alpha , s\right) = ({{g}}\mathopen {}\left( x_1, s\right) , {{g}}\mathopen {}\left( x_2, s\right) )$. $\square $
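The componentwise behaviour established above can be checked on toy machines. The sketch below is illustrative only: the transition tables `d1` and `d2`, the state names, and the helper names `run` and `product` are hypothetical, and a missing table entry is assumed to mean that the machine stays in its current state. Following the identity ${{g}}(x, s) = {{g}}({{g}}(x, s_L), s[1, L - 1])$, symbols are consumed from the end of the sequence.

```python
def run(delta, state, seq):
    """Evaluate g(state, seq) by feeding the symbols of seq last to first."""
    for sym in reversed(seq):
        state = delta.get((state, sym), state)  # missing entry: stay put
    return state


def product(delta1, states1, delta2, states2, alphabet):
    """Transition table of a joint machine acting on pairs of states."""
    return {
        ((x1, x2), a): (delta1.get((x1, a), x1), delta2.get((x2, a), x2))
        for x1 in states1
        for x2 in states2
        for a in alphabet
    }


# Toy machines: d1 tracks the serial episode a -> b, d2 tracks the single event c.
d1 = {("ab", "b"): "a", ("a", "a"): ""}
d2 = {("c", "c"): ""}
joint = product(d1, ["ab", "a", ""], d2, ["c", ""], "abc")

for s in ["", "ab", "cab", "bca", "acb"]:
    # the joint machine agrees with running both machines separately
    assert run(joint, ("ab", "c"), s) == (run(d1, "ab", s), run(d2, "c", s))
```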

Proof

(Proof of Proposition 5) Assume that $s$ is a minimal window for $G$. Since $s$ covers $S$ in $M$, ${{g}}\mathopen {}\left( S, s; M\right) = I$. This implies that ${{g}}\mathopen {}\left( S, s; M_1\right) = I$ or ${{g}}\mathopen {}\left( S, s; M_1\right) = J$. The latter case implies that $s[2, L]$ covers $S$ in $M$, which is a contradiction. Hence, ${{g}}\mathopen {}\left( S, s; M_1\right) = I$. Let $Z = {{g}}\mathopen {}\left( T, s; M_2\right) $. If $Z = I$, then $s[1, L - 1]$ covers $S$ in $M$, which is a contradiction. Hence $Z \ne I$. Proposition 4 implies that ${{g}}\mathopen {}\left( \alpha , s\right) = (I, Z)$.

Assume that ${{g}}\mathopen {}\left( \alpha , s\right) = (I, Y)$ such that $Y \ne I$. Proposition 4 implies that ${{g}}\mathopen {}\left( S, s; M_1\right) = I$ and ${{g}}\mathopen {}\left( T, s; M_2\right) \ne I$. The former implication leads to ${{g}}\mathopen {}\left( S, s; M\right) = I$ which implies that $s$ covers $G$.

If $s[2, L]$ covers $G$, then ${{g}}\mathopen {}\left( S, s[2, L]; M\right) = I$ and so ${{g}}\mathopen {}\left( S, s; M_1\right) = J$, which is a contradiction. Hence $s[2, L]$ does not cover $G$. The latter implication leads to ${{g}}\mathopen {}\left( S, s[1, L - 1]; M\right) \ne I$ which implies that $s[1, L - 1]$ does not cover $G$. This proves the proposition. $\square $
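The characterisation used in this proof, namely that $s$ covers $G$ while neither $s[2, L]$ nor $s[1, L - 1]$ does, translates directly into a brute-force test. The sketch below is restricted to serial episodes (an assumption made here for brevity, so that coverage reduces to a subsequence test) and does not use the state machines of the paper; all function names are hypothetical.

```python
def covers_serial(episode, seq):
    """True if the symbols of episode occur in seq in the given order."""
    it = iter(seq)
    # 'sym in it' advances the iterator, so matches are found left to right
    return all(sym in it for sym in episode)


def is_minimal_window(episode, seq):
    """seq covers the episode, but dropping either end symbol breaks coverage."""
    return (
        covers_serial(episode, seq)
        and not covers_serial(episode, seq[1:])
        and not covers_serial(episode, seq[:-1])
    )


def minimal_windows(episode, seq):
    """All half-open index pairs (i, j) such that seq[i:j] is a minimal window."""
    n = len(seq)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n + 1)
        if is_minimal_window(episode, seq[i:j])
    ]
```

Checking only the two windows that are one symbol shorter is enough, because coverage is monotone: any window containing a covering window also covers the episode.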

Proof

(Proof of Proposition 6) If $L = 0$, then ${{g}}\mathopen {}\left( x, s\right) = x$ which immediately implies the proposition. Assume that $L > 0$. Note that ${{g}}\mathopen {}\left( x, s\right) = {{g}}\mathopen {}\left( {{g}}\mathopen {}\left( x, s_L\right) , s[1, L - 1]\right) $.

$$\begin{aligned} p({{g}}\mathopen {}\left( x, s\right) \in Y \mid {\left| s\right| } = L)&= \sum _{a \in \Sigma } p(a) p({{g}}\mathopen {}\left( x, s\right) \in Y \mid {\left| s\right| } = L, s_L = a)\\&= \sum _{a \in \Sigma } p(a) p({{g}}\mathopen {}\left( {{g}}\mathopen {}\left( x, a\right) , s[1, L \!-\! 1]\right) \in Y \mid {\left| s\right| } \!=\! L, s_L \!=\! a). \end{aligned}$$

Since individual symbols in $s$ are independent, it follows that

$$\begin{aligned} p({{g}}\mathopen {}\left( {{g}}\mathopen {}\left( x, a\right) , s[1, L - 1]\right) \in Y \mid {\left| s\right| } = L, s_L = a) = {{pg}}\mathopen {}\left( {{g}}\mathopen {}\left( x, a\right) , Y, L - 1\right) . \end{aligned}$$

This proves the proposition. $\square $
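The recursion of Proposition 6 lends itself to a short memoised computation. The following sketch assumes a machine given as a transition table in which a missing entry means "stay", together with i.i.d. symbol probabilities; the toy machine for the serial episode $a \rightarrow b$, the state names, and the helper `make_pg` are hypothetical and serve only as illustration.

```python
from functools import lru_cache


def make_pg(delta, prob):
    """Build pg(x, Y, L): probability that g(x, s) lies in Y for |s| = L.

    delta: dict mapping (state, symbol) -> next state (missing entry: stay).
    prob:  dict mapping symbol -> probability under the independence model.
    """

    @lru_cache(maxsize=None)
    def pg(x, targets, length):
        if length == 0:
            # g(x, empty sequence) = x
            return 1.0 if x in targets else 0.0
        # condition on the last symbol a, then recurse from the state g(x, a)
        return sum(
            p * pg(delta.get((x, a), x), targets, length - 1)
            for a, p in prob.items()
        )

    return pg


# Toy machine for the serial episode a -> b; "" is the source state.
delta = {("ab", "b"): "a", ("a", "a"): ""}
prob = {"a": 0.5, "b": 0.5}
pg = make_pg(delta, prob)
not_source = frozenset({"ab", "a"})
```

For this toy machine, `pg("ab", not_source, 3)` evaluates to 0.5, which matches the four sequences `aaa`, `baa`, `bba`, `bbb` out of eight that do not contain an `a` followed by a `b`.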

Proof

(Proof of Lemma 1) Define $q = \sqrt{1 - \min _{a \in \Sigma } p(a)}$. Note that $q < 1$. We claim that for each $x$ there is a constant $C_x$ such that ${{pg}}\mathopen {}\left( x, Y, L\right) \le C_xq^L = O(q^{L})$, which in turn proves the lemma. To prove the claim we use induction over the parenthood of $x$ and over $L$.

Since the source node is not in $Y$, the first step follows immediately. Assume that the result holds for all parent states of $x$. Define

$$\begin{aligned} C_x = \max \bigg (1, \frac{1}{q(1 - q)} \mathop {\mathop {\sum }\limits _{a \in {{in}}\mathopen {}\left( x\right) }}\limits _{y = {{g}}\mathopen {}\left( x, a\right) } p(a) C_y\bigg ), \quad \text{which implies} \quad q C_x + q^{-1}{\mathop {\mathop {\sum }\limits _{a \in {{in}}\mathopen {}\left( x\right) }}\limits _{y = {{g}}\mathopen {}\left( x, a\right) }} p(a)C_y \le C_x. \end{aligned}$$

Since $C_x \ge 1$, the case of $L = 0$ holds. Assume that the induction assumption holds for $C_y$ and for $C_x$ up to $L - 1$. Let $r = 1 - \sum _{a \in {{in}}\mathopen {}\left( x\right) } p(a)$. Note that $r \le q^2$. According to Proposition 6 we have

$$\begin{aligned} {{pg}}\mathopen {}\left( x, Y, L\right)&= r {{pg}}\mathopen {}\left( x, Y, L - 1\right) + {\mathop {\mathop {\sum }\limits _{a \in {{in}}\mathopen {}\left( x\right) }}\limits _{y = {{g}}\mathopen {}\left( x, a\right) }} p(a) {{pg}}\mathopen {}\left( y, Y, L - 1\right) \\&\le r C_xq^{L - 1} + {\mathop {\mathop {\sum }\limits _{a \in {{in}}\mathopen {}\left( x\right) }}\limits _{y = {{g}}\mathopen {}\left( x, a\right) }} p(a)C_y q^{L - 1}\\&\le q^L\bigg (q C_x + q^{-1}{\mathop {\mathop {\sum }\limits _{a \in {{in}}\mathopen {}\left( x\right) }}\limits _{y = {{g}}\mathopen {}\left( x, a\right) }} p(a)C_y\bigg ) \le q^LC_x. \end{aligned}$$

This proves that ${{pg}}\mathopen {}\left( x, Y, L\right) $ decays at exponential rate. $\square $
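The two small facts used in this proof can be spelled out as follows. This is a sketch of the arithmetic only; it assumes that ${{in}}(x)$ is nonempty for every non-source state $x$, and it writes $D_x$ (a shorthand not used in the paper) for the sum $\sum p(a) C_y$ over $a \in {{in}}(x)$ with $y = {{g}}(x, a)$.

$$\begin{aligned} C_x \ge \frac{D_x}{q(1 - q)}&\implies q^{-1} D_x \le (1 - q) C_x \implies q C_x + q^{-1} D_x \le q C_x + (1 - q) C_x = C_x, \\ r = 1 - \sum _{a \in {{in}}(x)} p(a)&\le 1 - \min _{a \in \Sigma } p(a) = q^2. \end{aligned}$$

The second line is what turns $r C_x q^{L - 1}$ into at most $q^{L + 1} C_x$ in the final chain of inequalities.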

Proof

(Proof of Proposition 8) The proposition follows by a straightforward manipulation of Eq. 1. First note that

$$\begin{aligned} \sum _{L = 1}^\infty f(L - 1) {{pg}}\mathopen {}\left( x, Y, L\right) = c{{m}}\mathopen {}\left( x, f, Y\right) + {{m}}\mathopen {}\left( x, h, Y\right) . \end{aligned} \tag{5}$$

Equation 1 implies that

$$\begin{aligned} \sum _{L = 1}^\infty f(L - 1) {{pg}}\mathopen {}\left( x, Y, L\right)&= {\mathop {\mathop {\sum }\limits _{a \in \Sigma }}\limits _{y = {{g}}\mathopen {}\left( x, a\right) }} p(a) \sum _{L = 1}^\infty f(L - 1) {{pg}}\mathopen {}\left( y, Y, L - 1\right) \\&= {\mathop {\mathop {\sum }\limits _{a \in \Sigma }}\limits _{y = {{g}}\mathopen {}\left( x, a\right) }} p(a) (i(y) + \sum _{L = 1}^\infty f(L) {{pg}}\mathopen {}\left( y, Y, L\right) ) \\&= {\mathop {\mathop {\sum }\limits _{a \in \Sigma }}\limits _{y = {{g}}\mathopen {}\left( x, a\right) }} p(a) (i(y) + {{m}}\mathopen {}\left( y, f, Y\right) ) \\&= q(i(x) + {{m}}\mathopen {}\left( x, f, Y\right) ) + {\mathop {\mathop {\sum }\limits _{{a \in {{in}}\mathopen {}\left( x\right) }}}\limits _{y = {{g}}\mathopen {}\left( x, a\right) }} p(a) (i(y) + {{m}}\mathopen {}\left( y, f, Y\right) ). \end{aligned} \tag{6}$$

Combining Eqs. 5 and 6 and solving ${{m}}\mathopen {}\left( x, f, Y\right) $ gives us the result. $\square $

To prove the asymptotic normality we will use the following theorem.

Theorem 1

(Theorem 27.4 in Billingsley 1995) Assume that $U_k$ is a stationary sequence with $\mathrm{E }\left[ U_k\right] = 0$, $\mathrm{E }\left[ U_k^{12}\right] < \infty $, and is $\alpha $-mixing with $\alpha (n) = O(n^{-5})$, where $\alpha (n)$ is the strong mixing coefficient,

$$\begin{aligned} \alpha (n) = \sup _{k, A, B} {\left| p(A, B) - p(A)p(B)\right| }, \end{aligned}$$

where $A$ is an event depending only on $U_{-\infty }, \ldots , U_k$ and $B$ is an event depending only on $U_{k + n}, \ldots ,U_{\infty }$. Let $S_k = U_1 + \cdots + U_k$. Then $\sigma ^2 = \lim _k \frac{1}{k} \mathrm{E }\left[ S_k^2\right] $ exists, $\sigma ^2 = \mathrm{E }\left[ U_1^2\right] + 2\sum _{k = 2}^\infty \mathrm{E }\left[ U_1U_k\right] $, and $S_k / \sqrt{k}$ converges to $N(0, \sigma ^2)$.

Proof

(Proof of Proposition 10) Let us write $T_k = (Z_k, X_k) - (q, p)$ and $S_L = 1/\sqrt{L}\sum _{k = 1}^L T_k$. Assume that we are given a vector $r = (r_1, r_2)$ and write $U_k = r^TT_k$. We will first prove that $r^TS_L$ converges to a normal distribution using Theorem 1.

First note that $\mathrm{E }\left[ U_k\right] = 0$ and that

$$\begin{aligned} \mathrm{E }\left[ U_k^{12}\right] \!=\! \sum _{i = 0}^{12} \left( \begin{array}{l}12 \\ i \end{array}\right) r_1^ir_2^{12 - i}\mathrm{E }\left[ Z_k^iX_k^{12 - i}\right] \!=\! r_2^{12}\mathrm{E }\left[ X_k\right] \!+\! \sum _{i = 1}^{12} \left( \begin{array}{l}12 \\ i \end{array}\right) r_1^ir_2^{12 - i}\mathrm{E }\left[ Z_k^i\right] . \end{aligned}$$

Since every moment of $Z_k$ and $X_k$ is finite, $\mathrm{E }\left[ U_k^{12}\right] $ is also finite. We will prove now that $U_k$ is $\alpha $-mixing.

Fix $k$ and $N$. Let $W$ be the event that $s[k + 1, N]$ covers $G$. If $W$ is true, then $X_l$ and $Z_l$ (and hence $U_l$) for $l \le k$ depend only on $s[l, N]$, that is, either there is a minimal window $s[l, N^{\prime }]$, where $N^{\prime } < N$, or $X_l = Z_l = 0$.

Let $A$ be an event depending only on $U_{-\infty }, \ldots , U_k$ and $B$ be an event depending only on $U_{N + 1}, \ldots ,U_{\infty }$. Then $p(A,B \mid W) = p(A \mid W)p(B \mid W)$. We can rephrase this and bound $\alpha (n) \le p(s[1, n - 1] \text{ does not cover } G)$. To bound the right side, let $M = {{sm}}\mathopen {}\left( M_G\right) $, let $v$ be its sink state and let $V$ be all states save the source state. Then the probability is equal to

$$\begin{aligned} p(s[1, n - 1] \text{ does not cover } G) = {{pg}}\mathopen {}\left( v, V, n - 1\right) . \end{aligned}$$

Since $V$ does not contain the source node, the moment ${{m}}\mathopen {}\left( v, n \rightarrow n^5, V\right) $ is finite. Consequently, $n^5{{pg}}\mathopen {}\left( v, V, n\right) \rightarrow 0$, which implies that $\alpha (n) = O(n^{-5})$. Thus Theorem 1 implies that $r^TS_L$ converges to a normal distribution with the variance $\sigma ^2 = r_1^2C_{11} + 2r_1r_2C_{12} + r_2^2C_{22} = r^TCr$. Lévy's continuity theorem (Theorem 2.13 in van der Vaart 1998) now implies that the characteristic function of $r^TS_L$ converges to the characteristic function of the normal distribution $N(0, \sigma ^2)$,

$$\begin{aligned} \mathrm{E }\left[ \exp \mathopen {}\left( itr^TS_L\right) \right] \rightarrow \exp \mathopen {}\left( -1/2t^2r^TCr\right) . \end{aligned}$$

The left side is the characteristic function of $S_L$ (with $tr$ as a parameter). Similarly, the right side is the characteristic function of $N(0, C)$. Lévy's continuity theorem now implies that $S_L$ converges to $N(0, C)$. $\square $

Proof

(Proof of Proposition 11) The function $f(x, y) = x/y$ is differentiable at $(q, p)$. Since $1/\sqrt{L}\left( \sum _{k = 1}^L (Z_k, X_k) - (q, p)\right) $ converges to a normal distribution, we can apply Theorem 3.1 in van der Vaart (1998) so that

$$\begin{aligned} \sqrt{L}\left( \frac{\sum _{k = 1}^L Z_k}{\sum _{k = 1}^L X_k} - \mu \right) = \sqrt{L} f\left( 1/L\sum _{k = 1}^L Z_k, 1/L\sum _{k = 1}^L X_k\right) - \sqrt{L}f(q, p) \end{aligned}$$

converges to $N(0, \sigma ^2)$, where $\sigma ^2 = \nabla f(q, p)^T C \nabla f(q, p)$. The gradient of $f$ is equal to $\nabla f(q, p) = (1/p, -\mu /p)$. The proposition follows. $\square $
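As a worked check of the last step, the gradient and the resulting variance can be written out explicitly. This sketch assumes that $\mu = f(q, p) = q/p$, as the displayed identity suggests, and that $C$ is symmetric with entries $C_{11}$, $C_{12}$, $C_{22}$ as in the proof of Proposition 10.

$$\begin{aligned} \nabla f(x, y)&= \left( \frac{1}{y}, -\frac{x}{y^2}\right) , \qquad \nabla f(q, p) = \left( \frac{1}{p}, -\frac{q}{p^2}\right) = \left( \frac{1}{p}, -\frac{\mu }{p}\right) , \\ \sigma ^2&= \nabla f(q, p)^T C \nabla f(q, p) = \frac{1}{p^2}\left( C_{11} - 2\mu C_{12} + \mu ^2 C_{22}\right) . \end{aligned}$$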

Proof

(Proof of Proposition 12) To prove all four cases simultaneously, let $A$ be either $X_1$ or $Z_1$ and let $B_k$ be either $X_k$ or $Z_k$. Let $a = \mathrm{E }\left[ A\right] $ and $b = \mathrm{E }\left[ B_k\right] $. First note that $\mathrm{E }\left[ (A - a)(B_k - b)\right] = \mathrm{E }\left[ A(B_k - b)\right] $, which allows us to ignore $a$ inside the mean.

Assume that we have $0 < n < k$. Then, given that $Y_1 = n$, $A$ and $X_1$ depend only on the first $n$ symbols of the sequence. Since $B_k$ does not depend on the first $k - 1$ symbols, this implies that

$$\begin{aligned} p(A, B_k \mid Y_1 = n) = p(A \mid Y_1 = n)p(B_k \mid Y_1 = n) = p(A \mid Y_1 = n)p(B_k), \end{aligned}$$

which in turn implies that $\mathrm{E }\left[ A (B_k - b) \mid Y_1 = n\right] = 0$.

Note that $A = 0$ whenever $Y_1 = 0$. Consequently, we have

$$\begin{aligned} \mathrm{E }\bigg [{A\sum _{k = 2}^\infty (B_k - b)}\bigg ]&= \sum _{n = 1}^\infty \mathrm{E }\bigg [{A\sum _{k = 2}^\infty (B_k - b) \mid Y_1 = n}\bigg ] p(Y_1 = n) \\&= \sum _{n = 1}^\infty \mathrm{E }\bigg [{A\sum _{k = 2}^n (B_k - b) \mid Y_1 = n}\bigg ] p(Y_1 = n) \\&= \mathrm{E }\bigg [{A\sum _{k = 2}^{Y_1} (B_k - b)}\bigg ] = \mathrm{E }\bigg [{A \sum _{k = 2}^{Y_1} B_k}\bigg ] - \mathrm{E }\bigg [{A \sum _{k = 2}^{Y_1}b}\bigg ] \\&= \mathrm{E }\bigg [{A \sum _{k = 2}^{Y_1} B_k}\bigg ] - \mathrm{E }\left[ A(Y_1 - X_1)\right] b \\&= \mathrm{E }\bigg [{X_1A \sum _{k = 2}^{Y_1} X_kB_k}\bigg ] - \mathrm{E }\left[ A(Y_1 - X_1)\right] b, \end{aligned}$$

where the second last equality holds because $\sum _{k = 2}^{Y_1} 1 = Y_1 - X_1$ and the last equality follows since $X_k = X_k^2$ and $Z_k = X_kZ_k$ for any $k$. $\square $


About this article


Tatti, N. Discovering episodes with compact minimal windows. Data Min Knowl Disc 28, 1046–1077 (2014). https://doi.org/10.1007/s10618-013-0327-9


Received: 20 September 2012
Accepted: 11 June 2013
Published: 28 June 2013
Issue Date: July 2014
DOI: https://doi.org/10.1007/s10618-013-0327-9
