Mining precise cause and effect rules in large time series data of socioeconomic indicators
Abstract
Discovery of cause–effect relationships, particularly in large databases of time series, is challenging because the data are continuous, differ in their characteristics and exhibit complex lagged relationships. In this paper, we propose a novel approach to extract cause–effect relationships from large time series data sets of socioeconomic indicators. The method extends the scope of relationship discovery to cause–effect relationships by identifying multiple causal structures such as binary, transitive, many to one and cyclic. We use temporal association and the temporal odds ratio to exclude noncausal associations and to ensure the high reliability of the discovered causal rules. We assess the method with both synthetic and real-world datasets. The proposed method helps to build quantitative models for analyzing socioeconomic processes by generating precise cause–effect relationships between different economic indicators. The results show that the proposed method can effectively discover the existing causality structure in large time series databases.
Keywords
Data mining, Cause–effect relationships, Causality, Temporal association, Temporal odds ratio
Background
Suppose we have a set of indicators such as exercise, weight, diseases, calcium, alcohol and bone growth. Various causal relationships can exist among them. One indicator may affect another instantly or after some time. For example, if a person takes alcohol, he may feel a lack of energy (lethargy) instantly or after some time (Fig. 1a). If he takes alcohol frequently, the changes can be observed and it can be concluded that alcohol is one of the causes of tiredness. We can identify the time between the intake of alcohol and the occurrence of lethargy, and we can also identify the dose of alcohol that tends to cause lethargy. Further relationships, such as transitive ones, can be analyzed between sets of indicators (shown in Fig. 1); for example, lack of exercise increases weight, which increases the chance of diseases (Fig. 1b, c). A many to one relationship arises when, for example, a person takes the proper dose of calcium and vitamin D and this helps bone growth, i.e. bone growth requires both calcium and vitamin D. Figure 1d describes the cyclic relationship, in which properties affect each other in a cyclic manner; for example, lethargy increases weight, which in turn increases lethargy. These extracted relationships are referred to as binary, transitive, many to one and cyclic, respectively.
In this paper, we have proposed a method to extract various causal relationships as binary, transitive, many to one and cyclic with properties such as time required to occur an effect (as lag value), rate of change (of both cause and effect parameter) and strength of a relationship without using statistical information.
Related work and contributions
The common way to identify cause–effect relationships is to plan randomized controlled experiments, which is generally expensive and unattainable with a huge number of parameters. Therefore, considerable attention is needed to discover cause–effect relationships from the rapidly growing amount of observational data. Discovery of cause–effect relationships in large observational data is a demanding task. Pearl and Verma (1991) suggested a framework that discovers causal structures from connected conditional independence; based on it, some techniques have been developed to identify causal relationships. However, it still cannot discover causal structures effectively in large databases, and the computational cost of the discovery is high. Probabilistic dependence is one technique used to represent causality. Probabilistic cause–effect relationships have been examined and suggested in the literature (Reinchenbach 1978; Reichenbach and Reichenbach 1991; Good 1959; Suppes 1970). More recently, Bayesian networks (Pearl 2014) and graphical causal modeling have emerged as leading techniques for discovering causal relationships. Several authors (Heckerman 1995, 1997; Zhang and Poole 1996; Waldmann and Martignon 1998; Nadkarni and Shenoy 2001) describe techniques for characterizing, interpreting and learning probabilistic independence among parameters. However, Bayesian network learning to discover complete cause–effect models is an NP-complete problem (Chickering 1996). Constraint-based techniques are more efficient because they avoid the search for a generic Bayesian network. Currently, several constraint-based approaches have been implemented to identify causal relationships in large databases and have achieved some satisfactory results (Cooper 1997; Silverstein et al. 2000; Mani et al. 2012; Pellet and Elisseeff 2008; Aliferis et al. 2010).
These approaches use observational data to detect and learn causal structures using conditional independence among variables. It is notable that these constraint-based approaches directly or indirectly implement the concept of Bayesian network learning by creating a directed acyclic graph (DAG) that describes the conditional independence between variables (parameters). Even though constraint-based methods have shown promising results with large databases, they are typically designed to detect causality with a few fixed structures in a DAG, such as Y structures (Mani et al. 2012), CCC (Cooper 1997) and CCU (Silverstein et al. 2000).
Another technique in this area is Granger causality (GC) (Granger 1969). It has also been discussed in the previous literature (Lozano et al. 2009a, b; Arnold et al. 2007; Pang and Su 2010) and is well known in economics for causal inference. The method calculates the impact of one time series on another by determining whether the prediction of the response can be improved by including knowledge of a predictor. GC is reported to perform well for stationary time series data but is sensitive to nonlinearity. All these methods infer directed networks. Although these methods are fast, the inferred interactions are undirected. Moreover, these approaches are well suited for small sample data analysis (Veiga et al. 2007) but are not designed to detect combined causal parameters. Often, two or more parameters together enhance the strength of an effect: even when an individual parameter does not cause a strong effect, together they may do so. We note that discovering causal structures in observational data alone is insufficient, so the discovered relationships have to be verified with time series data and controlled experiments. Still, it is useful to remove noncausal relationships discovered from data. Cause–effect relationship discovery aims to find a brief list of rules that are probably causal. These causal rules provide a set of statistically decisive relationships that are likely to embed cause–effect relationships. This differentiates causal rule discovery from normal rule discovery.
Association rule mining (Agrawal et al. 1993) is an efficient and versatile means for discovering relationships in data (Han et al. 2011). Several authors (Jin et al. 2012; Li et al. 2013; Ma et al. 2016) take advantage of association rule mining for causality discovery. Jin et al. (2012) discover causal relationships with multiple cause variables in large databases of binary variables and exclude noncausal associations. Li et al. (2013) and Ma et al. (2016) discover potential causal rules using the cohort study (Euser et al. 2009; Fleiss et al. 2003) and are able to generate combined causal rules in observational data. Li et al. (2015) presented four approaches, PC, HITONPC, CRPA and CRCS, for causality detection around a given target variable and discussed their efficiency. The PC and HITONPC methods are based on Bayesian network learning theory and use conditional independence tests to eliminate nonpersistent associations, CRPA uses association rules and partial association, and CRCS uses the concept of a cohort study.
These proposed methods are able to find single and combined causal rules effectively in small and large database with low and high dimensional data, but they are restricted to discrete data and unable to extract the cyclic relationships and strength of relationships, although causality can be observed in various hidden relationships. However, statistically predictable associations do not illustrate cause–effect relationships, although mostly causality is usually observed as an association in the dataset. Therefore, in this paper, initially we use the concept of temporal association (Ji et al. 2011) and odds ratio (Fleiss et al. 2003) to extract binary causal relationship and further other relationships are extracted.
To the best of our knowledge, there is no previous work on discovering cyclic and transitive causal relationships, together with properties such as the rate of change of parameters and the relationship strength, in time series data. We observe that discovering causal relationships from observational and constraint-based data alone is insufficient. The contributions of this paper are as follows.

First, we present a method to extract cause–effect relationships like binary, transitive, many to one and cyclic in large time series database.

Second, we define the concept of temporal association lag rule and temporal odds ratio to extract cause–effect relationships between various parameters.

Third, we generate more specific cause–effect rules, such as binary, transitive, many to one and cyclic rules, together with their relationship strength, which is useful for strategic decisions.
Our proposed method is useful to extract time lagged relationships across different field indicators that can be used to understand the lagged response of one indicator on another and various relationships such as binary, cyclic, many to one and transitive. We show the utility of our approach by extracting some relationships between different field indicators. For example, the rule (Cereal production, D, 2 %, 2) \(\Rightarrow\) (Agricultural raw materials exports, 3 %), indicates a causal rule that cereal production is directly related to agricultural raw materials exports and if it is changed by 2 %, it affects the export of agricultural raw material by 3 % after 2 years. The proposed approach can be broadly applied to other problems in the temporal domain to extract various time lagged relationships.
Preliminaries
In this section, we first define the terms used in this paper. Then we define the concepts needed to describe the proposed cause–effect relationship extraction method. Finally, we give the formal definitions of the various cause–effect relationships; discovering such causal relationships is the aim of this paper.
\(n\): Number of elements in the time series
\(z\): Number of parameters in database \(P\)
\(l\): Lag parameter, \(l \ne 0\)
\(l_{max}\): Maximum lag difference value
\(T_{k}\): Value of the kth time unit
\(P_{i,k}\): Value of parameter \(P_{i}\) in the kth time unit
\(\gamma_{i,k}\): Rate of change of parameter \(P_{i}\) in the kth year, calculated as:
$$\gamma_{i,k} = \frac{P_{i,k} - P_{i,k-1}}{P_{i,k-1}} \tag{1}$$
\(\delta\): Minimum rate of change used to consider a change significant
\(R_{i,k}\): Type of change, defined as:
$$R_{i,k} = \left\{ \begin{array}{ll} U & \text{if } \gamma_{i,k} \ge \delta \\ D & \text{if } \gamma_{i,k} \le -\delta \\ Q & \text{if } -\delta \le \gamma_{i,k} \le \delta \end{array} \right. \tag{2}$$
The time series of parameter \(P_{i}\) is converted into a set of tuples 〈\(P_{i}\), \(T_{k}\), \(R_{i,k}\)〉, where \(T_{k}\) is the kth time period and \(R_{i} = R_{i,k} \in \{U, D, Q\}\) indicates a positive, negative or no significant rate of change for the kth time unit. For example, if GDP has a positive rate of change in 1970, then it is indicated by the tuple 〈GDP, 1970, U〉.
\(D_{i,j,k,l}\): Indicator of a direct relationship, i.e. the rate of change of \(P_{i}\) matches the rate of change of \(P_{j}\) after time period \(l\), defined as:
$$D_{i,j,k,l} = \left\{ \begin{array}{ll} 1 & \text{if } (R_{i,k} = U \text{ and } R_{j,k+l} = U) \text{ or } (R_{i,k} = D \text{ and } R_{j,k+l} = D) \\ 0 & \text{otherwise} \end{array} \right. \tag{3}$$
\(S_{D}(P_{i},P_{j},l)\): Support count of the direct relationship, defined as:
$$S_{D}(P_{i},P_{j},l) = \sum_{k=1}^{n-l} D_{i,j,k,l} \tag{4}$$
\(\alpha_{D}(P_{i},P_{j},l)\): Support percent of the direct relationship, defined as:
$$\alpha_{D}(P_{i},P_{j},l) = \frac{S_{D}(P_{i},P_{j},l)}{n-l} \tag{5}$$
\(I_{i,j,k,l}\): Indicator of an inverse relationship, i.e. the rate of change of \(P_{i}\) is opposite to the rate of change of \(P_{j}\) after time period \(l\), defined as:
$$I_{i,j,k,l} = \left\{ \begin{array}{ll} 1 & \text{if } (R_{i,k} = U \text{ and } R_{j,k+l} = D) \text{ or } (R_{i,k} = D \text{ and } R_{j,k+l} = U) \\ 0 & \text{otherwise} \end{array} \right. \tag{6}$$
\(S_{I}(P_{i},P_{j},l)\): Support count of the inverse relationship, defined as:
$$S_{I}(P_{i},P_{j},l) = \sum_{k=1}^{n-l} I_{i,j,k,l} \tag{7}$$
\(\alpha_{I}(P_{i},P_{j},l)\): Support percent of the inverse relationship, defined as:
$$\alpha_{I}(P_{i},P_{j},l) = \frac{S_{I}(P_{i},P_{j},l)}{n-l} \tag{8}$$
\(\varTheta_{R}\): Strength of relationship. It indicates how strong the relationship between parameters is. The strength of the relationship between \(P_{i}\) and \(P_{j}\) is calculated as:
$$\begin{aligned} \varTheta_{R}\left(P_{i}, P_{j}\right) & = \alpha * \log\left(n\right), \quad \text{where} \\ \alpha & = \alpha_{D}\left(P_{i},P_{j},l\right) \quad \text{or} \quad \alpha_{I}\left(P_{i},P_{j},l\right) \end{aligned} \tag{9}$$
\(\alpha_{1}\): Support threshold for all causal relationships (set to 70 % in the experiments)
\(\beta\): Threshold for the temporal odds ratio (set to 3 in the experiments)
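As an illustration of Eqs. (1) and (2), the following minimal sketch converts a series of raw values into U/D/Q symbols and then into the tuples described above. The function names `rate_of_change` and `symbolize`, the sample values and the choice \(\delta = 0.01\) are assumptions made for this example only; they are not taken from the authors' implementation. Where the ranges of Eq. (2) touch at \(\pm\delta\), the sketch assigns the boundary to U or D.

```python
from typing import List

def rate_of_change(values: List[float]) -> List[float]:
    """Eq. (1): relative change between consecutive time units."""
    # Assumes no value is zero; a zero predecessor would need special handling.
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]

def symbolize(gammas: List[float], delta: float) -> List[str]:
    """Eq. (2): U for a rise of at least delta, D for a fall of at least delta, else Q."""
    symbols = []
    for g in gammas:
        if g >= delta:
            symbols.append("U")
        elif g <= -delta:
            symbols.append("D")
        else:
            symbols.append("Q")
    return symbols

# Hypothetical yearly values of one indicator (not taken from the paper's data).
years = [1970, 1971, 1972, 1973, 1974]
values = [100.0, 103.0, 101.0, 101.5, 110.0]
symbols = symbolize(rate_of_change(values), delta=0.01)

# Tuples <indicator, time unit, type of change>; the first year has no predecessor.
tuples = [("GDP", year, s) for year, s in zip(years[1:], symbols)]
print(tuples)
```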
Definition 1
(Temporal association) Using the direct or inverse relationship [Eqs. (3)–(8)], temporal association is defined as follows.
Temporal direct association Temporal direct association between two parameters P _{ i } and P _{ j } for time lag l is defined as \(P_{i} \mathop \to \limits^{l} P_{j} \,if\, \alpha_{D} (P_{i} ,P_{j} ,l) \ge \alpha_{1}\).
Temporal inverse association Temporal inverse association between two parameters P _{ i } and P _{ j } for time lag l is defined as \(P_{i} \mathop \to \limits^{l} P_{j} \,if\, \alpha_{I} (P_{i} ,P_{j} ,l) \ge \alpha_{1}\).
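The two association tests of Definition 1 can be sketched as a scan over candidate lags. This is only an illustrative sketch: the helper names `support_percent` and `temporal_association`, the example series and the tie-breaking (the first lag with the highest support wins) are assumptions, and the support is expressed in percent so that it can be compared directly with \(\alpha_{1} = 70\).

```python
from typing import List, Optional, Tuple

def support_percent(cause: List[str], effect: List[str], lag: int,
                    inverse: bool = False) -> float:
    """Eqs. (3)-(8): share (in percent) of the n - l lagged pairs that move
    in the same direction (direct) or in opposite directions (inverse)."""
    pairs = len(cause) - lag
    hits = 0
    for k in range(pairs):
        a, b = cause[k], effect[k + lag]
        both_changed = a in ("U", "D") and b in ("U", "D")
        if both_changed and (a == b) != inverse:
            hits += 1
    return 100.0 * hits / pairs

def temporal_association(cause: List[str], effect: List[str], l_max: int,
                         alpha_1: float = 70.0) -> Optional[Tuple[int, str, float]]:
    """Return (lag, type, support) with the highest support >= alpha_1, or None."""
    best = None
    for lag in range(1, l_max + 1):
        for kind, inverse in (("D", False), ("I", True)):
            s = support_percent(cause, effect, lag, inverse)
            if s >= alpha_1 and (best is None or s > best[2]):
                best = (lag, kind, s)
    return best

# Hypothetical symbol series: the effect repeats the cause two time units later.
cause = list("UDUUDDUDUD")
effect = list("QQUDUUDDUD")
result = temporal_association(cause, effect, l_max=3)
print(result)
```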
Next, we define the terms to calculate the temporal odds ratio of temporally associated parameters to check whether the temporal association rule \(P_{i} \mathop \to \limits^{l} P_{j}\) is also causal rule or not.
Definition 2
(Temporal odds ratio) It quantifies how strongly the presence or absence of a change in the value of parameter \(P_{i}\) affects the change in the value of parameter \(P_{j}\). Using the above terms [Eqs. (11)–(16)], the temporal odds ratio is defined as follows.
Further causal rules are defined using the terms defined in Definitions 1 and 2.
Definition 3
(Binary rule) A binary causal rule \((P_{i} , D, l) \Rightarrow (P_{j} )\), exists between P _{ i } and P _{ j } if there is temporal association rule \(P_{i} \mathop \to \limits^{l} P_{j} \,{\text{and}}\, Oddratio_{D} (P_{i} ,P_{j} ,l) \ge \beta \,{\text{or}}\,Oddratio_{I} (P_{i} ,P_{j} ,l) \ge \beta\).
In experimentation results, we represent direct causal rule by \((P_{i} , D, l) \Rightarrow (P_{j} )\) and inverse by \((P_{i} , I, l) \Rightarrow (P_{j} )\).
This rule will serve as a forward pruning criterion where all parameters which are not associated with another parameter with nonzero lag value are excluded from the combination of future search. The minimum required support makes the search space manageable.
Definition 4
(Precise binary rule) A precise binary rule \((P_{i} , D, \delta_{1} ,l) \Rightarrow (P_{j} ,\delta_{2} )\) exists between \(P_{i}\) and \(P_{j}\) if there is a binary rule \((P_{i} , D, l) \Rightarrow (P_{j} )\) that holds with \(\delta = \delta_{1}\) as the minimum rate of change of \(P_{i}\) and \(\delta = \delta_{2}\) as the minimum rate of change of \(P_{j}\), and the rule does not hold for either \(\delta > \delta_{1}\) for \(P_{i}\) or \(\delta > \delta_{2}\) for \(P_{j}\).
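To make the notation concrete, a precise binary rule can be held in a small record. The class name `PreciseBinaryRule`, its field names and the `notation` method are hypothetical and introduced only for illustration; the example values are those of the cereal production rule quoted earlier in the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PreciseBinaryRule:
    cause: str           # cause parameter P_i
    relation: str        # "D" (direct) or "I" (inverse)
    delta_cause: float   # minimum rate of change of the cause, in percent
    lag: int             # lag l in time units
    effect: str          # effect parameter P_j
    delta_effect: float  # minimum rate of change of the effect, in percent

    def notation(self) -> str:
        """Render the rule in the (P_i, D, delta_1, l) => (P_j, delta_2) form."""
        return (f"({self.cause}, {self.relation}, {self.delta_cause:g} %, {self.lag})"
                f" => ({self.effect}, {self.delta_effect:g} %)")

rule = PreciseBinaryRule("Cereal production", "D", 2, 2,
                         "Agricultural raw materials exports", 3)
print(rule.notation())
```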
Definition 5
(\(fscore (\delta_{1} ,\delta_{2} )\)) A function used to calculate the specificity of a rule. In the experiments, it is defined as \(fscore (\delta_{1} ,\delta_{2} ) = \delta_{1}^{2} + \delta_{2}^{2}\). If the rule \((P_{i} , D, \delta_{1} ,l_{1} ) \Rightarrow (P_{j} ,\delta_{2} )\) is satisfied for multiple values of \(\delta_{1} ,\delta_{2}\), then the rule that gives the maximum valid fscore is retained.
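The selection in Definition 5 amounts to taking the maximum of the fscore over all threshold pairs for which the rule is still valid. In the sketch below, the list `valid_pairs` is a hypothetical set of \((\delta_{1}, \delta_{2})\) pairs assumed to have already passed the support and odds ratio tests.

```python
from typing import List, Tuple

def fscore(delta_1: float, delta_2: float) -> float:
    """Specificity of a rule as used in the experiments: delta_1^2 + delta_2^2."""
    return delta_1 ** 2 + delta_2 ** 2

# Hypothetical (delta_1, delta_2) pairs for which the rule is assumed to hold.
valid_pairs: List[Tuple[float, float]] = [(0.01, 0.02), (0.02, 0.03), (0.03, 0.01)]

# Retain the pair with the maximum fscore.
best_pair = max(valid_pairs, key=lambda pair: fscore(*pair))
print(best_pair)
```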
Based on binary causal rule, we try to extract other causal relationships as transitive, many to one (combined cause) and cyclic. We define these relationships as follows.
Definition 6
(Transitive rule) A transitive rule \((P_{i} , D, \delta_{1} ,l_{1} ) \Rightarrow (P_{j} ,D,\delta_{2} ,l_{2} ) \Rightarrow \left( {P_{k} , \delta_{3} } \right)\) exists between \(P_{i}\), \(P_{j}\) and \(P_{k}\) if there are rules \(r_{1}:(P_{i} , D, \delta_{1} ,l_{1} ) \Rightarrow (P_{j} ,\delta_{2} )\), \(r_{2}: (P_{j} , D, \delta_{2} ,l_{2} ) \Rightarrow (P_{k} ,\delta_{3} )\) and \((P_{i} , D, \delta_{1} ,l_{3} ) \Rightarrow (P_{k} ,\delta_{3} )\), with \(l_{3} \ge l_{1} + l_{2}\) and \(r_{1} (P_{j} ) \cap r_{2} (P_{j} ) \ne \emptyset\).
Definition 7
(Combined cause rule) A many to one rule \(\left( {\left( {P_{i} , D, \delta_{1} ,l_{1} } \right),\left( {P_{j} ,D,\delta_{2} ,l_{2} } \right)} \right) \Rightarrow (P_{k} ,\delta_{3} ),\) exists between P _{ i }, P _{ j } and P _{ k } if there is \((P_{i} , D, \delta_{1} ,l_{1} ) \Rightarrow (P_{k} ,\delta_{3} ), (P_{j} , D, \delta_{2} ,l_{2} ) \Rightarrow (P_{k} ,\delta_{3} ), S_{D} \left( {P_{i} ,P_{k} ,l_{1} } \right) \ge \alpha_{1} ,\) \(S_{D} (P_{j} ,P_{k} ,l_{2} ) \ge \alpha_{1} \, and\,S_{D} ((P_{i} , P_{j} ),P_{k} ,l_{1} ,l_{2} ) \ge \alpha_{1}\).
Definition 8
(Cyclic rule) A cyclic rule \((P_{i} , D, \delta_{1} ,l_{1} ) \Leftrightarrow (P_{j} ,D,\delta_{2} ,l_{2} ),\) exists between P _{ i } and P _{ j } if there is \((P_{i} , D, \delta_{1} ,l_{1} ) \Rightarrow (P_{j} ,\delta_{2} ), (P_{j} , D, \delta_{2} ,l_{2} ) \Rightarrow (P_{i} ,\delta_{1} ) \,\,and\,\,S_{D} ((P_{i} , P_{j} ),l_{1} ,l_{2} ) \ge \alpha_{1}\).
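Definitions 6–8 all build on the set of binary rules, so candidate structures can be found by simple lookups in that set. The sketch below is an assumption-laden simplification: it keeps only (cause, effect) pairs and their lags, ignores the relationship type and the \(\delta\) values, and omits the joint support tests and the overlap condition \(r_{1}(P_{j}) \cap r_{2}(P_{j}) \ne \emptyset\), so it returns candidates rather than confirmed rules. The rule set and all function names are illustrative.

```python
from typing import Dict, List, Tuple

# (cause, effect) -> lag; an illustrative binary rule set.
binary_rules: Dict[Tuple[str, str], int] = {
    ("P1", "P2"): 1, ("P1", "P3"): 2, ("P2", "P3"): 1,
    ("P4", "P3"): 1, ("P2", "P5"): 1, ("P5", "P2"): 1,
}

def transitive_candidates(rules: Dict[Tuple[str, str], int]) -> List[Tuple[str, str, str]]:
    """Chains i -> j -> k where i -> k also exists with l3 >= l1 + l2."""
    found = []
    for (i, j), l1 in rules.items():
        for (j2, k), l2 in rules.items():
            if j2 != j or k == i:
                continue
            l3 = rules.get((i, k))
            if l3 is not None and l3 >= l1 + l2:
                found.append((i, j, k))
    return found

def cyclic_candidates(rules: Dict[Tuple[str, str], int]) -> List[Tuple[str, str]]:
    """Unordered pairs that have a binary rule in both directions."""
    return sorted({tuple(sorted((i, j))) for (i, j) in rules if (j, i) in rules})

def many_to_one_candidates(rules: Dict[Tuple[str, str], int]) -> Dict[str, List[str]]:
    """Effects that have at least two distinct causes."""
    causes: Dict[str, List[str]] = {}
    for (i, k) in rules:
        causes.setdefault(k, []).append(i)
    return {k: sorted(c) for k, c in causes.items() if len(c) >= 2}

print(transitive_candidates(binary_rules))
print(cyclic_candidates(binary_rules))
print(many_to_one_candidates(binary_rules))
```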
Proposed method
Table 1 Abbreviation table
Abbreviation  Description 

TOR  Temporal odds ratio 
BRS  Binary rule set 
SRS  Specific rule set 
TRS  Transitive rule set 
MOS  Many to one rule set 
CRS  Cyclic rule set 
AG  Agriculture land 
AR  Arable land 
ARME  Agricultural raw materials exports 
CAB  Current account balance 
CY  Cereal yield 
CO2  CO2 emissions 
CP  Crop production 
CPI  Crop production index 
EDOE  Electronic data processing and office equipment 
FDI  Foreign direct investment 
FMP  Fuels and mining products 
FR  Forest rents 
GDP  Gross domestic product 
GGR  General government revenue 
GNS  Gross national savings 
I_{1} to I_{10}  No of indicators (10) 
ICEC  Integrated circuits and electronic components 
IS  Iron and steel 
OM  Other manufactures 
OTE  Office and telecom equipment 
TI  Total investment 
VEG  Volume of exports of goods 
VEGS  Volume of exports of goods and services 
VIG  Volume of imports of goods 
Step 1: Binary rule generation
A causal rule may be generated for multiple lag values; the lag value that gives the maximum support of the rule is considered. Suppose P = {\(P_{1}\), \(P_{2}\), \(P_{3}\), \(P_{4}\), \(P_{5}\)} is a set of time series, and the BRS generated by step 1 is as follows.
BRS = {(\(P_{1}\), \(P_{2}\), D, 1, 75, 4), (\(P_{1}\), \(P_{3}\), D, 2, 73, 4), (\(P_{2}\), \(P_{3}\), I, 1, 77, 3), (\(P_{4}\), \(P_{3}\), I, 1, 71, 6), (\(P_{2}\), \(P_{5}\), D, 1, 76, 5), (\(P_{5}\), \(P_{2}\), D, 1, 72, 4)}. Here (\(P_{1}\), \(P_{2}\), D, 1, 75, 4) describes that parameters \(P_{1}\) and \(P_{2}\) have a direct relationship with lag 1, support 75 and TOR = 4, which indicates that (\(P_{1}\), \(P_{2}\)) are causally related, i.e. \(P_{1}\) affects \(P_{2}\) after 1 year. Similarly, by comparing the support and the odds ratio between parameters for each tuple, the other binary causal relationships can be extracted and interpreted.
Explanation
To describe this step, we consider the time series of two parameters, say \(P_{i}\) and \(P_{j}\), expressed as a positive (U) or negative (D) rate of change, for the time period 1991–1997.

T = {1991, 1992, 1993, 1994, 1995, 1996, 1997}

P _{ i } = {U, U, U, U, U, D, U}

P _{ j } = {D, U, U, U, U, U, U}
The support value for lag 1 is \(\alpha_{D}(P_{i}, P_{j}, 1)\) = 83 % and the temporal odds ratio (TOR) is \(Oddratio_{D}(P_{i}, P_{j}, 1)\) = 5.
Since the calculated \(\alpha_{D} > \alpha_{1}\) and TOR > 3, the rule \((P_{i} , D, 1) \Rightarrow (P_{j} )\) is correct and exists for lag value 1 (i.e. l ≠ 0).
The relationship strength [using Eq. (10)] of this rule is 70.13.
If time series data are given for some parameters, we can calculate \(\alpha_{D}\) and the TOR between the parameters and extract the rules. With the help of the above algorithm, we are thus able to extract all two-variable causal relationships between parameters for a time series data set.
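The support and strength figures of this example can be re-derived with a few lines. The sketch assumes that the strength uses the support in percent (rounded to 83) and a base-10 logarithm, which is the reading that comes closest to the reported value; the temporal odds ratio is not recomputed here because its defining equations are not reproduced in this section.

```python
import math

p_i = ["U", "U", "U", "U", "U", "D", "U"]  # 1991-1997
p_j = ["D", "U", "U", "U", "U", "U", "U"]  # 1991-1997
lag = 1
n = len(p_i)

# Direct matches: the change of P_i at time k equals the change of P_j at time k + lag.
matches = sum(
    1 for k in range(n - lag)
    if p_i[k] == p_j[k + lag] and p_i[k] in ("U", "D")
)
support = 100.0 * matches / (n - lag)          # 5 of 6 lagged pairs
strength = round(support) * math.log10(n)      # assumed reading: percent * log10(n)
print(matches, round(support), round(strength, 2))
```

Under these assumptions the sketch gives a support of 83 % and a strength of about 70.1, in line with the values stated above.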
Step 2: Specific rules generation
In this step, we calculate the specific rule for each binary causal rule generated by the above algorithm.
Let \(\gamma_{i}\) and \(\gamma_{j}\) be the rates of change of parameters \(P_{i}\) and \(P_{j}\), and let the parameters have a direct relationship.
Let \(\delta_{i}^{max}\) be the maximum value of the rate of change of \(P_{i}\), \(\delta_{j}^{max}\) the maximum value of the rate of change of \(P_{j}\), \(\delta_{i}^{min}\) the minimum value of the rate of change of \(P_{i}\), and \(\delta_{j}^{min}\) the minimum value of the rate of change of \(P_{j}\).
Let \(\delta_{1} , \delta_{2}\) be the minimum rates of change of parameters \(P_{i}\), \(P_{j}\). Then, using step 2, more specific causal rules \((P_{i} , D, \delta_{1} ,l) \Rightarrow (P_{j} ,\delta_{2} )\) can be generated. The rule indicates that \(P_{i}\) and \(P_{j}\) have a direct causal relationship with lag \(l\) and that a change of \(\delta_{1}\) in \(P_{i}\) leads to a change of \(\delta_{2}\) in \(P_{j}\). Based on the BRS results assumed in step 1, more specific rules can be generated as follows:
SRS = {(\(P_{1}\), \(P_{2}\), D, 1 %, 2 %, 1), (\(P_{1}\), \(P_{3}\), D, 2 %, 1 %, 2), (\(P_{2}\), \(P_{3}\), D, 2 %, 1.5 %, 1), (\(P_{4}\), \(P_{3}\), I, 1.5 %, 2 %, 1), (\(P_{2}\), \(P_{5}\), D, 2 %, 3 %, 1), (\(P_{5}\), \(P_{2}\), D, 3 %, 2 %, 1)}.
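One way to picture this step is as a search over candidate thresholds: the cause is re-symbolized with \(\delta_{1}\), the effect with \(\delta_{2}\), and the largest pair that still meets the support threshold is kept. The sketch below is an assumption about how such a search could look, not the authors' algorithm: it handles only direct rules, uses a fixed list of candidate thresholds, skips the odds ratio test, and works on hypothetical rates of change.

```python
from itertools import product
from typing import List, Optional, Tuple

def symbolize(gammas: List[float], delta: float) -> List[str]:
    """Eq. (2): map rates of change to U, D or Q for a given threshold."""
    return ["U" if g >= delta else "D" if g <= -delta else "Q" for g in gammas]

def direct_support(cause: List[str], effect: List[str], lag: int) -> float:
    """Support percent of the direct relationship, Eqs. (3)-(5)."""
    pairs = len(cause) - lag
    hits = sum(1 for k in range(pairs)
               if cause[k] == effect[k + lag] and cause[k] in ("U", "D"))
    return 100.0 * hits / pairs

def precise_rule(g_cause: List[float], g_effect: List[float], lag: int,
                 deltas: List[float], alpha_1: float = 70.0
                 ) -> Optional[Tuple[float, float, float]]:
    """Return (delta_1, delta_2, support) with the largest fscore, or None."""
    best = None
    best_score = -1.0
    for d1, d2 in product(deltas, repeat=2):
        s = direct_support(symbolize(g_cause, d1), symbolize(g_effect, d2), lag)
        score = d1 ** 2 + d2 ** 2            # fscore of Definition 5
        if s >= alpha_1 and score > best_score:
            best, best_score = (d1, d2, s), score
    return best

# Hypothetical rates of change: the effect follows the cause one step later.
g_cause = [0.03, -0.03, 0.03, -0.03, 0.03, -0.03]
g_effect = [0.0, 0.02, -0.02, 0.02, -0.02, 0.02]
found = precise_rule(g_cause, g_effect, lag=1, deltas=[0.01, 0.02, 0.03])
print(found)
```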
Step 3: Transitive rule generation
Explanation
To understand this, we consider the time series of three parameters P _{ i } , P _{ j }, and P _{ k } as follows.

\({\text{Support}}\,{\text{value}}\,{\text{of}}\,P_{i} \left( U \right)\,{\text{and}}\,P_{j} \left( D \right),\,\alpha_{ij} \left( {P_{i} , P_{j} ,1} \right) = 77.7.\)

\({\text{Support}}\,{\text{value}}\,{\text{of}}\,P_{j} \left( D \right)\,{\text{and }}\,P_{k} \left( D \right),\alpha_{jk} \left( {P_{j} , P_{k} ,1} \right) = 88.8,\)

\({\text{Support}}\,{\text{value}}\,{\text{of}}\,P_{i} \left( U \right)\,{\text{and }}\,P_{k} \left( D \right),\alpha_{ik} \left( {P_{i} , P_{k} ,2} \right) = 75,\)
Table 2 Parameter time series
Time  P_{i}  P_{j}  P_{k} 

1991  U  D  U 
1992  U  D  D 
1993  U  D  D 
1994  U  D  D 
1995  U  D  D 
1996  D  D  D 
1997  D  D  D 
1998  U  D  U 
1999  D  D  D 
2000  U  D  D 
Step 4: Many to one (combined causal) rule generation
Based on the SRS results of step 2, the tuples (\(P_{1}\), \(P_{3}\), D, 2 %, 1 %, 2) and (\(P_{4}\), \(P_{3}\), I, 1.5 %, 2 %, 1) yield, using step 4, the combined causal rule \(((P_{1} , D, 2\% ,2),(P_{4} ,I,1.5\% ,1)) \Rightarrow (P_{3} ,1\% ).\)
Explanation
Suppose we have the following values for parameters \(P_{i} , P_{j}\) and \(P_{k}\).
Table 3 Parameter time series
Time  P_{i}  P_{j}  P_{k} 

1991  U  U  D 
1992  U  U  D 
1993  U  D  D 
1994  D  U  D 
1995  U  U  D 
1996  U  U  D 
1997  U  U  D 
1998  U  U  D 
1999  U  U  D 
2000  U  U  D 
The calculated support values \(\alpha_{ik} , \alpha_{jk}\) and \(\alpha_{ijk}\) exceed \(\alpha_{1}\), which satisfies Definitions 4 and 7. In Table 3, the highlighted rows indicate the \(((P_{i} , P_{j} ), P_{k} )\) relationship. Since all the conditions are satisfied, the generated combined rule is \(((P_{i} ,I,\delta_{1} ,1),(P_{j} ,I,\delta_{2} ,1)) \Rightarrow (P_{k} , \delta_{3} )\).
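The three support values of this example can be checked against the 1991–2000 series listed above. The joint support \(\alpha_{ijk}\) is not spelled out explicitly in this section, so the sketch assumes that it counts the lagged positions at which both causes satisfy the inverse condition with the effect at the same time; both lags are taken as 1, as in the generated rule.

```python
p_i = list("UUUDUUUUUU")  # 1991-2000
p_j = list("UUDUUUUUUU")
p_k = list("DDDDDDDDDD")
lag = 1
pairs = len(p_k) - lag

def inverse_hit(cause: str, effect: str) -> bool:
    """True when cause and effect change in opposite directions (Eq. 6)."""
    return {cause, effect} == {"U", "D"}

hits_i = [inverse_hit(p_i[t], p_k[t + lag]) for t in range(pairs)]
hits_j = [inverse_hit(p_j[t], p_k[t + lag]) for t in range(pairs)]

alpha_ik = 100.0 * sum(hits_i) / pairs
alpha_jk = 100.0 * sum(hits_j) / pairs
# Assumed joint support: both causes hit at the same lagged position.
alpha_ijk = 100.0 * sum(a and b for a, b in zip(hits_i, hits_j)) / pairs

print(round(alpha_ik, 1), round(alpha_jk, 1), round(alpha_ijk, 1))
```

With these assumptions, each single support is 8 of 9 pairs and the joint support is 7 of 9 pairs, so all three exceed the 70 % threshold.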
Step 5: Cyclic rule generation
Based on the SRS results of step 2, the tuples (\(P_{2}\), \(P_{5}\), D, 2 %, 3 %, 1) and (\(P_{5}\), \(P_{2}\), D, 3 %, 2 %, 1) yield, using this step, the cyclic rule \((P_{2} , D, 2\% ,1) \Leftrightarrow \left( {P_{5} , D,3\% , 1} \right)\).
Explanation
To understand this rule, we consider two parameters, say \(P_{i}\) and \(P_{j}\), for the time period 1988–1998. Let \(\delta_{1}\) and \(\delta_{2}\) be the rates of change of parameters \(P_{i} , P_{j}\), which have the following values.
Table 4 Parameter time series
Time  P_{i}  P_{j} 

1988  U  U 
1990  U  D 
1991  U  D 
1992  U  D 
1993  D  D 
1994  U  U 
1995  D  D 
1996  D  U 
1997  D  U 
1998  D  U 
The calculated support value \(\alpha_{ij}\) for parameters \(P_{i}\) and \(P_{j}\) is 75 %. Since \(\alpha_{ij} > \alpha_{1}\), the cyclic relation is satisfied and the generated cyclic causal rule is (\(P_{i}\), I, \(\delta_{1}\), 1) ⇔ (\(P_{j}\), I, \(\delta_{2}\), 1).
Experiments
We implemented our method in the Java programming language with NetBeans IDE 7.3. The computation time to check the causal relationships between parameters is high with a serial program, so we parallelized our program using Java threads on a machine configured with a DualCore CPU containing 12 cores, 8 GB RAM and the 64-bit Windows 7 operating system. Our goal is to discover various causal relationships between the different economic parameters. First, we find all the binary causal rules (i.e. one cause and one effect parameter), and then the other causality rules are discovered using the proposed method. For the experiments, the minimum support threshold \(\alpha_{1}\) is set to 70 % and \(\beta\) is set to 3.
Dataset
Table 5 Datasets
Name  Length of time series (years)  No of indicators (parameters) 

Synthetic1  40  6 
Synthetic2  40  10 
WTO  31  30 
IMF  34  40 
World Bank  52  1346 
All the datasets were selected to test the effectiveness of the proposed method. In our experiments, we first preprocess the continuous data sets [Eq. (1)] and represent the positive, negative and neutral (no) rates of change as U, D and Q values [Eq. (2)].
Results
Table 6 Causality rules
Rules  Countries  Support  Strength 

Binary causal rules  
(Cereal production, D, 2 %, 2) \(\Rightarrow\) (agricultural raw materials exports, 3 %)  India  74  120.8767 
Pakistan  76  124.1436  
(Air transport, D, 1 %, 2) \(\Rightarrow\) (GDP growth, 0.22 %)  India  74  120.8767 
Nepal  79  129.0440  
(Cereal production, D, 3 %, 1) \(\Rightarrow\) (crop production index, 1 %)  Sri Lanka  76  124.1436 
Nepal  81  132.3109  
Afghanistan  76  124.1436  
India  76  124.1436  
Transitive causal rules  
(Rural population, D, 1 %, 1) \(\Rightarrow\) (population density, D, 0.33 %, 1) \(\Rightarrow\) (population total, 0.68 %)  Afghanistan  74  120.8767 
India  83  135.5779  
Maldives  77  125.7771  
Nepal  71  115.9763  
(Land under cereal production, D, 3 %, 1) \(\Rightarrow\) (food exports, D, 1 %, 2) \(\Rightarrow\) (GDP growth, 1.5 %)  India  71  115.9763 
Pakistan  72  117.6097  
Bangladesh  71  115.9763  
(Arable land, D, 1 %, 1) \(\Rightarrow\) (agricultural land, D, 1 %, 3) \(\Rightarrow\) (CO2 emissions, 1.5 %)  India  71  115.9763 
Sri Lanka  71  115.9763  
India  70  114.3428  
Many to one (combined causal) causal rule  
{(Rural population, D, 2.3 %, 1), (urban population D, 0.5 %, 1)} \(\Rightarrow\) (population density, 1 %)  India  79  129.0440 
Afghanistan  72  117.6097  
Pakistan  72  117.6097  
{(Forest rents, I, 5 %, 2), (Foreign direct investment, D, 3 %, 1)} \(\Rightarrow\) (crop production index, 7 %)  Sri Lanka  72  117.6097 
{(Land under cereal production, D, 0.8 %, 1), (rural population, I, 1 %, 2)} \(\Rightarrow\) (cereal production, 2 %)  Afghanistan  73  119.2432 
India  72  117.6097  
Pakistan  70  114.3428  
Cyclic causal rules  
(Land under cereal production, D, 2.5 %, 2) ⇔ (agricultural land, D, 4.5 %, 1)  India  72  117.6097 
(Gross domestic savings, D, 1 %, 1) ⇔ (cereal yield, D, 0.5 %, 2)  Sri Lanka  70  114.3428 
India  70  114.3428 
Prediction effectiveness
Table 7 Entropy of indicators
Indicators  Target indicator entropy  Proposed method conditional entropy after applying rule  Mutual information between indicators 

CP → ARME  1.0973  0.51  0.837 
AR → AG → CO2  1.0986  0.58  0.585 
(FDI, FR) → CPI  1.0972  0.035  0.595 
GDP ←→ CY  1.0961  0.37  0.583 
Table 7 shows that the entropy of the target indicator decreases after the rule is applied, which means that the indicator value is more uncertain when it is considered alone. For example, the large value of mutual information between CP and ARME indicates that the two indicators are related, and the entropy of ARME decreases after the rule CP → ARME is applied. It can therefore be concluded that the proposed method achieves high prediction effectiveness. We validated all the generated causal rules using the decrease in entropy and the mutual information to check their prediction effectiveness. The generated causal rules can also be validated using the time series graphs shown in the “Appendix”.
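The entropy-based check can be illustrated with the textbook definitions of entropy, conditional entropy and mutual information on the U/D/Q symbols. The exact estimator used for Table 7 is not described in this section, so everything below is an assumption: natural logarithms, plug-in frequency estimates, a lag of one time unit and hypothetical symbol series.

```python
import math
from collections import Counter
from typing import List

def entropy(symbols: List[str]) -> float:
    """Shannon entropy (natural log) of the empirical symbol distribution."""
    n = len(symbols)
    return -sum(c / n * math.log(c / n) for c in Counter(symbols).values())

def conditional_entropy(cause: List[str], effect: List[str], lag: int) -> float:
    """H(effect at k + lag | cause at k), estimated from lagged pairs."""
    pairs = list(zip(cause[:len(cause) - lag], effect[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    marginal = Counter(x for x, _ in pairs)
    return -sum(c / n * math.log(c / marginal[x]) for (x, _), c in joint.items())

# Hypothetical series in which the effect copies the cause one time unit later.
cause = list("UDQUDQUDQU")
effect = ["Q"] + cause[:-1]

h_target = entropy(effect[1:])
h_given_cause = conditional_entropy(cause, effect, lag=1)
mutual_information = h_target - h_given_cause
print(round(h_target, 4), round(h_given_cause, 4), round(mutual_information, 4))
```

In this constructed case the cause determines the effect completely, so the conditional entropy drops to zero and the mutual information equals the entropy of the target.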
Scalability
As shown in Fig. 2, the extraction time increases quadratically with the number of indicators. More importantly, the curve is parabolic, which means that the running time of our algorithm is nonlinearly related to the number of indicators in binary causal rule generation. Although the time for generating binary causal rules increases quadratically with the number of indicators, the time for generating the other rules is not nonlinear, because their generation reuses the result of binary rule generation (Fig. 3).
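The quadratic growth is what one would expect if the binary step tests every ordered pair of indicators at every candidate lag. The count below rests on that assumption; the function name `binary_candidate_tests` is hypothetical and the cost of a single test is ignored.

```python
def binary_candidate_tests(z: int, l_max: int) -> int:
    """Ordered (cause, effect) pairs with cause != effect, each tried at lags 1..l_max."""
    return z * (z - 1) * l_max

# Indicator counts of the five datasets used in the experiments.
for z in (6, 10, 30, 40, 1346):
    print(z, binary_candidate_tests(z, l_max=1))
```

Doubling the number of indicators roughly quadruples the number of candidate tests, which matches the parabolic shape described above.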
The proposed method is able to extract nonlinear relationship from extracted causal rules because we are dealing with change of values as the rate of change and this change can be linear or nonlinear.
Discussion
Comparison
To assess the efficiency of the proposed method, we compared it with both statistical and nonstatistical methods. The comparison with statistical methods (Granger causality, Bayesian network) was performed using the R packages lmtest (Hothorn et al. 2015) for GC and bnlearn (Scutar 2016) for BN. For BN, we calculated the results using the constraint-based local discovery algorithm hiton.pc (Aliferis et al. 2003). For the nonstatistical approaches, we implemented the methods of Silverstein et al. (2000), Jin et al. (2012) and Li et al. (2013) in Java for causal rule discovery.
Table 8 Comparison of proposed method with statistical methods
Dataset  Indicators relationships  Extracted rules  Statistical methods  

Proposed method  Granger causality  Bayesian network  
Synthetic1 (I_{1}–I_{6})  Binary  I_{1} → I_{3}  ✓  ✓  ✓ 
Many to one  (I_{2}, I_{4}) → I_{5}  ✓  
Transitive  I_{1} → I_{3} → I_{6}  ✓  ✓  
Cyclic  I_{1} ←→ I_{3}  ✓  
Synthetic2 (I_{1}–I_{10})  Binary  I_{1} → I_{7}, I_{2} → I_{7}, I_{7} → I_{2}, I_{1} → I_{3}, I_{7} → I_{8}  ✓  ✓  ✓ 
Many to one  (I_{6}, I_{9}) → I_{7}  ✓  
Transitive  I_{1} → I_{7} → I_{8}  ✓  ✓  
Cyclic  I_{2} ←→ I_{7}  ✓  
WTO  Binary  Chemicals → Textiles Chemicals → OTE  ✓  ✓  ✓ 
Many to one  (OTE, Textiles) → EDOE  ✓  
Transitive  IS → OM → ICEC  ✓  ✓  
Cyclic  OM ←→ IS  ✓  
IMF  Binary  GGR → VEG  ✓  ✓  ✓ 
Many to one  (GGR, GNS) → TI  ✓  
Transitive  GDP → VIG → TI  ✓  ✓  
Cyclic  CAB ←→ VEGS  ✓  
World Bank data  Binary  CP → ARME  ✓  ✓  ✓ 
Many to one  (FDI, FR) → CPI  ✓  
Transitive  AR → AG → CO2  ✓  ✓  
Cyclic  GDP ←→ CY  ✓ 
Table 9 Comparison of proposed method with nonstatistical methods
Dataset  Indicators relationships  Extracted rules  Nonstatistical methods  

Proposed method  Silverstein et al. (2000)  Jin et al. (2012)  Li et al. (2013)  
Synthetic1 (I_{1}–I_{6})  Binary  I_{1} → I_{3}  ✓  ✓  ✓  ✓ 
Many to one  (I_{2}, I_{4}) → I_{5}  ✓  ✓  ✓  
Transitive  I_{1} → I_{3} → I_{6}  ✓  ✓  
Cyclic  I_{1} ←→ I_{3}  ✓  
Synthetic2 (I_{1}–I_{10})  Binary  I_{1} → I_{7}, I_{2} → I_{7}, I_{7} → I_{2}, I_{1} → I_{3}, I_{7} → I_{8}  ✓  ✓  ✓  ✓ 
Many to one  (I_{6}, I_{9}) → I_{7}  ✓  ✓  ✓  
Transitive  I_{1} → I_{7} → I_{8}  ✓  ✓  
Cyclic  I_{2} ←→ I_{7}  ✓  
WTO  Binary  Chemicals → Textiles, Chemicals → OTE  ✓  ✓  ✓  ✓
Many to one  (OTE, Textiles) → EDOE  ✓  ✓  ✓  
Transitive  IS → OM → ICEC  ✓  ✓  
Cyclic  OM ←→ IS  ✓  
IMF  Binary  GGR → VEG  ✓  ✓  ✓  ✓ 
Many to one  (GGR, GNS) → TI  ✓  ✓  ✓  
Transitive  GDP → VIG → TI  ✓  ✓  
Cyclic  CAB ←→ VEGS  ✓  
World Bank data  Binary  CP → ARME  ✓  ✓  ✓  ✓ 
Many to one  (FDI, FR) → CPI  ✓  ✓  
Transitive  AR → AG → CO2  ✓  ✓  
Cyclic  GDP ←→ CY  ✓ 
Second, we compared our method with nonstatistical methods. From Table 9 it can be observed that binary and combined (many to one) causal relationships can be discovered by Jin et al. (2012) and Li et al. (2013) in all datasets. Silverstein et al. (2000) can also detect the causes in a many to one rule, but only independently. For example, the rule (I_{2}, I_{4}) → I_{5} in the synthetic1 dataset would be reported as I_{2} → I_{5} ← I_{4}, i.e. I_{2} and I_{4} affect I_{5} independently, so we have not counted the many to one rules generated by the method of Silverstein et al. (2000). Transitive relationships are extracted by Silverstein et al. (2000) and by the proposed method. The relationships extracted by the various methods are shown in Tables 8 and 9.
Based on the experimental results, it is reasonable to conclude that the proposed method is capable of extracting various causal relationships. Cyclic rules are not extracted by any of the compared methods, and transitive rules are extracted by only some of them (Tables 8 and 9). Although the nonstatistical methods can generate combined causal rules, they do not generate specific rules or the relationship strength. A further advantage of our method is that it generates more specific rules together with their strength. For example, when we run our algorithm on the synthetic1 dataset, rules are extracted with several properties: the lag value (the time period after which one indicator affects another), the strength, and the rate of change of the indicators, i.e. the positive or negative percent change. The rule I_{1} → I_{3} is actually extracted as \((I_{1}, I, 2\%, 1) \Rightarrow (I_{3}, 1\%)\), 113.6, which indicates that a 2% change in I_{1} inversely affects I_{3} with a 1% change after 1 year, with a relationship strength of 113.6. The results of the proposed method on real-world data sets are described in the following.
To investigate various causal rules in real-world cases, we ran the proposed algorithm on the three real-world data sets shown in Table 5 for performance evaluation. The proposed algorithm generates various binary, many to one, transitive and cyclic rules; some of them, shown in Table 8, are reasonable as judged by common sense. For example, from the IMF data set it is found that an increase in general government revenue also increases the volume of exports of goods, that growth in general government revenue and gross national saving together increases total investment, and that a decrease in government revenue can lead to decreased exports of goods. Some interesting causal relationships are also extracted from the WTO and World Bank datasets. For example, if the crop production of a country increases, its exports of agricultural raw materials increase, which helps to improve the economic growth of the country.
Performance evaluation
Prediction accuracy of proposed, statistical and nonstatistical methods on different scales
Accuracy parameters  Proposed method  Li et al. (2013)  Jin et al. (2012)  Silverstein et al. (2000)  Granger causality  Bayesian network 

WBD10, Rules: 50, CR: 16, NCR: 34
Sensitivity  0.94  0.81  0.75  0.69  0.69  0.75 
Specificity  0.91  0.82  0.74  0.65  0.68  0.79 
Precision  0.83  0.68  0.57  0.48  0.50  0.63 
FScore  0.88  0.74  0.65  0.56  0.58  0.69 
Accuracy  0.92  0.82  0.74  0.66  0.68  0.78 
Misclassification rate  0.08  0.18  0.26  0.34  0.32  0.22 
WBD20, Rules: 100, CR: 38, NCR: 62
Sensitivity  0.92  0.84  0.74  0.68  0.66  0.76 
Specificity  0.90  0.82  0.74  0.66  0.68  0.77 
Precision  0.85  0.74  0.64  0.55  0.56  0.67 
FScore  0.89  0.79  0.68  0.61  0.60  0.72 
Accuracy  0.91  0.83  0.74  0.67  0.67  0.77 
Misclassification rate  0.09  0.17  0.26  0.33  0.33  0.23 
WBD30, Rules: 150, CR: 65, NCR: 85  
Sensitivity  0.91  0.80  0.72  0.63  0.65  0.77 
Specificity  0.88  0.81  0.73  0.65  0.66  0.78 
Precision  0.86  0.76  0.67  0.58  0.59  0.72 
FScore  0.88  0.78  0.70  0.60  0.62  0.75 
Accuracy  0.89  0.81  0.73  0.64  0.65  0.77 
Misclassification rate  0.11  0.19  0.27  0.36  0.35  0.23 
WBD40, Rules: 200, CR: 88, NCR: 112  
Sensitivity  0.91  0.80  0.68  0.60  0.59  0.70 
Specificity  0.89  0.81  0.71  0.63  0.61  0.72 
Precision  0.87  0.77  0.65  0.56  0.55  0.65 
FScore  0.89  0.78  0.67  0.58  0.57  0.68 
Accuracy  0.90  0.81  0.70  0.62  0.60  0.72 
Misclassification rate  0.10  0.20  0.30  0.39  0.40  0.32 
WBD50, Rules: 250, CR: 112, NCR: 138  
Sensitivity  0.90  0.79  0.65  0.60  0.57  0.67 
Specificity  0.88  0.79  0.67  0.61  0.59  0.68 
Precision  0.90  0.75  0.62  0.55  0.53  0.63 
FScore  0.90  0.77  0.63  0.58  0.55  0.65 
Accuracy  0.89  0.79  0.66  0.60  0.58  0.68 
Misclassification rate  0.09  0.21  0.34  0.40  0.42  0.32 
In summary, the comparison shows that the proposed method outperforms the other compared methods on all accuracy measures.
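As an illustration of how the six measures in the table relate to one another, the following minimal Python sketch computes them from confusion-matrix counts. The function name and the counts are assumptions made for illustration. The paper reports only the rates; the counts below were back-computed, reading CR and NCR as the numbers of causal and noncausal rules, so that they agree with the proposed-method column for WBD10 (0.94, 0.91, 0.83, 0.88, 0.92, 0.08).

```python
def accuracy_metrics(tp, fn, fp, tn):
    """Compute the six reported measures from confusion-matrix counts.

    tp / fn: causal rules classified correctly / missed
    tn / fp: noncausal rules classified correctly / wrongly reported as causal
    """
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / total
    misclassification_rate = (fp + fn) / total
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "accuracy": accuracy,
        "misclassification_rate": misclassification_rate,
    }


# Assumed counts (not reported in the paper): 16 causal and 34 noncausal
# rules in WBD10, with 15 and 31 of them classified correctly.
wbd10 = accuracy_metrics(tp=15, fn=1, fp=3, tn=31)
for name, value in wbd10.items():
    print(f"{name}: {value:.2f}")
```

Because the misclassification rate is the complement of the accuracy, the two rows of each block in the table should sum to 1 up to rounding.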
Complexity
The steps of the algorithm are defined to make a minimum number of passes over the data. In the first pass, we calculate the growth rate of each parameter and assign the value U, D or Q to it according to whether the change in growth rate is positive, negative or neutral. In the second pass, we calculate the support and the odds ratio of each parameter paired with every other parameter for different lag values. Only associations with a nonzero lag value are considered, and associations with insufficient support or odds ratio are eliminated directly. The cause–effect rules among the remaining pairs are determined from the temporal association and the temporal odds ratio for nonzero lag values. Finally, the causal pairs found are combined to generate transitive, many to one and cyclic rules from the basic binary causal rules. To achieve efficiency, not all combinations are considered when generating these other causality rules. Instead, we only investigate the combinations that appear in the data and are related to a nonzero lag value. Since such combinations are very few compared with the total number of combinations, the cost of computation is reduced.
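The two passes described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the neutral-change threshold, the toy series and the use of a plain 2 × 2 odds ratio with a 0.5 continuity correction are all assumptions, and the paper's exact definitions of temporal support and temporal odds ratio may differ.

```python
def label_changes(series, threshold=0.0):
    """Pass 1: percent growth between consecutive values, mapped to U/D/Q.

    U = positive change, D = negative change, Q = neutral change.
    Assumes no value in the series is zero.
    """
    labels = []
    for prev, curr in zip(series, series[1:]):
        growth = (curr - prev) / abs(prev) * 100.0
        if growth > threshold:
            labels.append("U")
        elif growth < -threshold:
            labels.append("D")
        else:
            labels.append("Q")
    return labels


def lagged_support_and_odds(cause, effect, lag, cause_label="U", effect_label="U"):
    """Pass 2: support and odds ratio of 'cause at t' versus 'effect at t + lag'."""
    a = b = c = d = 0  # cells of the 2 x 2 contingency table
    for t in range(len(cause) - lag):
        has_cause = cause[t] == cause_label
        has_effect = effect[t + lag] == effect_label
        if has_cause and has_effect:
            a += 1
        elif has_cause:
            b += 1
        elif has_effect:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    support = a / n if n else 0.0
    # Assumed 0.5 correction keeps the ratio finite when a cell is empty.
    odds_ratio = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
    return support, odds_ratio


# Toy series (hypothetical): y tends to rise one step after x rises.
x = [100, 102, 101, 104, 103, 106, 105, 108]
y = [50, 50, 51, 50, 52, 51, 53, 52]
x_labels = label_changes(x)
y_labels = label_changes(y)
for lag in (1, 2):  # only nonzero lags are examined
    support, odds_ratio = lagged_support_and_odds(x_labels, y_labels, lag)
    print(lag, round(support, 2), round(odds_ratio, 2))
```

In this toy case the association is strong at lag 1 and weak at lag 2, so only the lag 1 pair would pass a support and odds ratio threshold.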
To analyze the performance of the algorithm with respect to time and space complexity and the number of passes over the data set, we denote the set of parameters by S, the number of parameters by n, the length of the time series by t, the number of extracted pairs by m and the lag value by l. The complexity of the method is discussed for the extraction of binary causal rules of the form P_{1} → P_{2} for lag value l.
The single parameters are paired and the support is calculated with O(n) passes over the data set. Each pair combination needs to be tested for l lag values to determine the association and causality, which requires O(n * l) passes. In the process of extracting binary causal relationships, the causal association is examined on all combinations.
So the data set needs to be scanned as many as O(Pnl) times. We can therefore conclude that the number of passes over the data set is O(Pnl) and that the time taken is O(Pnlt). The complexity is substantially reduced by first applying the pruning of step 1 (binary rule generation) before the other relationships are extracted.
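The effect of this pruning can be illustrated with a small sketch in which candidates for cyclic and transitive rules are built only from binary rules that have already been accepted, rather than from all combinations of indicators. The function name and the simple chaining logic are assumptions made for illustration; the paper's method applies further lag, support and odds ratio checks that are not reproduced here. The input is the set of binary rules listed for the Synthetic2 dataset.

```python
def derive_candidates(binary_rules):
    """Build cyclic and transitive candidates from accepted binary rules only."""
    edges = set(binary_rules)
    # Cyclic candidate: both a -> b and b -> a were accepted.
    cyclic = sorted({tuple(sorted((a, b))) for a, b in edges if (b, a) in edges})
    # Transitive candidate: a -> b and b -> c were accepted, with c different from a.
    transitive = sorted(
        (a, b, c)
        for a, b in edges
        for b2, c in edges
        if b == b2 and c != a
    )
    return cyclic, transitive


# Binary rules reported for Synthetic2 in the comparison tables.
synthetic2_binary = [
    ("I1", "I7"), ("I2", "I7"), ("I7", "I2"), ("I1", "I3"), ("I7", "I8"),
]
cyclic, transitive = derive_candidates(synthetic2_binary)
print(cyclic)
print(transitive)
```

With five accepted binary rules, only one cyclic and three transitive candidates need to be examined, and they include the rules I_{2} ←→ I_{7} and I_{1} → I_{7} → I_{8} listed in the tables.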
Conclusion
This paper proposed a novel method to extract various types of causal relationships, namely binary, transitive, many to one and cyclic, from large time series databases. The proposed method generates more specific rules together with their strength, which are useful as strategic information. We also defined the concept of the temporal odds ratio to categorize a temporal association as a causal rule. Experiments have shown that the proposed algorithm can extract single, transitive, combined and cyclic causes from large time series data sets. Additionally, the extracted rules were validated to assess their accuracy, and the algorithm has been shown to scale well with respect to the number of indicators in the time series data.
In future, the efficiency of the method can be improved by using fast association rule mining algorithms. The concept of the algorithm can also be extended to other types of time series. The proposed method can be applied in various social, economic and agricultural domains to generate strategic rules for decision making. The method is also useful for detecting the exact cause of a fault in a large mechanical system that is monitored by various sensors generating time series data.
Notes
Authors’ contributions
SH conceived the idea, designed the study, analyzed and interpreted the data, was involved in the system design and implementation, and wrote and drafted the manuscript. PSD supervised the research and was responsible for the algorithm and for revising the manuscript for important intellectual content. He gave valuable advice on conducting the study and helped edit the article. Both authors read and approved the final manuscript.
Acknowledgements
The authors would like to thank the Department of Computer Science and Engineering, VNIT, Nagpur, for making the required computing facilities available.
Competing interests
The authors declare that they have no competing interests.
References
 Abolhosseini S, Heshmati A, Altmann J (2014) The effect of renewable energy development on carbon emission reduction: an empirical analysis for the EU15 countries. Institute for the Study of Labor, Germany. IZA DP no. 7989
 Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. ACM SIGMOD Record 22(2):207–216. doi: 10.1145/170036.170072
 Aliferis CF, Tsamardinos I, Statnikov A (2003) HITON: a novel Markov Blanket algorithm for optimal variable selection. In: AMIA annual symposium proceedings, American Medical Informatics Association 2003, pp 21–25
 Aliferis CF, Statnikov A, Tsamardinos I, Mani S, Koutsoukos XD (2010) Local causal and markov blanket induction for causal discovery and feature selection for classification part I: algorithms and empirical evaluation. J Mach Learn Res 11:171–234
 Arnold A, Liu Y, Abe N (2007) Temporal causal modeling with graphical granger methods. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 66–75
 Asafu-Adjaye J (2000) The relationship between energy consumption, energy prices and economic growth: time series evidence from Asian developing countries. Energy Economics 22(6):615–625. doi: 10.1016/S01409883(00)000505
 BIS (2011) https://www.gov.uk. Analyses the sources of economic growth in relation to trade and investment. Trade and investment analytical papers. Ref: BIS/11/723
 Cai B, Wang J, He J, Geng Y (2016) Evaluating CO2 emission performance in China’s cement industry: an enterprise perspective. Appl Energy 166:191–200
 Chen X, Hoffman MM, Bilmes JA, Hesselberth JR, Noble WS (2010) A dynamic Bayesian network for identifying protein-binding footprints from single molecule-based sequencing data. Bioinformatics 26(12):i334–i342. doi: 10.1093/bioinformatics/btq175
 Chickering DM (1996) Learning Bayesian networks is NP-complete. Learning from data. Springer, New York, pp 121–130
 Chu T, Danks D, Glymour C (2005) Data driven methods for nonlinear granger causality. Clim Teleconnect Mech. doi: 10.1.1.85.7974
 Cooper GF (1997) A simple constraint-based algorithm for efficiently mining observational databases for causal relationships. Data Min Knowl Discov 1(2):203–224. doi: 10.1023/A:1009787925236
 Deng Y, Ebert-Uphoff I (2014) Weakening of atmospheric information flow in a warming climate in the Community Climate System Model. Geophys Res Lett 41(1):193–200. doi: 10.1002/2013GL058646
 Easterly W, Levine R (2003) Tropics, germs, and crops: how endowments influence economic development. J Monet Econ 50(1):3–39. doi: 10.1016/S03043932(02)002003
 Ebeke C, Omgba LD (2011) Oil rents, governance quality, and the allocation of talents in developing countries. CERDI, Etudes et Documents, E 2011.23
 Ebert-Uphoff I, Deng Y (2014) Causal discovery from spatiotemporal data with applications to climate science. In: 13th international conference on machine learning and applications, pp 606–613. doi: 10.1371/journal.pcbi.0030129
 Enyedi G, Volgyes I (2016) The effect of modern agriculture on rural development: comparative rural transformation series. Elsevier, Pergamon Press, USA. ISBN 9780080271798
 EPA (1970) https://www3.epa.gov/. United States Environmental Protection Agency, Washington, DC. Accessed 2 December 1970
 Euser AM, Zoccali C, Jager KJ, Dekker FW (2009) Cohort studies: prospective versus retrospective. Nephron Clin Pract 113(3):c214–c217. doi: 10.1159/000235241
 FAO (1945) http://www.fao.org/docrep/006/y4683e/y4683e06.htm#TopOfPage. Agriculture, food and water. Chapter two: how the world is fed. Accessed 16 October 2016
 Fleiss JL, Levin B, Paik MC (2003) Statistical methods for rates and proportions, 3rd edn. Wiley, London. ISBN 9780471526292
 Friedman N, Linial M, Nachman I, Pe’er D (2007) Using Bayesian networks to analyze expression data. J Comput Biol 7(3–4):601–620. doi: 10.1089/106652700750050961
 Geweke J (1984) Inference and causality in economic time series models. Handb Econom 2:1101–1144
 Good IJ (1959) A theory of causality. Br J Philos Sci 9(36):307–310
 Granger CW (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica 3(37):424–438
 Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques. Elsevier, USA
 Heckerman D (1995) A Bayesian approach to learning causal networks. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence. Morgan Kaufmann, pp 285–295
 Heckerman D (1997) Bayesian networks for data mining. Data Min Knowl Disc 1(1):79–119. doi: 10.1023/A:1009730122752
 Hothorn T, Zeileis A, Farebrother RW, Cummins C, Millo G, Mitchell D (2015) Package lmtest. In: Testing linear regression models. https://cran.r-project.org/web/packages/lmtest/lmtest.pdf. Accessed 6 June 2015
 International Monetary Fund (1945) US New Hampshire, Bretton Woods. http://www.imf.org. Accessed 1945
 Ji Y, Ying H, Dews P, Mansour A, Tran J, Miller RE, Massanari RM (2011) A potential causal association mining algorithm for screening adverse drug reactions in post marketing surveillance. IEEE Trans Inf Technol Biomed 15(3):428–437. doi: 10.1109/TITB.2011.2131669
 Jin Z, Li J, Liu L, Le TD, Sun B, Wang R (2012) Discovery of causal rules using partial association. In: IEEE 12th international conference in data mining (ICDM), pp 309–318. doi: 10.1109/ICDM.2012.36
 Li X (2005) Foreign direct investment and economic growth: an increasingly endogenous relationship. World Dev 33(3):393–407
 Li J, Le TD, Liu L, Liu J, Jin Z, Sun B (2013) Mining causal association rules. In: IEEE 13th international conference in data mining workshops (ICDMW), pp 114–123. doi: 10.1109/ICDMW.2013.88
 Li J, Liu L, Le T (2015) Practical approaches to causal relationship exploration. Springer, Berlin. doi: 10.1007/9783319144337
 Lozano AC, Abe N, Liu Y, Rosset S (2009a) Grouped graphical Granger modeling methods for temporal causal modeling. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 577–586. doi: 10.1145/1557019.1557085
 Lozano AC, Abe N, Liu Y, Rosset S (2009b) Grouped graphical Granger modeling for gene expression regulatory networks discovery. Bioinformatics 25(12):i110–i118
 Ma S, Li J, Liu L, Le TD (2016) Mining combined causes in large data sets. Knowl-Based Syst 92:104–111. doi: 10.1016/j.knosys.2015.10.018
 Madsen H (2007) Time series analysis. Chapman and Hall/CRC Press, Taylor and Francis Group, Boca Raton. ISBN 9781420058670
 Mani S, Spirtes PL, Cooper GF (2012) A theoretical study of Y structures for causal discovery. arXiv:1206.6853
 Marsh C (2013) Introduction to Continuous Entropy. http://www.crmarsh.com/static/pdf/Charles_Marsh_Continuous_Entropy.pdf. Accessed 13 December 2013
 Mehmood S (2012) Effect of different factors on gross domestic products: a comparative study of Bangladesh and Pakistan. doi: 10.1.1.403.5474
 Mellios G, Hausberger S, Keller M, Samaras C, Ntziachristos L, Dilara P, Fontaras G (2011) Parameterisation of fuel consumption and CO2 emissions of passenger cars and light commercial vehicles for modelling purposes. Publications Office of the European Union, EUR. 2011; 24927
 Meyer EP (2014) Package infotheo. In: Information-Theoretic Measures. https://cran.r-project.org/web/packages/infotheo/infotheo.pdf. Accessed 20 February 2015
 Nadkarni S, Shenoy PP (2001) A Bayesian network approach to making inferences in causal maps. Eur J Oper Res 128(3):479–498
 Neapolitan RE (2004) Learning Bayesian networks. Pearson Prentice Hall, Upper Saddle River. ISBN 9780130125347
 Needham CJ, Bradford JR, Bulpitt AJ, Westhead DR (2007) A primer on learning in Bayesian networks for computational biology. PLoS Comput Biol 3(8):e129. doi: 10.1371/journal.pcbi.0030129
 Ogawa K, Sterken E, Tokutsu I (2016) Public debt, economic growth and the real interest rate: a panel VAR approach to EU and OECD countries. doi: 10.2139/ssrn.2726367
 Pang DL, Su HW (2010) A test of Granger causality between internal and external imbalances: the case of China, Japan and United States. In: International conference in management and service science (MASS), pp 1–4. doi: 10.1109/ICMSS.2010.5577179
 Pearl J (2014) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, Los Altos. ISBN 9780080514895
 Pearl J, Verma T (1991) A theory of inferred causation. Knowledge representation and reasoning. In: Proceedings of the seventh annual symposium on principles of programming languages, pp 441–452
 Pellet JP, Elisseeff A (2008) Using Markov blankets for causal structure learning. J Mach Learn Res 9:1295–1342. doi: 10.1023/A:1012487302797
 Pinna A, Soranzo N, de la Fuente A (2010) From knockouts to networks: establishing direct cause–effect relationships through graph analysis. PloS One 5(10):e12912. doi: 10.1371/journal.pone.0012912
 Rasmidatta P (2011) The relationship between domestic saving and economic growth and convergence hypothesis: case study of Thailand. Department of Economics, Sodertorns University. URN: urn:nbn:se:sh:diva9451
 Reichenbach H, Reichenbach M (1991) The direction of time. University of California Press, Berkeley. ISBN 9780520074149
 Reichenbach H (1978) The principle of causality and the possibility of its empirical confirmation. Springer, Netherlands, 1909–1953, pp 345–371. doi: 10.1007/9789400998551_14
 Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP (2005) Causal protein-signaling networks derived from multiparameter single-cell data. Science 308(5721):523–529. doi: 10.1126/science.1109447
 Scutari M (2016) Package bnlearn. In: Bayesian network structure learning, parameter learning and inference. https://cran.r-project.org/web/packages/bnlearn/bnlearn.pdf. Accessed 16 May 2016
 Shipley B (2002) Cause and correlation in biology: a user’s guide to path analysis, structural equations and causal inference. Cambridge University Press, Cambridge
 Silverstein C, Brin S, Motwani R, Ullman J (2000) Scalable techniques for mining causal structures. Data Min Knowl Disc 4(2–3):163–192
 Spirtes P, Glymour CN, Scheines R (2000) Causation, prediction, and search. MIT Press, Cambridge. doi: 10.1007/9781461227489
 StatsCan (1971) Statistics Canada: http://www.statcan.gc.ca/pub/16201x/2009000/partpartie1eng.htm#wbcont. Ottawa, ON
 Stewart A, Hope-Morley A, Mock P (2015) Quantifying the impact of real-world driving on total CO2 emissions from UK cars and vans for The Committee on Climate Change. Element Energy Limited, Terrington House, Cambridge
 Suppes P (1970) A probabilistic theory of causality. North-Holland, Amsterdam. doi: 10.1086/288485
 Tian X, Geng Y, Dai H, Fujita T, Wu R, Liu Z, Masui T, Yang X (2016) The effects of household consumption pattern on regional development: a case study of Shanghai. Energy 103:49–60
 Veiga DFT, Vicente FFR, Grivet M, De la Fuente A, Vasconcelos ATR (2007) Genome-wide partial correlation analysis of Escherichia coli microarray data. Genet Mol Res 6:730–742
 Waldmann MR, Martignon L (1998) A Bayesian network model of causal learning. In: Proceedings of the twentieth annual conference of the Cognitive Science Society, pp 1102–1107
 World Bank Data (1944) USA Washington, DC. http://www.worldbank.org. Accessed 1944
 World Trade Organization (1995) Switzerland. http://www.wto.org. Accessed 1 January 1995
 Zhang NL, Poole D (1996) Exploiting causal independence in Bayesian network inference. J Artif Intell Res 5:301–328
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.