Structural identifiability of cyclic graphical models of biological networks with latent variables
 1k Downloads
 2 Citations
Abstract
Background
Graphical models have long been used to describe biological networks for a variety of important tasks such as the determination of key biological parameters, and the structure of graphical model ultimately determines whether such unknown parameters can be unambiguously obtained from experimental observations (i.e., the identifiability problem). Limited by resources or technical capacities, complex biological networks are usually partially observed in experiment, which thus introduces latent variables into the corresponding graphical models. A number of previous studies have tackled the parameter identifiability problem for graphical models such as linear structural equation models (SEMs) with or without latent variables. However, the limited resolution and efficiency of existing approaches necessarily calls for further development of novel structural identifiability analysis algorithms.
Results
An efficient structural identifiability analysis algorithm is developed in this study for a broad range of network structures. The proposed method adopts the Wright’s path coefficient method to generate identifiability equations in forms of symbolic polynomials, and then converts these symbolic equations to binary matrices (called identifiability matrix). Several matrix operations are introduced for identifiability matrix reduction with system equivalency maintained. Based on the reduced identifiability matrices, the structural identifiability of each parameter is determined. A number of benchmark models are used to verify the validity of the proposed approach. Finally, the network module for influenza A virus replication is employed as a real example to illustrate the application of the proposed approach in practice.
Conclusions
The proposed approach can deal with cyclic networks with latent variables. The key advantage is that it intentionally avoids symbolic computation and is thus highly efficient. Also, this method is capable of determining the identifiability of each single parameter and is thus of higher resolution in comparison with many existing approaches. Overall, this study provides a basis for systematic examination and refinement of graphical models of biological networks from the identifiability point of view, and it has a significant potential to be extended to more complex network structures or highdimensional systems.
Keywords
Biological network Graphical model Structural identifiability analysis Structural equation model Symbolicfree eliminationBackground
Although the reductionism approaches have led to tremendous success in advancing our knowledge and understanding of individual biological components and their functions, it has been broadly recognized that many organic/cellular functions or disorders cannot be attributed to an individual molecule [1]. Instead, numerous biological components interact with each other and orchestrate various dynamic events that are critical to the beginning and extension of life [2]. To systematically investigate and understand such complex interactions, a variety of biological networks (e.g., transcriptional and posttranscriptional regulatory networks [3, 4, 5, 6], functional RNA networks [7, 8, 9], proteinprotein interaction networks [10, 11], and metabolic networks [12, 13]) have necessarily been constructed based on experimental observations or predictions. Nowadays, biological networks are playing critical roles in biomedical research and practice at multiple levels or scales (e.g., genetics [14], immunology [15], cancer [16], drug discovery [17, 18]), and the associated modeling and computation techniques and tools are under active development for network property investigation, network structure identification, experimental data analysis and interpretation, and so on [1, 15, 16, 17, 18, 19].
Graphical models are one of the most powerful mathematical languages for biological network representation, and have long been used for various quantitative analysis tasks [19, 20, 21]. In particular, the determination of unknown model parameter values from experimental data is of fundamental importance to many other critical tasks (e.g., computer simulation or prediction, network structure refinement), and it should be stressed that parameter identifiability is one of the first questions that needs to be answered before any statistical method can be applied to obtain accurate and reliable estimates of unknown parameters [20]. More specifically, limited by resources or technical capabilities, it is not uncommon that only part of the nodes or interactions (i.e., edges) in a biological network can be experimentally observed such that the values of certain unknown parameters associated with those unobserved nodes or edges cannot be uniquely determined from experimental data due to the lack of information. However, even if all the nodes and edges are observed, identifiability issues may still occur due to, e.g., model misspecification. It is thus necessary to develop identifiability analysis techniques for graphical models with or without latent variables.
Since graphical models refer to a broad range of mathematical formulations [19, 20, 21, 22], it is impossible to explore the identifiability analysis techniques for all different types of graphical models in one study. Here we focus on the structural identifiability analysis problem of static linear structural equation model (SEM), which is a representative and generic graphical model type that has been widely used in many different research areas such as clinical psychology, education, cognitive science, behavioral medicine, developmental psychology, casual inference [23, 24], and systems biology [25, 26, 27]. A number of previous studies have proposed identifiability analysis techniques for linear SEMs with or without latent variables [23, 24, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43]. More specifically, the traditional method described in [23] constructs a socalled system matrix from a given model structure and derives the rank and order conditions based on this matrix for identifiability analysis. However, this approach can only handle comparatively simple network structures (e.g., block recursive models [23]) without latent variables, and cannot deal with the disturbance correlation between variables (i.e., nodes). To deal with a broader range of model structures, investigators from different disciplines have made further attempts by considering the topological or other features of certain networks. For instance, several previous studies have derived the sufficient criteria for parameter identifiability based on local characteristics of subnetworks, including Pearl’s back door and front door criteria [24], Brito and Pearl’s generalized instrumental variable criterion [30], and Tian’s accessory set approach [41]. For certain network structures, sufficient conditions for parameter identifiability have also been established for the entire network instead of subnetworks; e.g., Brito and Pearl’s conditions for bowfree models [28], Brito and Pearl’s auxiliary sets condition for directed acyclic graph (DAG) models [36], Drton’s condition based on injective parametrization of mixed graphs [35], and Foygel’s halftrek criterion for mixed graphs [37]. While the criteria and conditions mentioned above are important progresses made in the field, they only provide a partial or overall assessment of parameter identifiability. To determine the identifiability of every single parameter in the model, Tian [32] adopted the partial regression analysis technique, but this approach can only handle a special class of Pstructurefree SEMs. Also, Sullivant et al. [34] tackled this problem using a computer algebra method, which turns out to be applicable only to SEMs with a small number of variables due to the prohibitive computation costs associated with Gröbner basis reduction. Therefore, it is still necessary to develop more efficient singleparameterlevel approaches for structural identifiability analysis of whole networks.
In this study, we developed a novel and efficient approach for structural identifiability analysis of cyclic linear SEMs with latent variables. The proposed method is applicable to both directed cyclic and acyclic graphs with or without latent variables, and thus presents an extension of existing algorithms in terms of generality. Different from other existing algebraic approaches, although our method uses the Wright’s path coefficient method to generate identifiability equations in forms of nonlinear symbolic polynomials, it avoids the expensive symbolic computations (e.g., Gröbner basis reduction) by converting identifiability equations to binary matrices, and is thus highly efficient. Moreover, in contrast to other methods that can only draw conclusions on the overall identifiability of a model, the proposed method can determine the identifiability of each single unknown parameter, and is thus of higher resolution and enables researchers to locate the problematic subnetwork structures to refine model structures or improve experimental design. We collected a number of benchmark models from literature and verified the validity of our method using those models. Finally, we applied our method to the network module for influenza A virus (IAV) withinhost replication to gain insights into parameter identifiability and experimental design.
Methods
The key definitions and steps involved in the proposed algorithm are described in this section, including the definition of structural identifiability analysis for cyclic SEMs, the generation of identifiability equations, the conversion to identifiability matrices, and the symbolicfree identifiability determination based on the reduced identifiability matrices. The necessary theoretical justification is also given.
SEM and structural identifiability
In general, the purpose of identifiability analysis is to verify whether certain unknown parameters can be uniquely and reliably determined for given model structures with or without considering data noise or model uncertainty [24, 28, 29, 30, 32, 34, 35, 37, 44]. Here the goal of structural identifiability analysis of SEMs is to determine whether the unknown parameters in matrices C and Ω can be unambiguously determined for a given network structure G = (V, D, U). This type of analysis does not take specific data distribution or noise level into consideration as its primary concern is not the robustness but the accuracy of parameter estimation via examining possible flaws in model structure or experimental design. More importantly, the structural identifiability of a parameter can be verified by checking its number of solutions to a system of polynomial equations. That is, a parameter is globally identifiable if only one solution exists, locally identifiable if a finite number of solutions exist, and unidentifiable if an infinite number of solutions exist [20].
Generating identifiability equations
More specifically, for an acyclic linear SEM (also called recursive SEM that corresponds to a directed acyclic graph), the covariance σ _{ ij } of a pair of variables Y _{ i } and Y _{ j } is calculated as \( {\sigma}_{ij}={\displaystyle \sum_{pat{h}_k}}{\displaystyle \prod_{edg{e}_l}}{\theta}_l \), where θ _{ l } is the coefficient of the lth edge in path k (i.e., c _{ pq } or ω _{ pq }associated with a directed edge V _{ q } → V _{ p } or an undirected V _{ q } ↔ V _{ p }). Note that each path includes at most one undirected edge and must be unblocked [29, 30, 45, 46] (i.e., the two end nodes of a path are connected in the directed graph part G = (V, D)). For a cyclic linear SEM (also called nonrecursive), the directed graph part G = (V, D) contains one or multiple cycles such that we need to enumerate all distinct cycles and paths. The key issue is that, for two nodes in the same cycle, there are two different sets of paths V _{ i } → ⋯ → V _{ j }and V _{ j } → ⋯ → V _{ i } according to the Wright’s method. That is, two different sets of equations can be generated for σ _{ ij } and σ _{ ji }, respectively, although σ _{ ij } = σ _{ ji }. Furthermore, for any latent variable Y _{ i } in a SEM, the covariance between Y _{ i } and any other variable is unknown and cannot be used to generate identifiability equations (see Σ _{ b }, the corresponding covariance matrix of Fig. 1b). In short, the existence of cycles or latent variables will lead to the increase or decrease of the number of identifiability equations, respectively, and thus will eventually affect the number of solutions of unknown model parameters.
Generating identifiability matrices
The identifiability equations are symbolic polynomials and are nonlinear with respect to unknown parameters. Simplifying and solving such equations using the computer algebra algorithms usually presents significant computational challenges [34]. Here we propose a novel and efficient approach, and the basic idea is to convert the identifiability equations to binary matrices, called identifiability matrices.
From Eq. (7), we can generate three matrices for σ _{12}, σ _{14} and σ _{24}, respectively, which are the same as those from Eq. (6) and thus not shown here.
Reducing identifiability matrices
If all elements are 0 in an identifiability matrix M, it is simply a zero matrix (denoted by M _{ Z }). Such matrices may occur during the reduction process. However, a zero matrix is not useful to identifiability analysis because it contains no unknown parameters. Therefore, once an identifiability matrix becomes a zero matrix after a certain number of reduction operations, it can be removed. For the same reason, a zero row in an identifiability matrix can also be deleted.
Given an identifiability matrix M with a row number N _{ R }(M) greater than 1, if all the rows in M are the same, such a matrix is called a repeated matrix (denoted by M _{ R }). The corresponding identifiability equation of a repeated matrix is \( {\sigma}_{ij}={a}_1{\displaystyle \prod_l}{\theta}_l+{a}_2{\displaystyle \prod_l}{\theta}_l\cdots +{a}_K{\displaystyle \prod_l}{\theta}_l \), where all the monomials are the same except for the constant coefficients {a _{1}, a _{2}, …, a _{ K }} in the front. Since the equation can be simplified to \( {\sigma}_{ij}=A\cdot {\displaystyle \prod_l}{\theta}_l \), where A = a _{1} + a _{2} + ⋯ + a _{ K }, the repeated identifiability matrix can be replaced by a single row without loss of information (denoted by M _{ RI }).
 i)
Row swap. Let R _{ i } and R _{ j } (i ≠ j) denote two different rows of an identifiability matrix M _{1}, and let M _{2} denote the matrix generated after swapping R _{ i } and R _{ j }, then M _{1} ~ M _{2}.
 ii)
Redundant row removal. Let R _{ i } and R _{ j } (i ≠ j) denote two different rows of an identifiability matrix M _{1}. If R _{ i } = R _{ j } and let M _{2} denote the matrix generated after removing R _{ i } or R _{ j }, then M _{1} ~ M _{2}.
 iii)Row deletion. Let M _{1} and M _{2} be two identifiability matrices, which correspond to two different identifiability equations, such that N _{ R }(M _{1}) > 1 and M _{2} ⊆ M _{1}. Also, let M _{3} = sub(M _{1}) be a submatrix consisting of M _{1}’s rows that M _{2} has in M _{1}. See Fig. 3 for examples.

If Rem(M _{1 − 2}) ≠ M _{ Z } and Comp(M _{3} − M _{2}) = M _{ Z }, then M _{1} can be reduced to Rem(M _{1 − 2}) without altering the parameter identifiability;

If Rem(M _{1 − 2}) ≠ M _{ Z } and Comp(M _{3} − M _{2}) = M _{ R }, then M _{1} can be reduced to \( \left[\begin{array}{c}\hfill Rem\left({\mathbf{M}}_{12}\right)\hfill \\ {}\hfill {M}_{RI}\hfill \end{array}\right] \) without altering the parameter identifiability;

If Rem(M _{1 − 2}) = M _{ Z } and Comp(M _{3} − M _{2}) = M _{ R }, then M _{1} can be reduced to M _{ RI } without altering the parameter identifiability;

If Rem(M _{1 − 2}) = M _{ Z } and Comp(M _{3} − M _{2}) = M _{ z } (i.e., M _{1} = M _{2} = M _{3}), and take the row which has the least “1” elements in M _{1} to form a new matrix M _{4}, then M _{1} can be reduced to M _{4} without altering the parameter identifiability.

Determining parameter identifiability
After all identifiability matrices are reduced to the simplest forms using the operations described in the previous section, the identifiability of all the unknown parameters can be determined. The simplest case is to find out the globally identifiable, That is, if a matrix has only one row and this row has only one “1” element, the parameter corresponding to that “1” element is then globally identifiable, because the associated identifiability equation is in the form θ _{ i } = const. For example, the matrix of the bottom σ _{34} matrix in Fig. 4 has only one row with only one “1” element, so the parameter c _{34} corresponding to the “1” element is globally identifiable.
 (i)
Apply the bitOR operation to the first two rows, and then to the result and the 3rd row, and so on until the last row of a matrix to generate an indicator vector R _{ p } such that each “1” element in this vector indicates the existence of a certain parameter;
 (ii)
Initialize an output vector R _{ out } as the vector R _{ p } that contains largest number of “1” elements among all R _{ p } s;
 (iii)
Check each of the R _{ p } vectors to verify whether it has any common “1” element with R _{ out } using the bitAND operation. If the bitAND result is not a zero vector, then the identifiability matrix corresponding to R _{ p } will be added to the current group. Then update R _{ out } by applying the bitOR operation to R _{ out } and the bitAND result;
 (iv)
Repeat Step (iii) until no more matrices can be added to the current group;
 (v)
Remove all the matrices of the current group, and repeat steps (ii) to (iv) until all different groups are found.
The identifiability of all the parameters in the same group are determined together. According to the definition of identifiability matrix, one can tell that all the matrices of the same group correspond to a system of coupled polynomial equations, and the critical issue here is to determine the number of solutions of each parameter to these equations. Garcia and Li [47] have theoretically investigated this problem and shown that for a system of n polynomial equations with n complex variables, the number of solutions is equal to \( q={\displaystyle \prod_{i=1}^n}{q}_i \), where q _{ i } is the degree (the power of the highest ordered term) of equation i. Therefore, every unknown variable of the system has a unique solution when q = 1, and has multiple solutions if q > 1. Based on the work of Garcia and Li, we establish the theoretical connection between parameter identifiability and the grouped identifiability matrices, and the theoretical proof is given in Additional file 2 for interested readers.
Theorem 1

When N _{ P } > N _{ M }, all the parameters in the same group are unidentifiable;

When N _{ P } = N _{ M }, the parameters are globally identifiable if N _{max} = 1, and locally identifiable if N _{max} > 1;

When N _{ P } < N _{ M }, the parameters are at least locally identifiable.
Based on Theorem 1, we can determine the structural identifiability of each parameter for the models in Fig. 1. That is, for the model in Fig. 1a, one can tell that the parameter c _{34} is globally identifiable. The remaining matrices are of the same group; and the number of matrices is N _{ M } = 6, the number of unknown parameters is N _{ P } = 5, and N _{max} = 5 is greater than 1. Therefore, all the unknown parameters {c _{31}, c _{42}, c _{43}, ω _{12}, ω _{23}} are locally identifiable. Similarly for the model in Fig. 1 bone can tell N _{ M } = 3 and N _{ P } = 5 so all the parameters {c _{31}, c _{34}, c _{42}, c _{43}, ω _{12}, ω _{23}} are unidentifiable.
Results and discussion
Overview of the framework
Graphical models have long been used to describe biological networks for a variety of important tasks like network structure identification. Many such quantitative analyses involve determination of unknown model parameters from experimental data, and identifiability analysis is a necessary step to perform before parameter estimation to assure the accuracy or robustness of the estimates. In particular, structural identifiability analysis can help to locate misspecified substructures of models or improve experimental design with considering unobserved variables. A number of previous studies have proposed identifiability analysis techniques for structural equations models, with particular attention paid to specific network structures (e.g., directed acyclic graphs) or experimental conditions (e.g., without latent variables). Also, existing methods usually give an overall assessment instead of verifying the identifiability of each single parameter, and the use of symbolic computation algorithms (e.g., Gröbner basis reduction) is computationally expensive and has significantly limited the applications of these methods in more complex biological network structures and moderate to highdimensional systems.
Verification using benchmark models
In order to verify the validity of the proposed method, we have collected a number of benchmark models available in public literature to check whether the identifiability results obtained using our method are consistent with those obtained by other existing methods. Since these existing models do not contain any latent variable, we also consider a model with latent variables at the end of this section to show the capacity of our method.
The first benchmark model is for investigating the effects of smoking on lung cancer [24], the graph contains three nodes (variables), two directed edges, and one undirected edge (disturbance correlation). All the parameters in this model are found to be globally identifiable and the detailed analysis process have been shown in Fig. 6b. The second benchmark model was previously studied by Sullivant et al. [34], and its graph contains three nodes, one directed edge, and two undirected edges. Again, all the parameters in the second model turn out to be globally identifiable and the analysis details are given in Additional file 3. The third benchmark model investigated by Drton et al. [35] is for an acyclic graph with four nodes, three directed edges, and three undirected edges. From the same literature (Ref. [35]), we collected the fourth benchmark model that is more complicated in terms of number of variables and their interactions. The fifth benchmark model derived from the work of Kline el al. [22] is a cyclic graph with six nodes, six directed edges and three undirected edges. The purpose of this model is to show that the proposed approach can deal with cyclic graphs. We derived the sixth benchmark model from the work of Drton et al. [35]. This cyclic graph has six nodes, six directed edges, and three undirected edges; however, for this model, we also considered the case of multigraph (i.e., there exist both a directed edge and an undirected edge between two nodes), which has been paid particular attention in the previous study of Brito and Pearl [36]. We reported the structural identifiability analysis details and results of the third to sixth models also in Additional file 3.
Applications to real biological networks
Numerous biological networks can be found in a variety of databases or knowledge repositories [50, 51]; limited by resources, here we only consider a subnetwork structure of the withinhost influenza virus life cycle as an application example. More specifically, influenza A virus (IAV) can infect multiple species including birds and human, and it has long been a major threat to public health by causing seasonal epidemics or sporadic pandemics [52]. A systematic understanding of IAV infection and immune response mechanisms is thus of significant scientific interest nowadays. For this purpose, a comprehensive map of the influenza virus life cycle together with molecularlevel host responses has been previously constructed from hundreds of related publications by Matsuoka et al. [53], including several critical network modules like virus entry, virus replication and transcription, posttranslational processing, transportation of virus proteins, and packaging and budding. Here we choose the subnetwork of virus replication, to which particular attention has been paid by many previous experimental studies [54, 55, 56, 57].
From the results in Fig. 9b, we can also tell that local network topological structures may have an important effect on parameter identifiability. For example, the NP inhibitor node has an indegree 0 and is unobserved, which is the direct reason why all the edges starting from such a node are unidentifiable. In addition, both the cRNA and cRNP nodes have a comparatively high total degree (an indegree 4 and an outdegree 1 for both nodes); however, the cRNP node is unobserved such that all the edges connected with it are unidentifiable, while the four incoming edges to the cRNA nodes are globally identifiable. The implication of such observations on experimental design is that, the nodes with an indegree or outdegree 0 and the nodes with a high total degree (e.g., hub genes) are suggested to be experimentally observed to reduce the identifiability problem.
Conclusions
In this study, we proposed a novel method for structural identifiability analysis of cyclic graphical models with explicit latent variables. Briefly, to deal with a broader range of network structures, the Wright’s path coefficient method is adapted to generate the identifiability equations and particular attention has been paid to cyclic mixed graphs (as well as the multigraph case, see Benchmark Model 5 in Additional file 3) with explicit latent variables. To achieve high computing efficiency, the identifiability equations are converted to binary identifiability matrices and the necessary strategies have been developed for matrix reduction and regrouping. Parameter identifiability can then be verified at the single parameter level based on the reduced and grouped identifiability matrices after a connection between the number of nonzero matrix elements and the theoretical work of Garcia and Li. The validity of the proposed approach was theoretically justified and further verified using existing benchmark models. In addition, the proposed approach was applied to a real network structure for influenza A virus replication to gain insights into experimental design.
In summary, this study provides a basis for efficient model refinement and informative experiment design, and thus may facilitate investigators to expedite our understanding of network structure and interaction mechanisms in complex biological systems. However, we recognize that many real biological networks are highdimensional with complex nonlinear interactions. Therefore, the proposed approach will need to be extended to deal with more realistic problems in the future.
Abbreviations
DAG, directed acyclic graph; IAV, influenza A virus; ODE, ordinary differential equation; SEM, structure equation model
Notes
Acknowledgements
The authors thank Dr. Yu Luo and Ms. Lijie Wang for useful suggestions and discussions.
Funding
This work was partially supported by the Fundamental Research Funds for the Central Universities of China (ZYGX2014J064).
Availability of data and material
The network structure data used in this study are all selected from public literature, including the FluMap database [53].
Authors’ contributions
YW contributed to method development, computational analyses, and manuscript writing. NL participated the discussions of problem formulation and real network analyses. HM proposed the idea, oversaw the study, and significantly contributed to manuscript preparation. All authors have read and approved the final version of the manuscript.
Authors’ information
YW is Assistant Professor at School of Computer Science and Engineering, University of Electronic Science and Technology of China. NL is Associate Professor at Systems Engineering Institute, Xi’an Jiaotong University, China. HM is Associate Professor at the Department of Biostatistics, School of Public Health, University of Texas Health Science Center at Houston, USA.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Supplementary material
References
 1.Barabasi AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004;5(2):101–13.CrossRefPubMedGoogle Scholar
 2.Rolland T, Taşan M, Charloteaux B, Pevzner Samuel J, Zhong Q, Sahni N, et al. A proteomescale Map of the human interactome network. Cell. 2014;159(5):1212–26. doi: 10.1016/j.cell.2014.10.050.CrossRefPubMedPubMedCentralGoogle Scholar
 3.Carninci P, Kasukawa T, Katayama S, Gough J, Frith M, Maeda N, et al. The transcriptional landscape of the mammalian genome. Science. 2005;309(5740):1559–63.CrossRefPubMedGoogle Scholar
 4.Minguez P, Letunic I, Parca L, Bork P. PTMcode: a database of known and predicted functional associations between posttranslational modifications in proteins. Nucleic Acids Res. 2013;41(D1):D306–11.CrossRefPubMedGoogle Scholar
 5.Minguez P, Parca L, Diella F, Mende DR, Kumar R, Helmer‐Citterich M, et al. Deciphering a global network of functionally associated post‐translational modifications. Mol Syst Biol. 2012;8(1):599.PubMedPubMedCentralGoogle Scholar
 6.Liu ZP, Wu H, Zhu J, Miao H. Systematic identification of transcriptional and posttranscriptional regulations in human respiratory epithelial cells during influenza A virus infection. BMC Bioinformatics. 2014;15(1):336. doi: 10.1186/1471210515336.CrossRefPubMedPubMedCentralGoogle Scholar
 7.Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120(1):15–20.CrossRefPubMedGoogle Scholar
 8.Reynolds A, Leake D, Boese Q, Scaringe S, Marshall WS, Khvorova A. Rational siRNA design for RNA interference. Nat Biotechnol. 2004;22(3):326–30.CrossRefPubMedGoogle Scholar
 9.Ponting CP, Oliver PL, Reik W. Evolution and functions of long noncoding RNAs. Cell. 2009;136(4):629–41.CrossRefPubMedGoogle Scholar
 10.Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, et al. A human proteinprotein interaction network: a resource for annotating the proteome. Cell. 2005;122(6):957–68.CrossRefPubMedGoogle Scholar
 11.Rual JF, Venkatesan K, Hao T, HirozaneKishikawa T, Dricot A, Li N, et al. Towards a proteomescale map of the human proteinprotein interaction network. Nature. 2005;437(7062):1173–8.CrossRefPubMedGoogle Scholar
 12.Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási AL. The largescale organization of metabolic networks. Nature. 2000;407(6804):651–4.CrossRefPubMedGoogle Scholar
 13.Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, Vo TD, et al. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc Natl Acad Sci. 2007;104(6):1777–82.CrossRefPubMedPubMedCentralGoogle Scholar
 14.Greene CS, Krishnan A, Wong AK, Ricciotti E, Zelaya RA, Himmelstein DS, et al. Understanding multicellular function and disease with human tissuespecific networks. Nat Genet. 2015;47(6):569–76. doi: 10.1038/ng.3259. CrossRefPubMedPubMedCentralGoogle Scholar
 15.Kidd BA, Peters LA, Schadt EE, Dudley JT. Unifying immunology with informatics and multiscale biology. Nat Immunol. 2014;15(2):118–27. doi: 10.1038/ni.2787.CrossRefPubMedPubMedCentralGoogle Scholar
 16.Pujana MA, Han JDJ, Starita LM, Stevens KN, Tewari M, Ahn JS, et al. Network modeling links breast cancer susceptibility and centrosome dysfunction. Nat Genet. 2007;39(11):1338–49.CrossRefPubMedGoogle Scholar
 17.Butcher EC, Berg EL, Kunkel EJ. Systems biology in drug discovery. Nat Biotech. 2004;22(10):1253–9.CrossRefGoogle Scholar
 18.Barabasi AL, Gulbahce N, Loscalzo J. Network medicine: a networkbased approach to human disease. Nat Rev Genet. 2011;12(1):56–68.CrossRefPubMedPubMedCentralGoogle Scholar
 19.Domke J. Learning graphical model parameters with approximate marginal inference. IEEE Trans Pattern Anal Mach Intell. 2013;35(10):2454–67.CrossRefPubMedGoogle Scholar
 20.Miao H, Xia X, Perelson AS, Wu H. On identifiability of nonlinear ODE models and applications in viral dynamics. SIAM Rev. 2011;53(1):3–39.CrossRefGoogle Scholar
 21.Giraud C, Tsybakov A. Discussion: Latent variable graphical model selection via convex optimization. Ann Stat. 2012;40(4):1984–8.CrossRefGoogle Scholar
 22.Mincheva M, Roussel MR. Graphtheoretic methods for the analysis of chemical and biochemical networks. I. Multistability and oscillations in ordinary differential equation models. J Math Biol. 2007;55(1):61–86. doi: 10.1007/s0028500700991.CrossRefPubMedGoogle Scholar
 23.Kline RB. Principles and practice of structural equation modeling. 2nd ed. New York: Guilford Press; 2005.Google Scholar
 24.Pearl J. Causality: models, reasoning, and inference (2nd Edition). Cambridge: Cambridge University Press; 2009.CrossRefGoogle Scholar
 25.Shamaiah M, Lee SH, Vikalo H. Graphical models and inference on graphs in genomics: challenges of highthroughput data analysis. IEEE Signal Process Mag. 2012;29(1):51–65. doi: 10.1109/MSP.2011.943012.CrossRefGoogle Scholar
 26.Cai XBJ, Giannakis GB. Inference of gene regulatory networks with sparse structural equation models exploiting genetic perturbations. PLoS Comput Biol. 2013;9(5), e1003068. doi: 10.1371/journal.pcbi.1003068.CrossRefPubMedPubMedCentralGoogle Scholar
 27.Dong ZST, Yuan C. Inference of gene regulatory networks from genetic perturbations with linear regression model. PLoS One. 2013;8(12), e83263. doi: 10.1371/journal.pone.0083263.CrossRefPubMedPubMedCentralGoogle Scholar
 28.Brito C, Pearl J. A new identification condition for recursive models with correlated errors. Struct Equ Model. 2002;9(4):459–74.CrossRefGoogle Scholar
 29.Brito C, Pearl J. A graphical criterion for the identification of causal effects in linear models. 18th National Conference on Artificial Intelligence. Edmonton, Alberta, Canada, Amerian Association for Artificial Intelligence: 533538.Google Scholar
 30.Brito C, Pearl J. Generalized instrumental variables, Uncertainty in Artificial Intelligence. 2002. p. 85–93.Google Scholar
 31.Shimizu S, Hoyer PO, Hyvärinen A, Kerminen A. A linear nonGaussian acyclic model for causal discovery. J Mach Learn Res. 2006;7:2003–30.Google Scholar
 32.Tian J, editor. Parameter identification in a class of linear structural equation models, IJCAI. 2009.Google Scholar
 33.Hyttinen A, Eberhardt F, Hoyer PO. Causal discovery for linear cyclic models with latent variables on Probabilistic Graphical Models. 2010. p. 153.Google Scholar
 34.Sullivant S, GarciaPuente LD, Spielvogel S, editors. Identifying causal effects with computer algebra, Proceedings of the twentysixth conference on uncertainty in artificial intelligence. Corvallis: AUAI Press; 2010.Google Scholar
 35.Drton M, Foygel R, Sullivant S. Global identifiability of linear structural equation models. Ann Stat. 2011;39(2):865–86.CrossRefGoogle Scholar
 36.Brito C, Pearl J. Graphical condition for identification in recursive SEM. arXiv:1206. 6821. 2012.Google Scholar
 37.Foygel R, Draisma J, Drton M. Halftrek criterion for generic identifiability of linear structural equation models. Ann Stat. 2012;40(3):1682–713.CrossRefGoogle Scholar
 38.Hoyer PO, Hyvarinen A, Scheines R, Spirtes PL, Ramsey J, Lacerda G, et al. Causal discovery of linear acyclic models with arbitrary distributions. Uncertainty in Artificial Intelligence  UAI. 2008;282289.Google Scholar
 39.Hyttinen A, Eberhardt F, Hoyer PO. Learning linear cyclic causal models with latent variables. J Mach Learn Res. 2012;13(1):3387–439.Google Scholar
 40.Pearl J. The causal foundations of structural equation modeling. 2012. DTIC Document.Google Scholar
 41.Tian J. A criterion for parameter identification in structural equation models. arXiv:12065289. 2012.Google Scholar
 42.Peters J, Bühlmann P. Identifiability of Gaussian structural equation models with equal error variances. Biometrika. 2013. doi: 10.1093/biomet/ast043.Google Scholar
 43.Chen B, Tian J. Pearl J, editors. 2014. Testable Implications of Linear Structural Equation Models. Proceedings of the TwentyEighth AAAI Conference on Artificial Intelligence TECHNICAL REPORT.Google Scholar
 44.Tian J, editor Identifying linear causal effects. AAAI; 2004.Google Scholar
 45.Wright S. Path coefficients and path regressions: alternative or complementary concepts? Biometrics. 1960;16(2):189–202.CrossRefGoogle Scholar
 46.Wright S. The method of path coefficients. Ann Math Stat. 1934;5(3):161–215.CrossRefGoogle Scholar
 47.Garcia C, Li T. On the number of solutions to polynomial systems of equations. SIAM J Numer Anal. 1979.Google Scholar
 48.Bellu G, Saccomani M, Audoly S, Angio L. DAISY: A new software tool to test global identifiability of biological and physiological systems. Comput Methods Programs Biomed. 2007;88(1):52.CrossRefPubMedPubMedCentralGoogle Scholar
 49.Chis OT, Banga J, BalsaCanto E. Structural identifiability of systems biology models: a critical comparison of methods. PLoS One. 2011;6(11):e27755. doi: 10.1371/journal.pone.0027755.
 50.Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, et al. Architecture of the human regulatory network derived from ENCODE data. Nature. 2012;489(7414):91–100.CrossRefPubMedPubMedCentralGoogle Scholar
 51.Kanehisa M, Goto S, Sato Y, Kawashima M, Furumichi M, Tanabe M. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res. 2014;42(D1):D199–205.CrossRefPubMedGoogle Scholar
 52.Lakdawala SS, Jayaraman A, Halpin RA, Lamirande EW, Shih AR, Stockwell TB, et al. The soft palate is an important site of adaptation for transmissible influenza viruses. Nature. 2015;526(7571):122–5. doi: 10.1038/nature15379.CrossRefPubMedPubMedCentralGoogle Scholar
 53.Matsuoka Y, Matsumae H, Katoh M, Eisfeld AJ, Neumann G, Hase T, et al. A comprehensive map of the influenza A virus replication cycle. BMC Syst Biol. 2013;7(1):97.CrossRefPubMedPubMedCentralGoogle Scholar
 54.Watanabe T, Kiso M, Fukuyama S, Nakajima N, Imai M, Yamada S, et al. Characterization of H7N9 influenza A viruses isolated from humans. Nature. 2013;501(7468):551–5. doi: 10.1038/nature12392.CrossRefPubMedPubMedCentralGoogle Scholar
 55.Konig R, Stertz S, Zhou Y, Inoue A, Hoffmann HH, Bhattacharyya S, et al. Human host factors required for influenza virus replication. Nature. 2010;463(7282):813–7. CrossRefPubMedPubMedCentralGoogle Scholar
 56.York A, Hutchinson E, Fodor E. Interactome analysis of the influenza A virus transcription/replication machinery identifies protein phosphatase 6 as a cellular factor required for efficient virus replication. J Virol. 2014;88(22):13284–99.CrossRefPubMedPubMedCentralGoogle Scholar
 57.Honda A, Mizumoto K, Ishihama A. Minimum molecular architectures for transcription and replication of the influenza virus. Proc Natl Acad Sci U S A. 2002;99(20):13166–71.CrossRefPubMedPubMedCentralGoogle Scholar
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.