Hierarchical coordination of periodic genes in the cell cycle of Saccharomyces cerevisiae
Gene networks represent molecular interactions among genes or their products and, hence, form causal networks. Despite intense study in recent years, most investigations so far focus either on inferential methods to reconstruct gene networks from experimental data or on their structural properties, e.g., degree distributions. Structural analysis aimed at gaining functional insights into organizational principles of, e.g., pathways remains underappreciated.
In the present paper we analyze cell cycle regulated genes in S. cerevisiae. Our analysis is based on the transcriptional regulatory network, which represents causal interactions and not just associations or correlations between genes, and a list of known periodic genes. No further data are used. Partitioning the transcriptional regulatory network according to a graph theoretical property leads to a hierarchy in the network and, hence, in the information flow, which allows us to identify two groups of periodic genes. This reveals a novel conceptual interpretation of the working mechanism of the cell cycle and the genes regulated by this pathway.
Aside from the results obtained for the cell cycle of yeast, our approach could be exemplary for the analysis of general pathways by exploiting the rich causal structure of inferred and/or curated gene networks, including protein or signaling networks.
Keywords: Short Path, Transcriptional Regulatory Network, Periodic Gene, Causal Interaction, Causal Network
Technological progress during the last decade has led to new high-throughput devices in molecular biology that allow the molecular orchestra of genes and their products to be measured on a genomic scale. Mass data from such experiments, e.g., DNA microarray, yeast two-hybrid or ChIP-chip assays, pose considerable challenges for statistical data analysis. Because a functional understanding of a molecular biological system can only be achieved by studying interactions among gene products, network-based analysis methods have gained considerable popularity [1, 2, 3, 4]; they inherently represent a systems approach [5, 6, 7, 8]. The difficulty in analyzing gene networks, e.g., metabolic, signaling or transcriptional regulatory networks [8, 9, 10, 11], stems at least partly from the fact that many approaches have been developed outside a biological context [12, 13], investigating, e.g., the small-world [14, 15] or scale-free [10, 16] property of networks. However, so far it is largely unknown how to connect such properties meaningfully to the biological function of a molecular biological system.
In this paper we use the transcriptional regulatory network of yeast to analyze cell cycle regulated genes. More precisely, the major purpose of this article is to shed light on the principal mechanism organizing the cell cycle of Saccharomyces cerevisiae by using a novel approach based on the notion of causal membership. Our overall approach to analyzing cell cycle regulated genes, which are also called periodic genes, is based on the transcriptional regulatory network of yeast and a list of genes known to be periodically expressed during the cell cycle. No other data are used. Explicitly, this means that we do not use time series data from, e.g., DNA microarray experiments that would allow testing for a 'periodic behavior' of genes. Hence, our approach is fundamentally different from all other approaches we are aware of that study cell cycle regulated genes of yeast [19, 20, 21, 22, 23, 24]. The seeming contradiction of studying periodically expressed genes without time series data is resolved quickly by clarifying some terms. First, we want to emphasize that we are interested in genes that are cell cycle regulated, that is, genes that belong to or participate in a certain biological process, namely the cell cycle. From a biological point of view this means we are searching for genes that have a biological function that is important for the coordinated initialization and progression of the cell cycle. Hence, statistically we are searching for genes that are causally connected to the cell cycle. This is the most precise definition we can give in statistical terms. Logically, there is no need to quantify or qualify further entities, e.g., the periodicity of genes regarding the shape of their signal, to enhance our definition. The causal membership of a gene in the biological process cell cycle is all we need.
Approaches developed so far focus entirely on the periodicity of genes in time series as the suggested measure in this respect [19, 20, 21, 22, 23, 24, 25]. However, as explained above, it is not imperative to use measures based on the periodicity of genes. For this reason we pursue in this paper a novel conceptual path based on the causal membership of genes.
The paper is organized as follows. In the next section we introduce our method and describe the data we use for our analysis. Then we present numerical results and finish with a discussion and conclusions.
High-throughput technologies nowadays make it possible to tackle the problem of causal inference of gene networks from experimental data [3, 26, 27, 28] on a genomic scale. Despite the tremendous difficulty of this problem, enormous progress has been made in recent years since the seminal work of PEARL et al. [29, 30, 31]. In this paper we use a (directed, unweighted) transcriptional regulatory network (TRN) of yeast that has been assembled from different types of high-throughput data [32, 33] to ensure that the interactions present in the network correspond to real, biologically observable interactions (low number of false positive edges) and, hence, represent a causal interaction structure. An edge in the TRN connecting, e.g., gene A with gene B implies that there exists a biochemical interaction that has been observed experimentally. For example, gene A might be a transcription factor that is involved in the control of the transcription of gene B. In this paper we study the structure of this causal network to gain a functional understanding of the cell cycle of yeast. For clarity, we now define the causal membership of a gene.
For example, it is known that MNN1 (YER001W) is a cell cycle regulated gene. In our terminology this means I_cm(cell cycle | MNN1) = 1. Hence, MNN1 is a member of the category cell cycle. In principle, it is possible that one gene is a member of more than one category; however, this is not of importance for our investigation because we focus on just one biological process, namely the cell cycle. By introducing definition 1 we want to emphasize that when talking about the biological function of a gene we are interested in the causal involvement of a gene in a certain biological process rather than in its biochemical properties. Viewed this way, it is entirely natural that genes participating, e.g., in the cell cycle can be studied with the help of a causal network representing interactions among these genes. In other words, introducing this level of abstraction helps to see the problem in a different light that would otherwise have been overlooked.
For investigating the organizational structure of genes that are causal members of the biological process cell cycle we use a transcriptional regulatory network G and a list of genes known to be periodically expressed. In the following we make the assumption that this transcriptional regulatory network represents all possible causal interactions among genes. No other interactions can occur.
Assumption 2 The transcriptional regulatory network G represents all possible causal interactions among genes.
We are aware that this assumption is not entirely true because there is also communication among genes involving, e.g., phosphorylation or signaling in general. However, as we will see in the results section, although not all possible causal interactions are considered, our assumption is sufficient to reveal remarkable results. More information regarding the limitations and possible extensions of our assumption is given in the discussion section of this article.
Assumption 3 The information among genes can only be transmitted by causal interactions.
Assumption 3 makes the purpose of causal interactions clear: their purpose is to transmit information among genes. The information transmission between non-adjacent genes is less trivial and far from being fully understood. For this reason we make the following simplifying assumption.
Assumption 4 The information between non adjacent genes is transmitted via shortest paths.
Finally, we use a property of gene networks to introduce a heterogeneity among genes regarding the transmission of information. It is known that genes and, hence, gene networks are hierarchically organized [37, 38, 39]. In the following we report a property of the TRN that allows us to introduce a two-level hierarchy. The transcriptional regulatory network can be partitioned by the presence or absence of cycles connecting genes. In mathematical terms, a part of the network that is cyclic is also called a strongly connected component (SCC). For example, for an SCC containing at least three genes A_i, A_j, A_k there exists a cycle A_i → ... → A_j → ... → A_k → ... → A_i. The dots indicate that other genes may be involved. However, the important point is that there exists a cycle on which all three genes appear. This observation is important because the presence of a cycle in a network is a necessary condition for truly periodic behavior to be observed: these genes have the ability to interact with (activate/inhibit) each other consecutively and, hence, can form a limit cycle. This leads us to separate the genes into two classes. The first class consists of genes that belong to the SCC. The genes in the second class do not belong to the SCC. Further, the two classes are not equivalent; information should flow in one direction, namely from SCC → G/SCC. The reason is that only genes in the SCC can establish a periodic behavior, as explained above, while genes in G/SCC cannot. Based on this classification and hierarchy we state the following assumption.
Assumption 5 The main information flow for the cell cycle in the transcriptional regulatory network connecting periodic genes is organized hierarchically from the SCC to G/SCC.
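The two-level partition into SCC and G/SCC can be illustrated with a minimal sketch. It assumes the network is given as a list of directed (regulator, target) pairs and uses Kosaraju's algorithm, which is our choice for illustration and not necessarily the procedure used for the analysis in this paper. The function name and the toy edge list are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Return the strongly connected components of a directed graph (Kosaraju)."""
    graph, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for src, dst in edges:
        graph[src].append(dst)
        reverse[dst].append(src)
        nodes.update((src, dst))

    # Pass 1: iterative depth-first search, recording nodes in order of completion.
    visited, order = set(), []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: search the reversed graph in decreasing order of completion.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


# Hypothetical toy network: A, B and C form a cycle; D and E lie downstream of it.
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
components = strongly_connected_components(toy_edges)
giant_scc = max(components, key=len)                             # first class: SCC
downstream = {n for edge in toy_edges for n in edge} - giant_scc  # second class: G/SCC
```

In the toy network only A, B and C lie on a common cycle, so they form the first class, while D and E fall into the second class.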
From all assumptions we made so far we are now in a position to formulate the hypothesis we will investigate in this paper.
Hypothesis 6 Periodic genes in the SCC of the transcriptional regulatory network of yeast coordinate cell cycle regulated genes via shortest path communication.
The reason why we formulated this as a hypothesis rather than a theorem is that hypothesis 6 is based on several assumptions (2 – 4) that are difficult to prove theoretically. However, with the help of experimental data (the transcriptional regulatory network and a list of genes known to be periodic) we can test, and potentially falsify, hypothesis 6 numerically. In the results section we will determine all shortest paths from periodic genes in the SCC to periodic genes in G/SCC and investigate the structure of the subnetwork obtained this way. This in turn will provide us with information and insights about our hypothesis.
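The shortest-path step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the network is an unweighted list of directed edges, breadth-first search is used to find shortest paths, and when several shortest paths of equal length exist all of them are kept (how ties are handled is an assumption of this sketch). The function name and the toy data are hypothetical.

```python
from collections import defaultdict, deque


def shortest_path_subnetwork(edges, sources, targets):
    """Collect the edges lying on any shortest directed path from a source to a target."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    kept_edges, reached = set(), set()
    for source in sources:
        # Breadth-first search: distance and all shortest-path predecessors per node.
        dist, preds = {source: 0}, defaultdict(set)
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    preds[nxt].add(node)
        # Walk back from every reachable target along the predecessor sets.
        for target in targets:
            if target not in dist or target == source:
                continue
            reached.add(target)
            stack, seen = [target], {target}
            while stack:
                node = stack.pop()
                for prev in preds[node]:
                    kept_edges.add((prev, node))
                    if prev not in seen:
                        seen.add(prev)
                        stack.append(prev)
    return kept_edges, reached


# Hypothetical toy data: A is a periodic gene in the SCC {A, B, C};
# D, F and H are periodic genes outside the SCC, and H cannot be reached from A.
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"),
             ("B", "D"), ("D", "F"), ("C", "G"), ("H", "D")]
kept_edges, reached = shortest_path_subnetwork(toy_edges, {"A"}, {"D", "F", "H"})
```

Here `reached` plays the role of the periodic genes in G/SCC that are connected to the SCC, and `kept_edges` is the subnetwork whose structure would then be examined.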
For our analysis we use the transcriptional regulatory network (TRN) of Saccharomyces cerevisiae [32, 33]. This network was assembled from genetic, biochemical and ChIP (chromatin immunoprecipitation)-chip experiments, which provide above all information about the involvement of transcription factors in the transcription of genes. The network is directed but unweighted, and each edge represents a biochemical interaction observed experimentally. From this network we extract a weakly connected component (WCC) consisting of 3357 genes and 7230 interactions. A weakly connected component of a network is defined as a directed subnetwork in which every pair of nodes is connected by at least one path when edge directions are ignored. In contrast, a strongly connected component (SCC) is defined as a subnetwork that connects each pair of genes in both directions. That means there exists a path connecting, e.g., gene A with gene B, but there also exists a path connecting gene B with gene A. The TRN from [32, 33] contains two strongly connected components. One consists of 36 genes and the other of just 2 genes. When we speak in the following of the SCC of the TRN we always mean the larger subnetwork, also called the giant strongly connected component. The strongly connected component is part of a weakly connected component, SCC ⊆ WCC. We use a list of ZHAO et al. to label genes as 'periodic'. All genes not labeled 'periodic' are assumed to be 'non-periodic'. ZHAO et al. categorized 260 genes as periodic; however, only 179 periodic genes are in the WCC we use for our analysis. The reason why we restrict our analysis to the WCC is twofold. First, the TRN of yeast is not known entirely. Second, the knowledge of the TRN is not homogeneous; certain regions are better studied than others with respect to the molecular interactions among genes. The WCC can be seen as a filtered network providing the highest quality subnetwork of the TRN currently available. Using other parts of the network in addition would increase the noise level considerably and, hence, be counterproductive for our analysis.
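Restricting the network and the periodic gene list to a weakly connected component can be sketched as below. The sketch assumes the standard graph-theoretic meaning of weak connectivity (edge directions are ignored) and uses a hypothetical toy edge list and gene list; it does not reproduce the actual data.

```python
from collections import defaultdict, deque


def weakly_connected_components(edges):
    """Return the weakly connected components of a directed graph."""
    # Ignore edge directions by storing every edge in both directions.
    undirected = defaultdict(set)
    for src, dst in edges:
        undirected[src].add(dst)
        undirected[dst].add(src)

    seen, components = set(), []
    for start in sorted(undirected):
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in undirected[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# Hypothetical toy data: one large component and one isolated pair of genes.
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("X", "Y")]
toy_periodic = {"A", "D", "Y", "Z"}   # Z does not occur in the network at all

wcc = max(weakly_connected_components(toy_edges), key=len)
periodic_in_wcc = toy_periodic & wcc
```

The last line mirrors the filtering step in the text, where only those genes of the periodic list that fall inside the WCC are retained for the analysis.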
We begin our analysis by showing the results we obtain by applying Hypothesis 6 to the transcriptional regulatory network of yeast.
Organization of the cell cycle
List of periodic genes in the SCC (green nodes in Fig. 1).
Statistical evaluation of the network structure
Next, we statistically assess the observations made in Fig. 2 and 3. First, we evaluate the number of periodic genes directly connected to the SCC by calculating the probability of finding more than 127 periodic genes connected to the SCC, which gives us a p-value for the observed structure. We do this with a hypergeometric distribution, assuming that k = 2113 genes (the number of genes directly connected to the SCC) are drawn at random without replacement from the total set of available genes, comprising m = 179 periodic and n = 3178 non-periodic genes (3357 genes in the WCC minus 179 periodic genes), and asking for the probability that more than x = 127 of the drawn genes are periodic. This gives P_hyp(x > 127; m, n, k) = 0.0083, indicating that such a clustering of periodic genes is unlikely to occur by chance.
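The tail probability above can be recomputed with a few lines of exact integer arithmetic. This is a sketch of the standard hypergeometric upper tail using the values of x, m, n and k quoted in the text; the function name is hypothetical, and the result should be compared with the reported value rather than taken as a replacement for it.

```python
from math import comb


def hypergeom_upper_tail(x, m, n, k):
    """P(X > x) when k genes are drawn without replacement
    from m periodic and n non-periodic genes."""
    total = comb(m + n, k)
    upper = min(m, k)
    # Sum the number of draws containing exactly i periodic genes, for i > x.
    favourable = sum(comb(m, i) * comb(n, k - i) for i in range(x + 1, upper + 1))
    return favourable / total


# Values quoted in the text: 179 periodic and 3178 non-periodic genes in the WCC,
# 2113 genes directly connected to the SCC, more than 127 of them periodic.
p_value = hypergeom_upper_tail(x=127, m=179, n=3178, k=2113)
print(f"P(X > 127) = {p_value:.4f}")
```

Exact binomial coefficients are used instead of a floating-point approximation because Python integers do not overflow, so the only rounding occurs in the final division.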
From these statistical evaluations we conclude the following: First, the high number of periodic genes connected to the SCC cannot be explained by chance. Second, selecting periodic genes from the SCC as starting point for the determination of shortest paths connecting to other periodic genes is absolutely necessary to reach such a high number of periodic genes (141) in the WCC.
Evaluating non-periodic genes
Genes declared to be periodic by Cyclebase (per); entries listed for Johansson et al. and de Lichtenberg et al.
The interpretation of these results is that applying hypothesis 6 to the transcriptional regulatory network (TRN) of Saccharomyces cerevisiae [32, 33] and a list of periodic genes shows that our hypothesis is consistent with the data and capable of explaining nearly 80% of all periodic genes in the WCC (141 of 179). We found only seven genes that are inconsistent with hypothesis 6, assuming that the reference list of periodic genes is absolutely true (no false positives, no false negatives).
In this paper we raised a hypothesis regarding the organizational structure of the cell cycle of S. cerevisiae. To formulate our hypothesis we partitioned the set of periodic genes into two groups according to a graph theoretical property, leading to a hierarchy in the transcriptional regulatory network from the SCC to G/SCC. We hypothesized that periodic genes in the SCC coordinate cell cycle regulated genes via shortest paths. We presented numerical results testing our hypothesis by using the transcriptional regulatory network from [32, 33] and a list of genes known to be periodically expressed.
Our numerical results demonstrate that by applying our hypothesis to the data (transcriptional regulatory network and the reference list of genes known to be periodically expressed) we find a subnetwork of the overall transcriptional regulatory network connecting almost 80% of all periodic genes in the WCC. A statistical evaluation of the observed network structure revealed that, first, the high number of periodic genes connected to the SCC cannot be explained by chance. Second, selecting periodic genes from the SCC as the starting point for the determination of shortest paths connecting to other periodic genes is absolutely necessary to reach such a high number of connected periodic genes (141). This indicates that our hypothesis, which may appear ad hoc at first sight, reflects structural information manifested by the transcriptional regulatory network. Further, we interpret our results conceptually in the sense that the SCC forms a pacemaker of the cell cycle, because only genes in the SCC can form cycles (closed paths) and, hence, only these genes can be truly periodic mathematically. To our knowledge the SCC of the transcriptional regulatory network has so far not been interpreted as the pacemaker of the cell cycle of yeast.
We based our hypothesis on the transcriptional regulatory network of yeast, assuming that this network represents all important causal interactions among genes that might play a crucial role in the information transmission of the system. It is clear that this is a simplification that ignores signaling among genes, e.g., via phosphorylation, to name just one additional effect. For this reason there is another way to interpret our results: how much information regarding the organizational structure of the cell cycle is contributed by the transcriptional regulatory network alone? As demonstrated by our numerical results, the transcriptional regulatory network seems to make a remarkably high contribution, because otherwise our hypothesis would not span nearly 80% of the periodic genes in the WCC. This is an interesting result in itself. Furthermore, it would be interesting to see whether using additional networks, e.g., the signaling network, helps to improve our results considerably. Also, it is clear that the transcriptional regulatory network we used for our analysis is neither complete (false negative edges) nor absolutely correct (false positive edges). For this reason we filtered the overall network, using only the WCC to extract a high quality subnetwork. It will be interesting to repeat our analysis in a couple of years using a revised version of the transcriptional regulatory network to see whether this network leads to an improvement of our results. Because the network used has been assembled from different sources of high-throughput data [32, 33], the probability of false positive edges is expected to be quite low. This means that further experimental results are unlikely to reduce the quality of our results. In contrast, there are certainly quite a few interactions among genes (edges in the network) that are currently absent from the network used (false negative edges). These edges can only lead to an improvement of our results because additional edges can only lead to new paths but cannot destroy existing ones.
Based on our observations one might speculate that our hypothesis may hold not only for the cell cycle of S. cerevisiae but also for the cell cycle of other organisms. If this were true, the organizational structure would be evolutionarily conserved among organisms. This would provide another important feature for evolutionary biologists next to, e.g., the conservation of protein sequences and structures as well as network motifs, allowing homology to be assessed on a systems level comprising a functional biological pathway. Further, our concept of causal membership in a biological process may also be extendable to biological processes other than the cell cycle as well as to other organisms. Our approach does not utilize information that holds specifically only for the problem studied in this paper. For example, biological processes like apoptosis, cellular differentiation or cell signaling could be studied. It would also be interesting to use our approach in the context of complex diseases like cancer to study pathological modifications of such biological pathways. This seems to be feasible provided the causal network used for the analysis contains sufficient information covering essential aspects of the underlying molecular interactions. In this respect it might be beneficial to combine the transcriptional regulatory network with the signaling or protein network, not only because this may lead to increased performance but also to learn about differences in the information encoded in these networks. The latter point could enhance our understanding of the integration of different types of gene networks, which has not received much attention so far.
Generally, we want to remark that the cyclicity of a network, the property used in this paper to define the SCC, has previously been used to meaningfully separate molecular networks, but in the context of identifying structural domains of proteins.
We are of the opinion that approaches similar to ours [33, 45], which exploit the causal structure of a gene network, will rapidly gain more attention: with the availability of estimation methods to infer causal network structures from high-throughput data [26, 28], interest will gradually shift towards their analysis. The reason is that gene networks are certainly of interest in themselves; however, it is more interesting to use them to disclose functional biological information. Whether the overall process of the yeast cell cycle follows our hypothesis will be the subject of further studies. However, the conceptual structure revealed by our simple organization of the data, in the form of the transcriptional regulatory network, could be exemplary for general studies aiming not only to identify 'important' genes but also to shed light on working principles.
We would like to thank Galina Glazko and Earl Glynn for fruitful discussions and comments on the manuscript and four anonymous reviewers whose comments helped to improve the article. For our numerical simulations we used R and the igraph package.
This work was partially supported by the Queen's University Belfast and the Centre for Cancer Research and Cell Biology and the COMET Center ONCOTYROL and funded by the Federal Ministry for Transport Innovation and Technology (BMVIT) and the Federal Ministry of Economics and Labour/the Federal Ministry of Economy, Family and Youth (BMWA/BMWFJ), the Tiroler Zukunftsstiftung (TZS) and the State of Styria represented by the Styrian Business Promotion Agency (SFG) [and supported by the University for Health Sciences, Medical Informatics and Technology and BIOCRATES Life Sciences AG].
- 3. Emmert-Streib F, Dehmer M: Analysis of Microarray Data: A Network Based Approach. 2008, Wiley-VCH, Weinheim
- 5. Alon U: An Introduction to Systems Biology: Design Principles of Biological Circuits. 2006, Chapman & Hall/CRC, Boca Raton, FL
- 6. von Bertalanffy L: General System Theory: Foundation, Development, Application. 1968, George Braziller, New York
- 7. Kitano H: Foundations of Systems Biology. 2001, MIT Press, Cambridge MA
- 9. Bornholdt S, Schuster H: Handbook of Graphs and Networks: From the Genome to the Internet. 2003, Wiley-VCH, Weinheim
- 15. Watts D: Small Worlds: The Dynamics of Networks between Order and Randomness. 1999, Princeton University Press, Princeton NJ
- 17. Emmert-Streib F, Dehmer M: Organizational Structure of the Transcriptional Regulatory Network of Yeast: Periodic Genes. Complex Sciences, Lecture Notes of the Institute for Computer Sciences. Edited by: Zhou J. 2009, 4: 140-148. Springer, Berlin Heidelberg
- 29. Geiger D, Verma T, Pearl J: D-separation: From Theorems to Algorithms. CSD-890040, UCLA, Computer Science Dept. 1989
- 30. Pearl J: Probabilistic Reasoning in Intelligent Systems. 1988, Morgan-Kaufmann
- 31. Verma T, Pearl J: Causal networks: semantics and expressiveness. Proceedings of the 4th Workshop on Uncertainty in Artificial Intelligence, Mountain View CA. 1988, 352-359.
- 41. Schuster HG: Deterministic Chaos. 1988, Wiley-VCH, Weinheim
- 47. Csardi G, Nepusz T: igraph-package. 2008, http://cneurocvs.rmki.kfki.hu/igraph/doc/R/aaa-igraph-package.html
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.