The role of master regulators in gene regulatory networks

Gene regulatory networks present a wide variety of dynamical responses to intrinsic and extrinsic perturbations. Arguably, one of the most important of such coordinated responses is the one of amplification cascades, in which activation of a few key-responsive transcription factors (termed master regulators, MRs) lead to a large series of transcriptional activation events. This is so since master regulators are transcription factors controlling the expression of other transcription factor molecules and so on. MRs hold a central position related to transcriptional dynamics and control of gene regulatory networks and are often involved in complex feedback and feedforward loops inducing non-trivial dynamics. Recent studies have pointed out to the myocyte enhancing factor 2C (MEF2C, also known as MADS box transcription enhancer factor 2, polypeptide C) as being one of such master regulators involved in the pathogenesis of primary breast cancer. In this work, we perform an integrative genomic analysis of the transcriptional regulation activity of MEF2C and its target genes to evaluate to what extent are these molecules inducing collective responses leading to gene expression deregulation and carcinogenesis. We also analyzed a number of induced dynamic responses, in particular those associated with transcriptional bursts, and nonlinear cascading to evaluate the influence they may have in malignant phenotypes and cancer.


I. Introduction: Transcriptional master regulators
Phenotypic conditions in living cells are largely determined by the interplay of a multitude of molecules; in particular, genes and their protein products. The coordinated behavior of such a large number of players is often represented by means of a gene regulatory network (GRN). In a GRN, regulatory processes between genes, transcription factors and other molecular components are represented by nodes and links. One common * ehernandez@inmegen.gob.mx way of inferring this gene regulatory networks is by probabilistic analysis of whole genome gene expression data [1,2].
Specific, context-dependent analysis of regulatory activity of particular cellular phenotypes (say tumor cells) may also be performed with the aid of transcriptional interaction networks. Commonly, such GRNs present a complex topology, often compliant with a scale-free hierarchic nature, in which a relatively small number of key players dominate the function and dynamics of the network. Some of these key players in GRNs are transcription factors often known as master regulators (MRs). MRs are deemed responsible for the control of the whole regulatory program for cells under the associated phenotype [3,4]. Master regulators may, indeed, act over rather generalistic cellular processes 070011-1 arXiv:1511.09367v1 [q-bio.MN] 17 Oct 2015 Papers in Physics, vol. 7, art. 070011 (2015) / E. Hernández-Lemus et al. [5], but also on specific cellular phenotypes [4,6,7].
For instance, it is known that the mTOR molecule is active in concerting signals regulating control growth, metabolism, and longevity. Malfunction of mTOR complexes has been associated with developmental abnormalities, autoimmune diseases and cancer [5]. The main role of mTOR seems to be the regulation of protein synthesis. Detailed mechanisms remain unknown, but ribosome profiling seems to point out to translational regulation and transcriptional activation activity. Due to the multiplicity of mTOR signaling interactions, this molecule acts as a master regulator on a variety of phenotypes. More specific master regulatory activity may be exemplified by cases such as the one of VASH1 that has been identified as a master-regulator of endothelial cell apoptosis [4]; PAX5 is known to be a master regulator of B-cell development also involved in neoplastic processes in leukemogenesis [6] and the yeast protein Gcn4Pp (that contains a conserved domain cd12193 present in human JUN proto-oncogene) that is triggered by starvation and stress signals [7] and is an MR in the phenotypic response to such stimuli. Due to the complex mechanisms behind transcriptional regulation in eukaryotes, identification of MRs is mostly based on the (inferred or empirical) relationships between them and their downstream RNA targets in the GRN.
In brief, MRs are transcription factor genes that are located upstream in the genomic regulation programme, hence they possess a high hierarchy in the GRN. They are considered to be important players behind the presence of (some) amplification cascades in transcriptional regulatory networks, and it has been hypothesized that they may coordinate the dynamic transcriptional response and phenotype (in the case of eukaryotes) of the cells.
As it may be evident, MRs may have a big impact on cancer-related phenotypes. This is so since under genome instability conditions, the uncontrolled synthesis of these molecules may give rise to large amplification of transcriptional cascades. In Ref. [11], the role that some molecules (in particular, MEF2C) may have in processes involving metabolic deregulation and MR activity at the onset of primary breast cancer was studied. The approach followed there involved the inference of GRNs centered in a number of molecules considered to be candidate MRs associated with the breast cancer phenotypes at early stages (i.e., primary tumors). As it can be seen there, MEF2C resulted a quite promissory molecule due to the large number of (probabilistically inferred) targets it possesses, but also due to the main biological processes spanned by its targets.

i. A what if ? scenario
Now, let us resort for a moment to a hypothetical scenario: Imagine you have a eukaryotic cell with deregulated metabolism (e.g., large local free energy fluctuations) and a gene with transcription factor activity that has a low activation energy threshold (i.e., a relatively low absolute value for the free energy of formation). Now imagine that this gene is located (within the regulatory network) close to energy transduction pathways and that it also possesses a high hierarchy on the transcriptional regulatory network as well as a relatively large relaxation time.
To put it more clearly, the fact that a gene has low activation energies means that the amount of energy needed to activate its cellular biosynthesis is minimum [8,9], thus making this molecule more prone to be produced by large free energy fluctuations. The probability to have such large energy fluctuations may be increased under abnormal metabolism conditions [10,11,12] that may enable events leading to transcriptional cascades.
In such scenario, large local free energy fluctuations may randomly activate the transcription of such a gene that in turn may be able to activate long ranged transcriptional cascading before decaying, thus affecting to a large degree the whole transcriptional regulation programme of such cell. Indeed, such a gene may be acting as a transcriptional master regulator over that specific cell condition (phenotype).

II. MEF2C as a master regulator
In Ref. [11], we discussed the evidence that may point out to the MEF2C molecule as a candidate master regulator for the transcriptional regulator of human cells under the primary breast cancer phenotype. Regarding this molecule, we know the following: MEF2C is a transcription factor gene located (in humans) in 5q14.3 on the minus strand. This gene is 200,723 bp long and it encodes a 473aminoacid protein weighting 51.221 kDa. MEF2C is a member of the Mef2 family that by means of controlling gene expression (MEF2 molecules are commonly acting as activator transcription factors) is able to regulate cellular differentiation and development [13]. MEF2 members are highly versatile regulators since they contain both MEF2 and MADS-box DNA binding domains (see Fig.  1). The MADS-box serves as the minimal DNAbinding domain, however an adjacent 29-amino acid extension called the Mef2 domain is required for high affinity DNA-binding and dimerization, hence conferring a combinatorial DNA binding repertoire through a number of transcription factor recognition marks [14]. It is also known that the MEF2C protein interacts with MAPK7 (involved in proliferation and differentiation signaling) [15], EP300 (a transcription factor that regulates cell growth and cellular division) [16], TEAD1 (an enhancer TF that co-regulates transcription with MEF2C), as well as with a number of histone deacetylases, most notably HDAC4, HDAC7, and HDAC9 [17,18]. These protein-protein interactions, mostly with other transcription factors, enhancers or epigenomic regulators joined with their inherent binding-site transcriptional activity made MEF2C a quite functional and adaptable MR.
Aside from this, a non-equilibrium thermodynamics analysis of the coupling between transcriptional regulation processes and metabolic de-regulation in breast cancer cells has led to some further evidence pointing out to MEF2C as an MR that may be playing an important role in carcinogenic processes. This thermodynamic evidence has been supplemented with information given by probabilistically-inferred gene regulatory networks centered around genes coordinating the coupling of transcriptional control and metabolism. The GRN was inferred by mutual information calculations [19,20] on a database of 1191 whole genome gene expression experiments in biopsy captured tissue from primary breast cancer patients and healthy controls [11].
To further express that this behavior is related to the coupling between transcriptional regulation control and energy transduction pathways, let us resort to Fig. 2. Figure 2 shows a gene ontology network containing the biological processes statistically enriched in a list of MEF2C regulated genes, differentially expressed in a 1191-sample database of whole genome gene expression experiments curated in our group [11]. In this figure, we may see that statistically enriched biological processes are shown as color-coded (white to red) according to a p-value calculated from a hypergeometric urn model test and corrected via the false discovery rate (FDR) measure as it is explained elsewhere [11]. We can see that the two major families of biological processes enriched are precisely those related with energy-release metabolic pathways and transcriptional regulation. Figure 2: Hierarchical network displaying statistical enriched Gene Ontology Biological Process entries related with MEF2C cascading, we may see that this evidence supports the hypothesis made in Ref. [11] regarding a non-equilibrium coupling between metabolism and transcriptional regulation.

a. Transcription factor binding site analysis
In order to further validate the findings given by non-equilibrium thermodynamics and probabilistic regulatory networks, a computational analysis of databases for DNA transcription factor binding sites (TFBS) was performed. This study included a systematic TFBS analysis for MEF2C transcriptional influence by applying an algorithm (MotEvo) [21] that incorporated -via Bayesian optimization-information additional to the sequence (physicochemical and electrostatic features, motif conservation and phylogeny, ChIP experiments, DeepCAGE sequencing, etc.). Such analysis was performed with a stringent statistical significance level (Response values > 1.5 corresponding approx. to p* < 0.001) and showed that genome-wide MEF2C is able to regulate 200 genes directly.
In fact, second order transcriptional interactions increase the range of influence of MEF2C to a GRN composed of 1896 genes and 2156 regulatory interactions (see Fig. 3) that was able to further increase to 5852 genes and 18801 interactions up to third order.

III. Dynamics of master regulator activity
Cells sometimes present bursts or pulses of activity in their gene expression dynamic patterns. Bursting may result from a series of stochastic biochemical events and may be a source of large phenotypic heterogeneity of cell behavior and thus on cellular conditions and disease. Noise in transcriptional regulatory activity arises not only as a consequence of randomness of biochemical processes at the molecular level due to low molecule counts, it also may arise from thermodynamic fluctuations in cellular components and system level phenomena due to cooperativity. Different levels of gene regulation may be strongly coupled. When all these elements are present, we say that the cells are undergoing transcriptional bursts (TBs). Under such dynamic scenario, protein production occurs in pulses, each due to a single promoter or transcription factor binding event. It is in these instances that the phenomena can be related to the presence of master regulators in the transcriptional networks.

b. Bursting and synchronization
The non-linear behavior of GRN interactions can be better understood in the light of periodic or quasi-periodic expression levels for certain groups of genes. By means of Power Spectral Density (PSD) calculations we may gain greater insight in such dynamic behavior. Power spectral density is useful to describe the evolution of the variance that in turn provides us with greater insight on the correlation structure of the underlying regulatory processes. The power spectral density of stochastic quasi-stationary processes can be estimated when considering non-linear time series analysis as follows. Let us consider Γ as a series containing a time course for intensity levels of gene expresion of a single gene, then the associated power spectral density, I(ω) is given by: ; ω ∈ [0, π] (1) Periodic behavior could be detected in a linear model for Γ: β being a positive constant (amplitude), ω ∈ [0, π], φ is a uniformly distributed phase shift (φ ∈ (−π, π]) and { i } is a sequence of uncorrelated random variables with mean 0 and variance σ 2 independent of φ (i.e. a Gaussian noise). Under this model, periodic behavior could be traced-off by means of looking at significant peaks in the power spectral density, either within an ω-continuous process or more commonly with ω taking discrete values 2πk N ; k = 0, 1, 2 . . . , [ N 2 ], each of these discrete values is known as a Fourier frequency.
If a time course Γ has hidden periodic components, say with a given frequency ω , then the power spectral density will show a peak at ω . If, on the other hand, Γ is a random, aperiodic signal, then the I(ω) Vs ω plot will be a (noisy) straight line, mapping to β = 0 in the linear model as given by Eq. 2. Then we may test the null hypothesis β = 0 versus the data in order to determine significance . An early result from Fisher applies also to finite time series, the so called g-statistic [22]: Values of g larger than expected lead to the rejection of the null hypothesis (i.e., random processes). The exact g-distribution is given by: (4) α = N/2 and r is the largest integer less than 1/x. So if the observed value of g is g , then there is a p-value P (g > g ) to evaluate the null hypothesis.
In Fig. 4, we can see the results of the application of Fisher's significance analysis (red line) to the power spectral density profiles of two genes that are regulated by MEF2C. We can see that both genes present some significant peaks in the power spectral density profiles, which means that some quasi-periodicities are present (i.e., some frequency bands are enriched). Further analyses have shown that the peaks for different MEF2C targets are indeed heavily correlated in some instances. This is related to a mechanism of functional synchronization. For a more detailed description of such analysis, please see Ref. [23].

IV. Conclusions
As a consequence of the arguments just exposed here, we have been able to reach some conclusions about the role that master regulators (in this case MEF2C) may be playing in the function and dynamics of gene regulatory networks for particular phenotypes (in this case potentially related to the onset of primary breast cancer).
In the first place, we may mention that MEF2C, as a transcriptional master regulator, associated with tumor phenotypes has a number of important physico-chemical features: such as having a low activation energy, and long decay times. MEF2C also presents TFBSs of two quite general classes (MADS and MEf2) which makes it a highly versatile transcription factor molecule. Following stringent TFBS analyses, MEF2C is potentially involved in the regulation of up to 200 genes directly, about 2000 at second order, and almost 6000 at third order (almost one fourth of the entire human genome) and also a number of its target genes are, in turn, transcription factors some of them with global activity. However, we must stress that MEF2C is not the only molecule responsible for the regulatory programme of such molecules. To what extent its coordinated action is able to induce the phenotype in vivo is still something to be determined by further experimental data.
In relation to the dynamics of biological processes induced by MEF2C cascading, we have seen that MEF2C targets present 'stochastic bursting' and such bursts in gene expression activity are synchronized and long range correlated to a high degree (i.e., Almost 1/f correlations). Dynamic synchronization and long range correlation appear to be functional biological phenomena. However, ad hoc experimental testing is still in design.
With regards to the biological implications of such findings (especially in the context of cancer biology), we have observed that MEF2C may be associated with tumor phenotypes (at least in primary breast cancer, according to our results). This is so because MEF2C has a number of direct targets (and also many of the indirect ones) which are molecules with known oncogenic activity. The patterns of gene expression of MEF2C targets (even for genes that are not differentially expressed) are able to induce the phenotype consistently. Also, statistical enrichment analyses, both for biological processes and biochemical pathways, showed significant hits associated with cancer related categories.
After this, all that one has to say is that a lot of work is still to be done to understand the complex mechanisms behind the Master Regulatory control of phenotypes (especially in diseaserelated scenarios). Such investigations need to be multidisciplinary in nature and must be anchored in solid mathematical foundations compliant with the tenets of the theory of complex systems, while at the same time must be heavily relying on solid biological knowledge. Comprehensive integrative analyses under the systems biology paradigm will surely hence be a must at the center of discussion on these matters.