Abstract
Kauffman’s NK-model is a paradigmatic example of a class of stochastic models of genotypic fitness landscapes that aim to capture generic features of epistatic interactions in multilocus systems. Genotypes are represented as sequences of L binary loci. The fitness assigned to a genotype is a sum of contributions, each of which is a random function defined on a subset of \(k \le L\) loci. These subsets or neighborhoods determine the genetic interactions of the model. Whereas earlier work on the NK model suggested that most of its properties are robust with regard to the choice of neighborhoods, recent work has revealed an important and sometimes counter-intuitive influence of the interaction structure on the properties of NK fitness landscapes. Here we review these developments and present new results concerning the number of local fitness maxima and the statistics of selectively accessible (that is, fitness-monotonic) mutational pathways. In particular, we develop a unified framework for computing the exponential growth rate of the expected number of local fitness maxima as a function of L, and identify two different universality classes of interaction structures that display different asymptotics of this quantity for large k. Moreover, we show that the probability that the fitness landscape can be traversed along an accessible path decreases exponentially in L for a large class of interaction structures that we characterize as locally bounded. Finally, we discuss the impact of the NK interaction structures on the dynamics of evolution using adaptive walk models.
Notes
Note that incorrect expressions for \(\rho (d)\) appear in some of the literature preceding [8].
References
Aita, T., Uchiyama, H., Inaoka, T., Nakajima, M., Kokubo, T., Husimi, Y.: Analysis of a local fitness landscape with a model of the rough Mt. Fuji-type landscape: application to prolyl endopeptidase and thermolysin. Biopolymers 54(1), 64–79 (2000)
Altenberg, L.: NK fitness landscapes. In: Bäck, T., Fogel, D.B., Michalewicz, Z. (eds.) Handbook of Evolutionary Computation. IOP Publishing Ltd and Oxford University Press, Oxford (1997)
Bank, C., Matuszewski, S., Hietpas, R.T., Jensen, J.D.: On the (un)predictability of a large intragenic fitness landscape. Proc. Nat. Acad. Sci. USA 113, 14085–14090 (2016)
Berestycki, J., Brunet, É., Shi, Z.: The number of accessible paths in the hypercube. Bernoulli 22, 653–680 (2016)
Berestycki, J., Brunet, É., Shi, Z.: Accessibility percolation with backsteps. ALEA, Lat. Am. J. Probab. Math. Stat. 14, 45–62 (2017)
Buzas, J., Dinitz, J.: An analysis of NK landscapes: interaction structure, statistical properties and expected number of local optima. IEEE Trans. Evolut. Comput. 18(6), 807–818 (2014)
Campos, P.R.A., Adami, C., Wilke, C.O.: Optimal adaptive performance and delocalization in NK fitness landscapes. Phys. A: Stat. Mech. Appl. 304, 495–506 (2002)
Campos, P.R.A., Adami, C., Wilke, C.O.: Optimal adaptive performance and delocalization in NK fitness landscapes (Erratum). Phys. A: Stat. Mech. Appl. 318, 637 (2003)
Carneiro, M., Hartl, D.L.: Adaptive landscapes and protein evolution. Proc. Nat. Acad. Sci. USA 107, 1747–1751 (2010)
Crona, K., Greene, D., Barlow, M.: The peaks and geometry of fitness landscapes. J. Theor. Biol. 318, 1–10 (2013)
Crona, K., Gavryushkin, A., Greene, D., Beerenwinkel, N.: Inferring genetic interactions from comparative fitness data. eLife 6, e28629 (2017)
de Oliveira, V.M., Fontanari, J.F., Stadler, P.F.: Metastable states in short-ranged \(p\)-spin glasses. J. Phys. A 32, 8793–8802 (1999)
de Visser, J.A.G.M., Krug, J.: Empirical fitness landscapes and the predictability of evolution. Nat. Rev. Genet. 15, 480–490 (2014)
de Visser, J.A.G.M., Park, S.C., Krug, J.: Exploring the effect of sex on empirical fitness landscapes. Am. Nat. 174, S15–S30 (2009)
de Visser, J.A.G.M., Cooper, T.F., Elena, S.F.: The causes of epistasis. Proc. R. Soc. Lond. Ser. B 278, 3617–3624 (2011)
de Haan, L., Ferreira, A.: Extreme Value Theory: An Introduction. Springer Series in Operations Research. Springer, Berlin (2006)
Dean, D.S.: Metastable states of spin glasses on random thin graphs. Eur. Phys. J. B 15, 493–498 (2000)
DePristo, M.A., Hartl, D.L., Weinreich, D.M.: Mutational reversions during adaptive protein evolution. Mol. Biol. Evol. 24, 1608–1610 (2007)
Durrett, R., Limic, V.: Rigorous results for the NK model. Ann. Probab. 31, 1713–1753 (2003)
Evans, S.N., Steinsaltz, D.: Estimating some features of NK fitness landscapes. Ann. Appl. Probab. 12, 1299–1321 (2002)
Ferretti, L., Schmiegelt, B., Weinreich, D., Yamauchi, A., Kobayashi, Y., Tajima, F., Achaz, G.: Measuring epistasis in fitness landscapes: the correlation of fitness effects of mutations. J. Theor. Biol. 396, 132–143 (2016)
Fiocco, D., Foffi, G., Sastry, S.: Encoding of memory in sheared amorphous solids. Phys. Rev. Lett. 112, 025702 (2014)
Flyvbjerg, H., Lautrup, B.: Evolution in a rugged fitness landscape. Phys. Rev. A 46, 6714–6723 (1992)
Franke, J., Krug, J.: Evolutionary accessibility in tunably rugged fitness landscapes. J. Stat. Phys. 148, 705–722 (2012)
Franke, J., Klözer, A., de Visser, J.A.G.M., Krug, J.: Evolutionary accessibility of mutational pathways. PLoS Comput. Biol. 7(8), e1002134 (2011)
Gavrilets, S.: Fitness Landscapes and the Origin of Species. Princeton University Press, Princeton (2004)
Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., Hothorn, T.: mvtnorm: Multivariate Normal and t Distributions. R package version 1.0-6 (2017)
Genz, A.: Numerical computation of multivariate normal probabilities. J. Comput. Gr. Stat. 1(2), 141–149 (1992)
Gillespie, J.H.: A simple stochastic gene substitution model. Theor. Popul. Biol. 23, 202–215 (1983)
Gillespie, J.H.: Molecular evolution over the mutational landscape. Evolution 38, 1116–1129 (1984)
Haldane, J.B.S.: A mathematical theory of natural selection, Part VIII: metastable populations. Proc. Camb. Philos. Soc. 27, 137–142 (1931)
Hartl, D.L.: What can we learn from fitness landscapes? Curr. Opin. Microbiol. 21, 51–57 (2014)
Hegarty, P., Martinsson, A.: On the existence of accessible paths in various models of fitness landscapes. Ann. Appl. Probab. 24, 1375–1395 (2014)
Hwang, S., Park, S.C., Krug, J.: Genotypic complexity of Fisher’s geometric model. Genetics 206, 1049–1079 (2017)
Isner, B.A., Lacks, D.J.: Generic rugged landscapes under strain and the possibility of rejuvenation in glasses. Phys. Rev. Lett. 96, 025506 (2006)
Jain, K.: Number of adaptive steps to a local fitness peak. Europhys. Lett. 96, 58006 (2011)
Jain, K., Seetharaman, S.: Multiple adaptive substitutions during evolution in novel environments. Genetics 189, 1029–1043 (2011)
Kanwal, R.P.: Linear Integral Equations: Theory & Technique. Modern Birkhäuser Classics. Birkhäuser, Basel (2012)
Kauffman, S.A.: The Origins of Order. Oxford University Press, Oxford (1993)
Kauffman, S., Levin, S.: Towards a general theory of adaptive walks on rugged landscapes. J. Theor. Biol. 128(1), 11–45 (1987)
Kauffman, S.A., Weinberger, E.D.: The NK model of rugged fitness landscapes and its application to maturation of the immune response. J. Theor. Biol. 141, 211–245 (1989)
Kimura, M.: On the probability of fixation of mutant genes in a population. Genetics 47, 713–719 (1962)
Kingman, J.F.C.: A simple model for the balance between selection and mutation. J. Appl. Probab. 15(1), 1–12 (1978)
Kondrashov, D.A., Kondrashov, F.A.: Topological features of rugged fitness landscapes in sequence space. Trends Genet. 31, 24–33 (2015)
Kouyos, R.D., Leventhal, G.E., Hinkley, T., Haddad, M., Whitcomb, J.M., Petropoulos, C.J., Bonhoeffer, S.: Exploring the complexity of the HIV-1 fitness landscape. PLoS Genet. 8, e1002551 (2012)
Levinthal, D.A.: Adaptation on rugged landscapes. Manag. Sci. 43, 934–950 (1997)
Limic, V., Pemantle, R.: More rigorous results on the Kauffman-Levin model of evolution. Ann. Probab. 32, 2149–2178 (2004)
Macken, C.A., Perelson, A.S.: Protein evolution on rugged landscapes. Proc. Nat. Acad. Sci. USA 86, 6191–6195 (1989)
Macken, C.A., Hagan, P.S., Perelson, A.S.: Evolutionary walks on rugged landscapes. SIAM J. Appl. Math. 51(3), 799–827 (1991)
Manukyan, N., Eppstein, M.J., Buzas, J.S.: Tunably rugged landscapes with known maximum and minimum. IEEE Trans. Evolut. Comput. 20, 263–274 (2016)
Martinsson, A.: Accessibility percolation and first-passage site percolation on the unoriented binary hypercube. Preprint arXiv:1501.02206 (2015)
Mustonen, V., Lässig, M.: From fitness landscapes to seascapes: non-equilibrium dynamics of selection and adaptation. Trends Genet. 25, 111–119 (2009)
Neidhart, J., Krug, J.: Adaptive walks and extreme value theory. Phys. Rev. Lett. 107, 178102 (2011)
Neidhart, J., Szendro, I.G., Krug, J.: Exact results for amplitude spectra of fitness landscapes. J. Theor. Biol. 332, 218–227 (2013)
Neidhart, J., Szendro, I.G., Krug, J.: Adaptation in tunably rugged fitness landscapes: the rough Mount Fuji Model. Genetics 198, 699–721 (2014)
Nowak, S., Krug, J.: Analysis of adaptive walks on NK fitness landscapes with different interaction schemes. J. Stat. Mech.: Theory Exp. 2015, P06014 (2015)
Nowak, S.: Properties of Random Fitness Landscapes and Their Influence on Evolutionary Dynamics. A Journey through the Hypercube. PhD dissertation, Cologne (2015)
Nowak, S., Krug, J.: Accessibility percolation on \(n\)-trees. Europhys. Lett. 101, 66004 (2013)
Nowak, S., Neidhart, J., Szendro, I.G., Krug, J.: Multidimensional epistasis and the transitory advantage of sex. PLoS Comput. Biol. 10, e1003836 (2014)
Ohta, T.: The meaning of near-neutrality at coding and non-coding regions. Gene 205, 261–267 (1997)
Orr, H.A.: The population genetics of adaptation: the adaptation of DNA sequences. Evolution 56, 1317–1330 (2002)
Orr, H.A.: A minimum on the mean number of steps taken in adaptive walks. J. Theor. Biol. 220, 241–247 (2003)
Orr, H.A.: The population genetics of adaptation on correlated fitness landscapes: the block model. Evolution 60, 1113–1124 (2006)
Østman, B., Hintze, A., Adami, C.: Impact of epistasis and pleiotropy on evolutionary adaptation. Proc. R. Soc. Lond. Ser. B 279, 247–256 (2012)
Park, S.C., Krug, J.: \(\delta \)-exceedance records and random adaptive walks. J. Phys. A 49, 315601 (2016)
Park, S.C., Simon, D., Krug, J.: The speed of evolution in large asexual populations. J. Stat. Phys. 138, 381–410 (2010)
Park, S.C., Szendro, I.G., Neidhart, J., Krug, J.: Phase transition in random adaptive walks on correlated fitness landscapes. Phys. Rev. E 91, 042707 (2015)
Park, S.C., Neidhart, J., Krug, J.: Greedy adaptive walks on a correlated fitness landscape. J. Theor. Biol. 397, 89–102 (2016)
Perelson, A.S., Macken, C.A.: Protein evolution on partially correlated landscapes. Proc. Nat. Acad. Sci. USA 92(21), 9657–9661 (1995)
Phillips, P.C.: Epistasis—the essential role of gene interactions in the structure and evolution of genetic systems. Nat. Rev. Genet. 9, 855–867 (2008)
Poelwijk, F.J., Kiviet, D.J., Weinreich, D.M., Tans, S.J.: Empirical fitness landscapes reveal accessible evolutionary paths. Nature 445, 383–386 (2007)
Poelwijk, F.J., Tănase-Nicola, S., Kiviet, D.J., Tans, S.J.: Reciprocal sign epistasis is a necessary condition for multi-peaked fitness landscapes. J. Theor. Biol. 272, 141–144 (2011)
Poelwijk, F.J., Krishna, V., Ranganathan, R.: The context-dependence of mutations: a linkage of formalisms. PLoS Comput. Biol. 12, e1004771 (2016)
Pokusaeva, V.O., Usmanova, D.R., Putintseva, E.V., Espinar, L., Sarkisyan, K.S., Mishin, A.S., Bogatyreva, N.S., Ivankov, D.N., Povolotskaya, I.S., Filion, G.J., Carey, L.B., Kondrashov, F.A.: Experimental assay of a fitness landscape on a macroevolutionary scale. Preprint bioRxiv 222778 (2017)
Provine, W.B.: Sewall Wright and Evolutionary Biology. University of Chicago Press, Chicago (1986)
Reidys, C.M., Stadler, P.F.: Combinatorial landscapes. SIAM Rev. 44, 3–54 (2002)
Richter, H., Engelbrecht, A. (eds.): Recent Advances in the Theory and Application of Fitness Landscapes. Springer, Berlin (2014)
Rowe, W., Platt, M., Wedge, D.C., Day, P.J., Kell, D.B., Knowles, J.: Analysis of a complete DNA-protein affinity landscape. J. R. Soc. Interface 7, 397–408 (2010)
Sailer, Z.R., Harms, M.J.: High-order epistasis shapes evolutionary trajectories. PLoS Comput. Biol. 13, e1005541 (2017)
Schmiegelt, B.: Sign epistasis networks. Master thesis, Cologne (2016)
Schmiegelt, B., Krug, J.: Evolutionary accessibility of modular fitness landscapes. J. Stat. Phys. 154(1), 334–355 (2014)
Seetharaman, S., Jain, K.: Length of adaptive walk on uncorrelated and correlated fitness landscapes. Phys. Rev. E 90, 032703 (2014)
Stadler, P.F.: Landscapes and their correlation functions. J. Math. Chem. 20, 1–45 (1996)
Stadler, P.F., Happel, R.: Random field models for fitness landscapes. J. Math. Biol. 38, 435–478 (1999)
Stein, D.L. (ed.): Spin Glasses and Biology. World Scientific, Singapore (1992)
Svensson, E.I., Calsbeek, R. (eds.): The Adaptive Landscape in Evolutionary Biology. Oxford University Press, Oxford (2012)
Szendro, I.G., Schenk, M.F., Franke, J., Krug, J., de Visser, J.A.G.M.: Quantitative analyses of empirical fitness landscapes. J. Stat. Mech.: Theory Exp. 2013, P01005 (2013)
Tomassini, M., Vérel, S., Ochoa, G.: Complex-network analysis of combinatorial spaces: the NK landscape case. Phys. Rev. E 78, 066114 (2008)
Touchette, H.: The large deviation approach to statistical mechanics. Phys. Rep. 478(1), 1–69 (2009)
Valente, M.: An NK-like model for complexity. J. Evolut. Econ. 24, 107–134 (2014)
Weinberger, E.D.: Fourier and Taylor series on fitness landscapes. Biol. Cybern. 65, 321–330 (1991)
Weinberger, E.D.: Local properties of Kauffman’s N-k model: a tunably rugged energy landscape. Phys. Rev. A 44, 6399–6413 (1991)
Weinreich, D.M., Watson, R.A., Chao, L.: Sign epistasis and genetic constraint on evolutionary trajectories. Evolution 59, 1165–1174 (2005)
Weinreich, D.M., Delaney, N.F., DePristo, M.A., Hartl, D.L.: Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006)
Weinreich, D.M., Lan, Y., Wylie, C.S., Heckendorn, R.B.: Should evolutionary geneticists worry about higher-order epistasis? Curr. Opin. Genet. Dev. 23, 700–707 (2013)
Welch, J.J., Waxman, D.: The NK model and population genetics. J. Theor. Biol. 234, 329–340 (2005)
Whitlock, M.C., Phillips, P.C., Moore, F.B.G., Tonsor, S.J.: Multiple fitness peaks and epistasis. Annu. Rev. Ecol. Systemat. 26, 601–629 (1995)
Wilke, C.O., Martinetz, T.: Adaptive walks on time-dependent fitness landscapes. Phys. Rev. E 60, 2154–2159 (1999)
Wright, S.: The roles of mutation, inbreeding, crossbreeding and selection in evolution. In: Proceedings of the 6th International Congress of Genetics, vol. 1, pp. 356–366 (1932)
Wright, A.H., Thompson, R.K., Zhang, J.: The computational complexity of N-K fitness functions. IEEE Trans. Evolut. Comput. 4, 373–379 (2000)
Wu, N.C., Dai, L., Olson, C.A., Lloyd-Smith, J.O., Sun, R.: Adaptation in protein fitness landscapes is facilitated by indirect paths. eLife 5, e16965 (2016)
Zagorski, M., Burda, Z., Waclaw, B.: Beyond the hypercube: evolutionary accessibility of fitness landscapes with realistic mutational networks. PLoS Comput. Biol. 12(12), e1005218 (2016)
Acknowledgements
We thank David Dean for useful discussions, and an anonymous reviewer for constructive remarks on the manuscript. JK acknowledges the kind hospitality of the MPI for Physics of Complex Systems (Dresden) and the Kavli Institute for Theoretical Physics (Santa Barbara) during the completion of the paper. This research was supported by DFG within SFB 680 Molecular basis of evolutionary innovations and SPP1590 Probabilistic structures in evolution, and in part by the National Science Foundation Grant No. NSF PHY-1125915, NIH Grant No. R25GM067110, and the Gordon and Betty Moore Foundation Grant No. 2919.01.
Appendices
A Asymptotics of \( \pi _\mathrm {max}^{\mathrm {MF}} \) in the Joint Limit \(k, L \rightarrow \infty \)
We start from Eq. (34). Rescaling \(y \rightarrow \frac{\eta y}{\sqrt{2}}\), we rewrite the equation in terms of the CDF of a standard Gaussian distribution \( \varPhi (y) \) as
where \(\mu \equiv \frac{L \eta ^2}{2}\) which converges to \(\frac{(2-\alpha )}{\alpha } \) in the joint limit as can be seen from Eq. (32).
Interestingly, the only L-dependence in the above equation appears through the L-th power of the CDF \(\varPhi (y)\), which converges monotonically to unity as \( y \rightarrow \infty \). This implies that the conventional saddle point method cannot be applied here, because the integrand has no maximum. Instead, we can rely on extreme value theory by interpreting the term \(\varPhi (y)^L\) as the probability that L independently sampled standard Gaussian random variables are all less than y. This leads immediately to the limit relation [16]
where G(x) is the Gumbel CDF defined by \(G(x) = e^{- e^{-x}}\), and the two scaling factors are given by \(a_L = \sqrt{2 \ln L}\) and
After making the change of variable \(y = \frac{x}{a_L} + b_L\), the integral is now of the form
The evaluation of the integral with respect to x is greatly simplified once one notices that the term \(\frac{x^2}{a_L^2}\) in the exponent is sub-leading in L. Ignoring this term gives
where we have used the identity
for positive M. Next, expanding \(a_L\) and \(b_L\) and rearranging the terms gives
As expected from the formal analysis in Sect. 3.2.2, the leading order behavior is given by a power law with exponent \(\mu = (2-\alpha )/\alpha \). By contrast, the existence of a non-trivial logarithmic correction is unexpected, in particular since such a correction does not appear in the exact result \( \pi _\mathrm {max}^{\mathrm {HoC}} = (L+1)^{-1} \) for the HoC model (\(\alpha = \mu = 1\)). Remarkably, the logarithmic factors precisely cancel in this particular case.
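The extreme value limit used in this appendix can be checked numerically. The following is a minimal Python sketch, not part of the original analysis. It assumes the standard Gaussian centering constant \(b_L = \sqrt{2 \ln L} - \frac{\ln \ln L + \ln 4\pi }{2\sqrt{2 \ln L}}\) together with \(a_L = \sqrt{2 \ln L}\); this textbook form of \(b_L\) is an assumption that is not spelled out above, and the function names are hypothetical. The sketch compares \(\varPhi (x/a_L + b_L)^L\) with the Gumbel CDF \(G(x)\).

```python
import math


def gumbel_cdf(x):
    """Gumbel CDF G(x) = exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


def scaling_factors(L):
    """Return (a_L, b_L) for the maximum of L standard Gaussians.

    a_L = sqrt(2 ln L) as in the text; b_L is the standard textbook
    centering constant (an assumption of this sketch).
    """
    a = math.sqrt(2.0 * math.log(L))
    b = a - (math.log(math.log(L)) + math.log(4.0 * math.pi)) / (2.0 * a)
    return a, b


def rescaled_max_cdf(x, L):
    """Phi(x / a_L + b_L)**L, evaluated through the Gaussian tail for accuracy."""
    a, b = scaling_factors(L)
    y = x / a + b
    tail = 0.5 * math.erfc(y / math.sqrt(2.0))  # 1 - Phi(y)
    return math.exp(L * math.log1p(-tail))


def gumbel_error(x, L):
    """Absolute deviation of the rescaled maximum CDF from the Gumbel CDF."""
    return abs(rescaled_max_cdf(x, L) - gumbel_cdf(x))


if __name__ == "__main__":
    for L in (10**2, 10**4, 10**6):
        print(L, rescaled_max_cdf(0.0, L), gumbel_cdf(0.0), gumbel_error(0.0, L))
```

The deviation shrinks only slowly with L, which is typical for Gaussian extremes and fits with the logarithmic corrections found above.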
B Variational Analysis at the Maximum of \(\lambda _k^\mathrm {AN}\)
In Fig. 4, we observed that \(\lambda _2^\mathrm {AN}\) for the negative gamma distribution with shape parameter s is maximized at \(s=1/2\). Furthermore, we claimed that this can be naturally generalized to arbitrary values of k if we replace the shape parameter by \(1/k\). As a next question, one might ask whether \(\lambda _k^\mathrm {AN}\) is also an extremum with respect to arbitrary variations in the space of base fitness distributions \(p_f\). Here, we prove that this is indeed the case for distributions with support limited to the negative real axis.
Let us first evaluate the k-fold convolution of the gamma distribution needed to compute Eq. (42). This is easily achieved using the property that the gamma distribution is closed under the convolution operation, i.e., the k-fold convolution of the gamma distribution with shape parameter s is the gamma distribution with shape parameter sk. If we choose as our base distribution the negative gamma distribution with shape parameter \(s=1/k\),
the k-fold convolution yields the gamma distribution with unit shape parameter, i.e., a (negative) exponential distribution, characterized by the CDF \( \tilde{F}_{1/k}^{(k)}(z) = e^z \) for \(z<0\). Since \( \tilde{F}_{1/k}^{(k)}(y_1 + y_2 + \cdots ) = e^{y_1} e^{y_2} \cdots \), Eq. (42) is fully factorized as
which is exactly the result for the block model obtained in Eq. (26).
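The closure of the gamma distribution under convolution, on which the factorization rests, can be illustrated by sampling. The sketch below is an illustration under stated assumptions: the negative gamma distribution is taken to have unit scale, and the function names, sample size and seed are hypothetical choices. It draws sums of k negative gamma variates with shape 1/k and compares their empirical CDF at a point \(z<0\) with the exponential form \(e^z\).

```python
import math
import random


def sample_negative_gamma_sum(k, rng):
    """Sum of k i.i.d. negative gamma variates with shape 1/k and unit scale."""
    return -sum(rng.gammavariate(1.0 / k, 1.0) for _ in range(k))


def empirical_cdf_at(z, k, n_samples=20000, seed=1):
    """Fraction of sampled k-fold sums that do not exceed z."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n_samples) if sample_negative_gamma_sum(k, rng) <= z
    )
    return hits / n_samples


if __name__ == "__main__":
    z = -1.0
    for k in (2, 3, 5):
        # The empirical value should be close to exp(z) for every k.
        print(k, empirical_cdf_at(z, k), math.exp(z))
```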
Next, let us derive a useful general formula for \( \tilde{F}^{(k)}(z) \). Using the convolution theorem, it satisfies
where \( p^{(k-1)}_f(z) \) is the PDF of the \((k-1)\)-fold convolution of \( p_f(z) \). It will later be convenient to exchange the order of integration:
In the first equality, we split the integral into two pieces to accommodate the condition \(p^{(k-1)}_f(z) =0\) for positive z. In the next equality, we have used the fact that \(\tilde{F}^{(k-1)}(0) =1\).
Now, we want to show that \(\pi _\mathrm {max}^{\mathrm {AN}}\) is maximized when the base fitness distribution is given by Eq. (106). To this end, let us introduce a small perturbation \(p_f(y) = p_{1/k}(y) + \epsilon \eta (y)\), with the properties that \(\int dy \, \eta (y) = 0\) and \(\eta (y) = 0\) for \(y > 0\). Since the probability Eq. (42) is given by the product of 2L terms, there will be 2L linear terms in \(O(\epsilon )\), i.e. \(\pi _\mathrm {max}^{\mathrm {AN}}\) changes by
The first term is straightforward to evaluate. Since \( \tilde{F}^{(k)}_{1/k}\left( \sum _{m=1}^{k} y_{(l + m) \, \text {mod} \,L} \right) \) is factorized, it readily follows that
To evaluate \(J_2\), let us rewrite it in the following way:
The argument of \(\delta \tilde{F}^{(k)}\) is the sum of the variables \(y_r\) that remain to be integrated over. To make them independent, let us introduce a delta function through the identity
or, in the Fourier representation,
where we impose the negativity of Y by inserting an additional theta function. Using the property \(\int dx \delta (x-a) f(x) = \int dx \delta (x-a) f(a)\), we may now complete the integrations over the \(y_r\) as
where we used Jordan’s lemma to evaluate the integral with respect to Z. With this result, \(J_2\) is of the relatively simple form
Next, let us evaluate \(\delta \tilde{F}^{(k)}(z)\). Using Eq. (109), we find that
where the factor k comes from the k different choices of \(p_f(y)\) in the variation of \(\tilde{F}^{(k)}\) and the fact that \(\int dy \, \eta (y) =0\) is used to eliminate the first term in the second equality. As expected, this implies that any perturbation made in the range \((-\infty ,z)\) does not change the behavior of \( \tilde{F}^{(k)}(z) \). Inserting this result into \(J_2\) gives
Now, the only technical point left is the integration with respect to Y. The integral domain is determined by two theta functions \(\varTheta (-Y)\) and \(\varTheta (y- Y)\), but since \(\eta (y)\) is assumed to be supported only on the negative real axis, the condition imposed by \(\varTheta (-Y)\) is irrelevant. Finally, using the identity
we find
Thus, the two terms in Eq. (110) perfectly cancel, which completes the proof that \( \delta \pi _\mathrm {max}^{\mathrm {AN}} = 0 \).
C General Bounds on \(\beta \) for Uniform and Regular Structures with Gaussian Fitness
In this appendix we derive some general upper and lower bounds on the coefficient \(\beta \), defined in Eq. (86), for NK structures that are both uniform and regular. For this purpose we write the probability of \(\sigma \) being a local optimum as
All fitness values of the partial landscapes \(f_r\) are i.i.d. random variables. If \(l\in B_r\), then \(f_r\left( {\downarrow _{B_r}}\varDelta _l\sigma \right) \) and \(f_r\left( {\downarrow _{B_r}}\sigma \right) \) are independent. Otherwise they are identical. Thus effectively only the sum over r with \(l\in B_r\) remains. Due to regularity there are \(\tilde{k} = \frac{Nk}{L}\) such elements for each l. For different r, the terms are always independent. The left-hand terms are also independent for different l. However the right-hand terms are correlated for different l but the same r, resulting in a non-trivial problem. Using these observations we can directly integrate out all terms \(f_r\left( {\downarrow _{B_r}}\varDelta _l\sigma \right) \) and arrive at
where \(\varPhi _{\tilde{k}}\) is the cumulative distribution function of the sum of \(\tilde{k}\) i.i.d. fitness values. Introducing the short-hand notation \(x_r = f_r\left( {\downarrow _{B_r}}\sigma \right) \), we can write the sum as a matrix product
where \({\mathbf {B}}\) is the incidence matrix of the NK structure, i.e. \({\mathbf {B}}_{lr} = b_{l,r} = 1\) if \(l\in B_r\) and 0 otherwise.
If the base fitness distribution is a standard normal distribution, then the sum of \(\tilde{k}\) i.i.d. fitness values is also normally distributed, with variance \(\tilde{k}\). Consequently we can simplify as
The random vector \(y = \frac{1}{\sqrt{\tilde{k}}}{\mathbf {B}}x\) is then jointly normally distributed with zero mean and covariance matrix \(\mathbf {C} = \frac{1}{\tilde{k}}{\mathbf {B}}{\mathbf {B}}^T\). This matrix is positive-semidefinite, and therefore
We can shift the integrand by a yet to be specified vector z, which yields
Absorbing the first term in the exponent into a probability measure, we have again
where y is still jointly normally distributed with covariance matrix \(\mathbf {C}\).
Notice that the all-ones vector \(\bar{1}\) is an eigenvector of \(\mathbf {C}\) with the eigenvalue k. This can be seen through the relations \({\mathbf {B}}\bar{1} = \tilde{k}\bar{1}\) and \({\mathbf {B}}^T\bar{1} = k\bar{1}\), as there are exactly \(\tilde{k}\) ones in each row of \({\mathbf {B}}\) and k ones in each column. Thus let the \(z_l = \bar{z}\) be equal for all l. Then
C.1 Lower Bound
By Jensen’s inequality we have
Because \(y_l\) has a symmetric distribution, the mean of \(\bar{z} y_l\) vanishes. The variance of \(y_l\) is always 1, because by regularity and uniformity the diagonal elements of \({\mathbf {B}}{\mathbf {B}}^T\) are \(\tilde{k}\), which is canceled to 1 by the pre-factor in \(\mathbf {C}\). Assuming that \(\bar{z}\) increases in our limit of interest, and noting that the Gaussian tail falls to zero much more quickly than \(\ln \varPhi (x)\) falls to \(-\infty \) as \(x\rightarrow -\infty \), we can establish the bound
which can be evaluated to
If we choose \(\bar{z} = 2\sqrt{\ln k}\), then asymptotically for large k
Note that choosing \(\bar{z} = \tilde{z} \sqrt{\ln k}\) with \(\tilde{z} < 2\) will not give a better bound, as the right-hand term in the exponent in Eq. (131) would then dominate and approach zero more slowly than \(\frac{\ln k}{k}\). This shows that \(\beta \le 2\) for uniform and regular structures. With the MF model, which is uniform and regular, we have an example of a realization of \(\beta = 2\). This shows that the bound is tight.
C.2 Upper Bound
Starting from Eq. (128) we can find an upper bound by simply optimizing each term in the sum. The resulting sum is then an upper bound on the integrand, and because the expectation is taken with respect to a probability measure, it is bounded by the same value as well. If \(0< \frac{\bar{z}}{k} < \frac{1}{\sqrt{2\pi }}\), the optimum must be at \(y_l^\star +\bar{z} > 0\). Then by using the simplification \(\ln \varPhi (y_l+z_l) \le \varPhi (y_l+z_l) -1\), the optimum is found to be at
Inserting \(y_l^\star \) back into the simplified argument of the expectation and assuming \(\bar{z}\rightarrow \infty \) in the limit of interest we find
The left-most and right-most terms are of equal order, but the second one from the left is always of less significant order than the second from the right, as long as \(\bar{z} = o(k)\).
The second term from the right becomes equal in order to the other two if \(\bar{z} = \tilde{z}\sqrt{2\ln k}\) with a positive constant \(\tilde{z}\). This satisfies the condition \(\bar{z} = o(k)\) while still \(\bar{z} \rightarrow \infty \), as required by previous assumptions (given that \(k\rightarrow \infty \) in the limit of interest). With this we have
The bound is best for \(\tilde{z} = 1\), and so:
showing that \(\beta \ge 1\) for regular and uniform NK structures with Gaussian fitness. This bound is realized by the AN and BN structures, for example, and thus it is tight.
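The structural facts about the incidence matrix \({\mathbf {B}}\) used throughout this appendix (row sums \(\tilde{k}\), column sums k, unit diagonal of \(\mathbf {C}\), and the all-ones vector as an eigenvector of \(\mathbf {C}\) with eigenvalue k) can be verified for small examples. The sketch below is illustrative only. The helper names are hypothetical, and the adjacent neighborhoods are assumed to consist of k consecutive loci with periodic boundary conditions; the particular labeling offset does not affect the check.

```python
def adjacent_neighborhoods(L, k):
    """Adjacent (AN) structure: block r holds k consecutive loci, periodic."""
    return [{(r + m) % L for m in range(k)} for r in range(L)]


def block_neighborhoods(L, k):
    """Block (BN) structure: L // k disjoint blocks of k loci each."""
    return [set(range(r * k, (r + 1) * k)) for r in range(L // k)]


def incidence_matrix(L, neighborhoods):
    """B[l][r] = 1 if locus l belongs to neighborhood B_r, and 0 otherwise."""
    return [[1 if l in B_r else 0 for B_r in neighborhoods] for l in range(L)]


def structure_summary(L, neighborhoods):
    """Row and column sums of B, plus diagonal and row sums of C = B B^T / k~."""
    B = incidence_matrix(L, neighborhoods)
    N = len(neighborhoods)
    row_sums = [sum(row) for row in B]  # k~ for a regular structure
    col_sums = [sum(B[l][r] for l in range(L)) for r in range(N)]  # k if uniform
    k_tilde = row_sums[0]
    C = [
        [sum(B[l][r] * B[m][r] for r in range(N)) / k_tilde for m in range(L)]
        for l in range(L)
    ]
    return {
        "row_sums": row_sums,
        "col_sums": col_sums,
        "C_diag": [C[l][l] for l in range(L)],
        "C_times_ones": [sum(row) for row in C],  # entries of C applied to 1
    }


if __name__ == "__main__":
    print(structure_summary(6, adjacent_neighborhoods(6, 3)))
    print(structure_summary(6, block_neighborhoods(6, 3)))
```

The block structure is a useful second test case because there \(\tilde{k} = 1 \ne k\), so the roles of k and \(\tilde{k}\) are not interchangeable.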
D Simulation of the Number of Local Maxima
As first realized in [6], the choice of a Gaussian base fitness distribution greatly simplifies the computation of \(\pi _\mathrm {max}\) through the numerical evaluation of Eq. (25), as it allows us to take advantage of an efficient algorithm. With this choice, the integrals over \(\mathbf {q}\) and \(\mathbf {y}\) can be cast into the form of multi-dimensional Gaussian integrals which may be evaluated for generally defined NK structures. Once these integrals are evaluated, we may construct a covariance matrix \(\varSigma \) that satisfies the relation
where \( \int \mathcal {D} \mathbf {u} = \frac{1}{\sqrt{(2\pi )^L \det \varSigma }} \int _{0}^{\infty } \prod _j du_j \) and the matrix elements of \(\varSigma \) are given by
Thus, the problem reduces to determining the probability that all entries of a Gaussian random vector with covariance matrix \(\varSigma \) are positive. Computing the probability content of rectangular domains under a multivariate Gaussian distribution is a well-studied problem, for which an efficient algorithm has long been available [28]; an implementation has been provided by its original authors as an R library [27].
Roughly speaking, this algorithm consists of two steps: i) transforming to an integral over a unit rectangular domain such that a rejection-free Monte-Carlo simulation is possible and ii) finding an ordering of loci that minimizes the variance of the Monte-Carlo step. However, since the loci in the NK models we consider in this review are statistically identical, the second step is irrelevant in this particular case. Thus, here we describe briefly how the transformation can be achieved from Eq. (137).
Since \(\varSigma \) is positive-definite, it admits a Cholesky decomposition, i.e., there exists a lower triangular matrix C such that \(\varSigma = C C^T\). The substitution \(\mathbf {u} = C \mathbf {x}\) then diagonalizes the integral at the cost of a nontrivial integration domain,
where the domain \(\mathcal {R} = (a_1, \infty ) \times (a_2, \infty ) \times \cdots \times (a_L, \infty )\) and \(a_j = - \sum _{l=1}^{j-1} x_l C_{j l } / C_{j j } \). Next, performing the canonical transformation to a standard uniform distribution \( z_i = \varPhi (x_i) \), where \(\varPhi (x)\) is the CDF of the standard Gaussian distribution, the integral becomes
where \( \mathcal {R'} = (d_1, 1) \times (d_2, 1) \times \cdots \times (d_L, 1) \) and \(d_j = \varPhi ( - \sum _{l=1}^{j-1} \varPhi ^{-1}(z_l) C_{j l} / C_{j j })\). Finally, another linear transformation \(z_j = d_j + w_j (1- d_j)\) brings the integral into the form
where \(\mathcal {R''} = (0,1)^{L}\). Now that the integral domain is the L-dimensional unit rectangle, this integral can be evaluated by sampling L random variables from a uniform distribution on (0, 1) and subsequently estimating the weight factors \(d_j\).
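The sequence of transformations described above can be sketched compactly. The following Python code is an illustration under stated assumptions and not the mvtnorm implementation [27]: it uses only the standard library, omits the variable reordering step, relies on plain pseudo-random sampling, and its function names are hypothetical. Each sample of \(w_1, \dots , w_L\) contributes the Jacobian weight \(\prod _j (1-d_j)\), and the sample mean of these weights estimates the probability that all entries of a zero-mean Gaussian vector with covariance \(\varSigma \) are positive.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian: provides cdf and inv_cdf


def cholesky(sigma):
    """Lower triangular C with sigma = C C^T (sigma must be positive-definite)."""
    n = len(sigma)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(c[i][m] * c[j][m] for m in range(j))
            if i == j:
                c[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                c[i][j] = (sigma[i][j] - s) / c[j][j]
    return c


def orthant_probability(sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(u_j > 0 for all j) for u ~ N(0, sigma)."""
    rng = random.Random(seed)
    c = cholesky(sigma)
    n = len(sigma)
    total = 0.0
    for _ in range(n_samples):
        x = []
        weight = 1.0
        for j in range(n):
            # Lower integration limit a_j for x_j, given x_1, ..., x_{j-1}.
            a_j = -sum(c[j][l] * x[l] for l in range(j)) / c[j][j]
            d_j = _STD_NORMAL.cdf(a_j)
            weight *= 1.0 - d_j
            # z_j = d_j + w_j (1 - d_j) with w_j uniform on (0, 1).
            z_j = d_j + rng.random() * (1.0 - d_j)
            # Clamp to the open unit interval so that inv_cdf is defined.
            z_j = min(max(z_j, 1e-15), 1.0 - 1e-15)
            x.append(_STD_NORMAL.inv_cdf(z_j))
        total += weight
    return total / n_samples


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    correlated = [[1.0, 0.5], [0.5, 1.0]]
    print(orthant_probability(identity))
    print(orthant_probability(correlated))
```

Two simple cases serve as sanity checks. For the identity covariance every \(d_j\) equals 1/2, so the estimate is \(2^{-L}\) without sampling noise. For two variables with correlation \(\rho \), the exact orthant probability is \(\frac{1}{4} + \frac{\arcsin \rho }{2\pi }\).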
Hwang, S., Schmiegelt, B., Ferretti, L. et al. Universality Classes of Interaction Structures for NK Fitness Landscapes. J Stat Phys 172, 226–278 (2018). https://doi.org/10.1007/s10955-018-1979-z
DOI: https://doi.org/10.1007/s10955-018-1979-z