When does coevolution occur via sweeps vs shifts?

Recent studies of polygenic adaptation under selection on a quantitative trait have investigated when adaptation occurs via sweeps of beneficial alleles at a few loci (sweeps from here on) versus subtle shifts of allele frequencies at many loci (shifts from here on). The mode of adaptation depends on both the genetic architecture of the trait and the model of selection. One example of the dependence on the model of selection was found by so and so et al. They found that, after a sudden change in the trait optimum, adaptation tended to occur via sweeps when the size of the change is large relative to the amount of standing genetic variation in the population, and via shifts when the size of the change is small. An example of the dependence on genetic architecture was found by Götsch and Bürger. These authors found that sweeps tended to occur when the effects of loci on the trait were heterogeneous across the genome, and that shifts tended to occur when these effects were homogeneous across the genome.

Here I build on this body of work to explore the conditions under which co-adaptation between interacting species tends to occur via sweeps or shifts. In addition, I hypothesize that the mode of adaptation is related to patterns of local adaptation, particularly in host-parasite and predator-prey systems. For example, when a host-parasite interaction is mediated by a trait matching/mis-matching mechanism and the initial trait distributions of the two species greatly overlap, we might expect the host to adapt via a sweep when a large-effect mutation occurs. Otherwise, host adaptation would occur gradually via subtle shifts based on standing variation and the input of some small-effect mutations. In the case of gradual adaptation, the parasite would be more likely to track the host, provided the parasite maintains sufficient genetic variation. In contrast, when host and parasite trait distributions closely match, we would expect large-effect mutations in the parasite to be selected against, which promotes evolution via shifts. This pattern of parasite adaptation may not be static, however. The host might jump away from the parasite in trait space following a sweep of some novel large-effect mutation. This would favor large-effect mutations in the parasite that, after a sweep, move the parasite trait distribution back into overlap with the host trait distribution. Once the parasite trait distribution arrives in the general vicinity of the host trait distribution, large-effect mutations would likely be selected against and small-effect mutations would be more likely to receive positive selection, leading back to parasite adaptation via shifts. Hence, co-adaptation between interacting species may occur via shifts, sweeps, or modes lying on the continuum between these two extremes, depending on the genomic architecture of the traits mediating the interaction, and may alternate between modes as described above.
In this note, I aim to take some first steps toward understanding the conditions under which different modes of adaptation occur and the conditions under which adaptation alternates between modes, hopefully with enough resolution to compute the relative frequencies at which each mode occurs when they alternate.

Spatial structure is likely to be a major driver of the tempo and mode of adaptation, but for simplicity I will start by investigating a non-spatial model. This allows us to start by focusing on the effects of genetic architecture. For both species we assume diploid genomes and a single trait whose value is determined additively by freely recombining causal loci that accumulate under an infinite-sites model of mutation. The sum of the effects across causal loci for an individual is referred to as that individual's breeding value. To model the expressed trait value (which determines fitness), we follow traditional quantitative genetics by setting the expressed trait equal to the breeding value plus a mean-zero normal random variable referred to as developmental noise (the variance of which is the developmental variance). Hence, the narrow-sense heritability of the trait is equal to the variance of breeding values (i.e., the additive genetic variance $G$) divided by the variance of expressed traits across all individuals in the population (i.e., the phenotypic variance $P$). When the population is large, $P\approx G+\varepsilon_Q$, where $\varepsilon_Q$ is the developmental variance of species $Q=H,P$, and $H$ and $P$ correspond to the host and parasite respectively. We assume a genome-wide constant rate of mutation that can differ between the two species (denoted $\mu_H,\mu_P$ for the host and parasite respectively). When a mutation occurs, its additive effect on the trait is drawn from a normal distribution with mean zero and a variance that can differ between the two species (denoted $\kappa_H,\kappa_P$). A Laplace or stable distribution may be a better choice for producing large-effect outliers; alternatively, a mixture of two normal distributions with different variances could be used, with a probability determining which component is drawn from.
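The relationship between breeding values, developmental noise, and heritability described above can be sketched numerically. This is only an illustration under stated assumptions: the function names `draw_effect` and `heritability` are hypothetical, the two-component normal mixture is one possible reading of the mixed model mentioned above, and the breeding values in the usage example are stand-ins rather than output of the full genetic model.

```python
import random
import statistics


def draw_effect(rng, kappa, kappa_large=None, prob_large=0.0):
    """Draw one mean-zero mutational effect.

    With kappa_large=None this is the plain normal model with variance kappa.
    Otherwise it is a two-component normal mixture (an illustrative assumption):
    with probability prob_large the variance is kappa_large instead of kappa.
    """
    variance = kappa
    if kappa_large is not None and rng.random() < prob_large:
        variance = kappa_large
    return rng.gauss(0.0, variance ** 0.5)


def heritability(breeding_values, dev_var, rng):
    """Add developmental noise to breeding values and return (G, P, h2)."""
    # Expressed trait = breeding value + mean-zero normal developmental noise.
    expressed = [g + rng.gauss(0.0, dev_var ** 0.5) for g in breeding_values]
    G = statistics.pvariance(breeding_values)  # additive genetic variance
    P = statistics.pvariance(expressed)        # phenotypic variance
    return G, P, G / P


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in breeding values: each is a sum of 50 independent effects.
    breeding = [sum(draw_effect(rng, 0.02) for _ in range(50)) for _ in range(5000)]
    G, P, h2 = heritability(breeding, dev_var=1.0, rng=rng)
    print(f"G={G:.3f}  P={P:.3f}  h2={h2:.3f}")
```

With many individuals, `P` should be close to `G + dev_var`, matching the large-population approximation in the text.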

Initial conditions are obtained by assuming each species has been evolving independently according to a Moran model for a long period of time up until time $t=0$. We write $S_0^H,S_0^P$ for the initial number of polymorphic loci encoding the trait in the host and parasite respectively. We write $M_t^H,M_t^P$ for the total number of mutations (not necessarily at polymorphic loci) that have accumulated in the host and parasite respectively by time $t$, including the initial number of polymorphic loci, so that $M_\tau^Q-S_0^Q$ is the number of mutations that have occurred in species $Q$ between times $t=0$ and $t=\tau$. The symbol $S_t^Q$ denotes the number of polymorphic loci in species $Q$ at time $t$. Initial trait variances are assumed to be random variables with distributions determined by a neutral drift-mutation balance, with species $Q$ having an effective size $N_e^Q$ that is assumed to be constant in time. Since mean trait values have no drift-mutation equilibrium, we arbitrarily set them to $\bar z_0^H,\bar z_0^P$ for the host and parasite respectively.
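For intuition about the Moran dynamics assumed before $t=0$, here is a minimal single-locus sketch. It makes assumptions not spelled out in the text: the population is treated as $2N_e$ gene copies, each step is one neutral birth-death event, and there is no mutation or selection. The function names are hypothetical.

```python
import random


def moran_until_absorbed(count, n_copies, rng):
    """Run a neutral Moran process until the mutant allele is lost or fixed.

    count    -- current number of mutant gene copies
    n_copies -- total number of gene copies (2 * N_e for diploids)
    Returns True if the mutant fixes, False if it is lost.
    """
    while 0 < count < n_copies:
        p = count / n_copies
        u = rng.random()
        if u < p * (1 - p):
            count += 1      # a mutant copy reproduces, a wild-type copy dies
        elif u < 2 * p * (1 - p):
            count -= 1      # a wild-type copy reproduces, a mutant copy dies
        # otherwise parent and replaced copy share a type: no change
    return count == n_copies


def fixation_fraction(count, n_copies, n_reps, seed=0):
    """Fraction of replicate runs in which the mutant allele fixes."""
    rng = random.Random(seed)
    fixed = sum(moran_until_absorbed(count, n_copies, rng) for _ in range(n_reps))
    return fixed / n_reps


if __name__ == "__main__":
    print(fixation_fraction(count=5, n_copies=20, n_reps=2000))
```

Under neutrality the fixation probability equals the starting frequency, so the printed fraction should be near 0.25. This is a quick sanity check on the sketch, not a result from the note.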

Following our assumptions, we have $\mathbb ES_0^Q=\theta_Q(\gamma+\log(2N_e^Q))$ and $\mathbb VS_0^Q=\mathbb ES_0^Q+\theta_Q^2\pi^2/6$, where $\theta_Q=2N_e^Q\mu_Q$ and $\gamma$ is the Euler-Mascheroni constant. We can also compute the distributions of initial trait variance in each species using results on the infinite-sites model. Denoting by $\xi_i^Q$ the number of initial polymorphic loci in species $Q$ with mutant allele frequency $i/2N_e^Q$, we have $\mathbb E\xi_i^Q=\theta_Q/i$, $\mathbb V\xi_i^Q=\theta_Q^2\sigma_{ii}+\theta_Q/i$, and $\mathbb C(\xi_i^Q,\xi_j^Q)=\theta_Q^2\sigma_{ij}$ for $i\neq j$, where $\sigma_{ii},\sigma_{ij}$ are given in Fu (1995). We assume phenotypic effects of ancestral states at polymorphic loci are zero and write the effect of a mutation in species $Q$ at locus $\ell=1,\dots,S_0^Q$ as $\alpha_\ell^Q\sim\mathcal N(0,\kappa_Q)$. Denoting by $I_{\ell,1}^{Q,k},I_{\ell,2}^{Q,k}$ the indicator variables representing whether a mutation is present at the $\ell$th locus in haploid genotypes one and two respectively of the $k$th individual in species $Q$, the breeding value of this individual, measured relative to the ancestral trait value, is given by

$$z_k^Q=\sum_{\ell=1}^{S_0^Q}\alpha_\ell^Q(I_{\ell,1}^{Q,k}+I_{\ell,2}^{Q,k}).$$

Hence, the initial mean trait of species $Q$ can be expressed as

$$\bar z_0^Q=\mathcal A_Q+2\sum_{\ell=1}^{S_0^Q}\alpha_\ell^Qp_\ell^Q(0)$$

where $\mathcal A_Q$ is the ancestral trait value for species $Q$ (which can be set to any real number) and $p_\ell^Q(0)=\sum_{k=1}^{N_e^Q}(I_{\ell,1}^{Q,k}+I_{\ell,2}^{Q,k})/2N_e^Q$ is the initial frequency of the mutant allele occurring at locus $\ell$. In general we write $p_\ell^Q(t)$ for the allele frequency at locus $\ell=1,\dots,M_t^Q$ at time $t\geq0$, which may be zero or one since these loci need not be polymorphic for $t>0$.
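As a consistency check, the mean of the individual values (with the ancestral value $\mathcal A_Q$ added to each individual) should equal $\mathcal A_Q+2\sum_\ell\alpha_\ell^Qp_\ell^Q(0)$. The sketch below verifies this on a randomly generated population. The helper names are hypothetical, and the random genotypes are a stand-in, not a draw from the drift-mutation equilibrium.

```python
import random


def breeding_value(ancestral, effects, genotype):
    """genotype[l] = (I1, I2): mutation indicators on the two haploid copies."""
    return ancestral + sum(a * (i1 + i2) for a, (i1, i2) in zip(effects, genotype))


def allele_frequencies(genotypes):
    """Mutant allele frequency at each locus: total count over 2 * (number of individuals)."""
    n_ind = len(genotypes)
    n_loci = len(genotypes[0])
    return [sum(g[l][0] + g[l][1] for g in genotypes) / (2 * n_ind)
            for l in range(n_loci)]


def mean_from_frequencies(ancestral, effects, freqs):
    """Mean trait written as ancestral value plus 2 * sum of effect * frequency."""
    return ancestral + 2 * sum(a * p for a, p in zip(effects, freqs))


def random_population(rng, n_ind, n_loci, kappa):
    """Stand-in population: normal effects with variance kappa, arbitrary frequencies."""
    effects = [rng.gauss(0.0, kappa ** 0.5) for _ in range(n_loci)]
    target = [rng.random() for _ in range(n_loci)]
    genotypes = [[(int(rng.random() < p), int(rng.random() < p)) for p in target]
                 for _ in range(n_ind)]
    return effects, genotypes


if __name__ == "__main__":
    rng = random.Random(3)
    effects, genotypes = random_population(rng, n_ind=50, n_loci=8, kappa=0.5)
    direct = sum(breeding_value(1.5, effects, g) for g in genotypes) / len(genotypes)
    via_freqs = mean_from_frequencies(1.5, effects, allele_frequencies(genotypes))
    print(direct, via_freqs)
```

The two printed numbers agree up to floating-point rounding, since the second is just the first with the order of summation exchanged.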

Since we assume freely recombining loci, the order in which the loci are labelled is arbitrary (i.e., it does not reflect location on a chromosome). We therefore order the loci by increasing mutant allele frequency, so that the first $\xi_1^Q$ loci in species $Q$ have frequency $1/2N_e^Q$ (recall that $\xi_i^Q$ is the number of loci in species $Q$ with mutant allele frequency $i/2N_e^Q$), loci $\xi_1^Q+1,\dots,\xi_1^Q+\xi_2^Q$ have frequency $2/2N_e^Q$, and so on. In general, loci $1+\sum_{j=1}^{i-1}\xi_j^Q,\dots,\sum_{j=1}^i\xi_j^Q$ have mutant allele frequency $i/2N_e^Q$.
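The frequency ordering and the expected counts quoted earlier can be illustrated with a short sketch. The exact harmonic sum used for comparison is Watterson's finite-sample expression, which the $\gamma+\log$ form approximates. The function names and the example spectrum are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_segregating_sites(theta, n_copies):
    """Return (exact harmonic-sum form, gamma + log approximation) of E[S_0]."""
    exact = theta * sum(1.0 / i for i in range(1, n_copies))
    approx = theta * (EULER_GAMMA + math.log(n_copies))
    return exact, approx


def expected_sfs(theta, n_copies):
    """Expected number of loci with mutant count i, for i = 1..n_copies-1."""
    return [theta / i for i in range(1, n_copies)]


def frequencies_from_sfs(xi, n_copies):
    """Expand counts xi[i-1] (loci with mutant count i) into locus frequencies.

    The returned list is in increasing order of frequency, matching the
    labelling of loci described in the text.
    """
    freqs = []
    for i, count in enumerate(xi, start=1):
        freqs.extend([i / n_copies] * count)
    return freqs


if __name__ == "__main__":
    print(expected_segregating_sites(theta=2.0, n_copies=2000))
    # Example spectrum: two singletons, one doubleton, one locus with count 4.
    print(frequencies_from_sfs([2, 1, 0, 1], n_copies=10))
```

Here `n_copies` plays the role of $2N_e^Q$; the two values returned by `expected_segregating_sites` differ by roughly $\theta_Q/(2\cdot 2N_e^Q)$, which is negligible for large populations.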

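The variance and covariance results above are derived for completely linked sites, so they need not hold here since we assume freely recombining loci (the expectations $\mathbb ES_0^Q$ and $\mathbb E\xi_i^Q$ do not depend on linkage). So we need a different approach...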

Another approach

Focusing on a single species, assume the process has been running for an infinite amount of time, so that a (countably) infinite number of mutations have occurred. Order the mutations by the time they appeared, starting with the most recent. Writing $T_m^\ell$ for the time into the past at which the $\ell$th mutation occurred, we have $T_m^1<T_m^2<\dots$. Setting $T_m^0=0$, the definition of our model implies $T_m^\ell-T_m^{\ell-1}\sim\mathrm{Exp}(\mu)$ for each mutation $\ell=1,2,\dots$. In particular, $T_m^\ell=\sum_{j=1}^\ell(T_m^j-T_m^{j-1})$, so that $T_m^\ell$ is a sum of $\ell$ iid exponential random variables. This implies $T_m^\ell\sim\mathrm{Gamma}(\ell,\mu)$. Each mutation corresponds to an independently evolving locus, which has its own independent coalescent process associated with it. Writing $T_{MRCA}^\ell$ for the time to the most recent common ancestor at the $\ell$th locus and $U_i^\ell\sim\mathrm{Exp}\big(\binom{i}{2}\big)$ for the time to coalescence when there are $i$ lineages at the $\ell$th locus, we have $T_{MRCA}^\ell=\sum_{i=2}^{2N_e}U_i^\ell$. Thus, a necessary condition for the $\ell$th locus to be polymorphic at the present time is $T_m^\ell<T_{MRCA}^\ell$. We denote the probability of this event by $f(\ell)$. Given this condition, there will be $k\in\{2,\dots,2N_e\}$ lineages of the coalescent at time $T_m^\ell$ with probability $g(k,\ell)$. The mutation lands uniformly at random on one of the $2N_e$ lineages present at that time (including those that do not appear in the coalescent), so the probability that it falls on one of the $k$ ancestral lineages, and hence appears in the present, given all else, is $k/2N_e$. Hence, the probability of polymorphism at locus $\ell$ is $f(\ell)\sum_{k=2}^{2N_e}g(k,\ell)k/2N_e$.
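The probability $f(\ell)=\mathbb P(T_m^\ell<T_{MRCA}^\ell)$ has no closed form given here, but it can be estimated by simulation. The sketch below is a minimal Monte Carlo estimate under the assumption that mutation times and coalescence times are measured on the same time scale, with pair coalescence rate one. The function names are hypothetical.

```python
import random
from math import comb


def sample_t_mrca(rng, n_lineages):
    """Sum of independent Exp(C(i, 2)) waiting times for i = 2..n_lineages."""
    return sum(rng.expovariate(comb(i, 2)) for i in range(2, n_lineages + 1))


def sample_t_mutation(rng, ell, mu):
    """Age of the ell-th most recent mutation: Gamma with shape ell and rate mu."""
    # random.gammavariate takes (shape, scale), so the scale is 1 / mu.
    return rng.gammavariate(ell, 1.0 / mu)


def estimate_f(ell, mu, n_lineages, n_reps=5000, seed=1):
    """Monte Carlo estimate of P(T_m^ell < T_MRCA^ell)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        if sample_t_mutation(rng, ell, mu) < sample_t_mrca(rng, n_lineages):
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    for ell in (1, 5, 10):
        print(ell, estimate_f(ell, mu=5.0, n_lineages=20))
```

For two lineages the estimate can be checked exactly: $T_{MRCA}\sim\mathrm{Exp}(1)$, so $f(\ell)=\mathbb E\,e^{-T_m^\ell}=(\mu/(1+\mu))^\ell$. The estimate should also decrease with $\ell$, since older mutations are less likely to postdate the common ancestor of their locus.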