|
201 | 201 | <p class="p3"><span class="s1">Generates a new random nucleotide sequence with </span><span class="s2">length</span><span class="s1"> bases.<span class="Apple-converted-space"> </span>The four nucleotides ACGT are equally probable if </span><span class="s2">basis</span><span class="s1"> is </span><span class="s2">NULL</span><span class="s1"> (the default); otherwise, </span><span class="s2">basis</span><span class="s1"> may be a 4-element </span><span class="s2">integer</span><span class="s1"> or </span><span class="s2">float</span><span class="s1"> vector providing relative fractions for A, C, G, and T respectively (these need not sum to </span><span class="s2">1.0</span><span class="s1">, as they will be normalized).<span class="Apple-converted-space"> </span>More complex generative models such as Markov processes are not supported intrinsically in SLiM at this time, but arbitrary generated sequences may always be loaded from files on disk.</span></p> |
202 | 202 | <p class="p3"><span class="s1">The </span><span class="s2">format</span><span class="s1"> parameter controls the format of the returned sequence.<span class="Apple-converted-space"> </span>It may be </span><span class="s2">"string"</span><span class="s1"> to obtain the generated sequence as a singleton </span><span class="s2">string</span><span class="s1"> (e.g., </span><span class="s2">"TATA"</span><span class="s1">), </span><span class="s2">"char"</span><span class="s1"> to obtain it as a </span><span class="s2">string</span><span class="s1"> vector of single characters (e.g., </span><span class="s2">"T"</span><span class="s1">, </span><span class="s2">"A"</span><span class="s1">, </span><span class="s2">"T"</span><span class="s1">, </span><span class="s2">"A"</span><span class="s1">), or </span><span class="s2">"integer"</span><span class="s1"> to obtain it as an </span><span class="s2">integer</span><span class="s1"> vector (e.g., </span><span class="s2">3</span><span class="s1">, </span><span class="s2">0</span><span class="s1">, </span><span class="s2">3, 0</span><span class="s1">), using SLiM’s standard code of A=</span><span class="s2">0</span><span class="s1">, C=</span><span class="s2">1</span><span class="s1">, G=</span><span class="s2">2</span><span class="s1">, T=</span><span class="s2">3</span><span class="s1">.<span class="Apple-converted-space"> </span>For passing directly to </span><span class="s2">initializeAncestralNucleotides()</span><span class="s1">, format </span><span class="s2">"string"</span><span class="s1"> (a singleton string) will certainly be the most memory-efficient, and probably also the fastest.<span class="Apple-converted-space"> </span>Memory efficiency can be a significant consideration; the nucleotide sequence for a chromosome of length 10</span><span class="s11"><sup>9</sup></span><span class="s1"> will occupy approximately 1 GB of memory when stored as a singleton string (with one byte per nucleotide), and much 
more if stored in the other formats.<span class="Apple-converted-space"> </span>However, the other formats can be easier to work with in Eidos, and so may be preferable for relatively short chromosomes if you are manipulating the generated sequence.</span></p> |
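<p class="p3">The weighting and output formats described above can be illustrated outside of SLiM with a minimal Python sketch. The function name <span class="s2">random_nucleotides</span> and its parameters are hypothetical stand-ins chosen for this example; this is not SLiM's implementation, and it only mirrors the documented behavior (relative weights that need not sum to 1.0, and the A=0, C=1, G=2, T=3 integer code).</p>

```python
import random

NUCLEOTIDES = "ACGT"  # index in this string matches the code A=0, C=1, G=2, T=3


def random_nucleotides(length, basis=None, fmt="string", rng=None):
    """Hypothetical sketch of the documented behavior; not SLiM's own code."""
    rng = rng or random.Random()
    # With no basis, all four nucleotides are equally probable.
    weights = [1.0] * 4 if basis is None else [float(w) for w in basis]
    if len(weights) != 4 or min(weights) < 0 or sum(weights) <= 0:
        raise ValueError("basis must hold four non-negative weights, not all zero")
    # random.choices accepts relative weights, so no explicit normalization is needed.
    codes = rng.choices(range(4), weights=weights, k=length)
    if fmt == "integer":
        return codes
    chars = [NUCLEOTIDES[c] for c in codes]
    if fmt == "char":
        return chars
    if fmt == "string":
        return "".join(chars)
    raise ValueError("fmt must be 'string', 'char', or 'integer'")
```

<p class="p3">In this sketch, as in the text above, the joined string is the most compact representation, while the list forms are easier to index and modify.</p>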
203 | 203 | <p class="p1"><b>3.3.<span class="Apple-converted-space"> </span>Population genetics utilities</b></p> |
| 204 | +<p class="p4">(float$)calcDxy(object<Haplosome> haplosomes1, object<Haplosome> haplosomes2, [No<Mutation> muts = NULL], [Ni$ start = NULL], [Ni$ end = NULL], [logical$ normalize = F])</p> |
| 205 | +<p class="p3">Calculates the estimated <i>D</i><span class="s4"><sub>xy</sub></span> between two <span class="s3">Haplosome</span> vectors for the set of mutations given in <span class="s3">muts</span>.<span class="Apple-converted-space"> </span><i>D</i><span class="s4"><sub>xy</sub></span> is the expected number of differences between two sequences, typically drawn from two different subpopulations whose haplosomes are given in <span class="s3">haplosomes1</span> and <span class="s3">haplosomes2</span>.<span class="Apple-converted-space"> </span>It is therefore a metric of genetic divergence, comparable in some respects to <i>F</i><span class="s4"><sub>ST</sub></span>; see Cruickshank and Hahn (2014, Molecular Ecology) for a discussion of <i>F</i><span class="s4"><sub>ST</sub></span> versus <i>D</i><span class="s4"><sub>xy</sub></span>.<span class="Apple-converted-space"> </span>This method implements <i>D</i><span class="s4"><sub>xy</sub></span> as defined by Nei (1987) in Molecular Evolutionary Genetics (eq. 10.20), with optimizations for computational efficiency based upon an assumption that multiallelic loci are rare (this is compatible with the infinite-sites model).</p> |
| 206 | +<p class="p3">The calculation can be narrowed to apply to only a window – a subrange of the full haplosomes – by passing the interval bounds [<span class="s3">start</span>, <span class="s3">end</span>] for the desired window.<span class="Apple-converted-space"> </span>In this case, the vector of mutations used for the calculation will be subset to include only mutations within the specified window.<span class="Apple-converted-space"> </span>The default behavior, with <span class="s3">start</span> and <span class="s3">end</span> of <span class="s3">NULL</span>, provides the haplosome-wide <i>D</i><span class="s4"><sub>xy</sub></span>.</p> |
| 207 | +<p class="p3">If <span class="s3">normalize</span> is <span class="s3">F</span> (the default), the returned <span class="s3">float</span> value is simply the expected number of differences, following Nei.<span class="Apple-converted-space"> </span>Often, however, it will be desirable to normalize that value by dividing by the length of the sequence considered, yielding the expected number of differences <i>per site</i>, a metric that then does not depend upon the sequence length; passing <span class="s3">normalize=T</span> will return that normalized value, and that is probably what most users of this function will want.</p> |
| 208 | +<p class="p3">The implementation of <span class="s3">calcDxy()</span>, viewable with <span class="s3">functionSource()</span>, treats every mutation in <span class="s3">muts</span> as independent in its calculations (similar to <span class="s3">calcPi()</span>); in other words, if mutations are stacked, the <i>D</i><span class="s4"><sub>xy</sub></span> value calculated is <i>by mutation</i>, not <i>by site</i>.<span class="Apple-converted-space"> </span>Similarly, if multiple <span class="s3">Mutation</span> objects exist in different haplosomes at the same site (whether representing different genetic states, or multiple mutational lineages for the same genetic state), each <span class="s3">Mutation</span> object is treated separately for purposes of the calculation, just as if they were at different sites.<span class="Apple-converted-space"> </span>One could regard these choices as embodying an infinite-sites interpretation of the segregating mutations.<span class="Apple-converted-space"> </span>In most biologically realistic models, such genetic states will be quite rare, and so the impact of these choices will be negligible; however, in some models these distinctions may be important.<span class="Apple-converted-space"> </span>See <span class="s3">calcPairHeterozygosity()</span> for further discussion.</p> |
| 209 | +<p class="p3">All haplosomes and mutations must be associated with the same chromosome.<span class="Apple-converted-space"> </span>If <span class="s3">muts</span> is <span class="s3">NULL</span> (the default), all mutations in the population associated with the same chromosome as the given haplosomes will be used.</p> |
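<p class="p3">As a rough illustration of the per-mutation calculation described above, the following Python sketch computes the expected number of differences between one sequence drawn from each group, treating every mutation as an independent biallelic site. The function name <span class="s3">dxy_from_frequencies</span> and the <span class="s3">seq_length</span> parameter are assumptions made for this example; the actual Eidos implementation (viewable with <span class="s3">functionSource()</span>) may differ in its details, including how the normalizing length is determined.</p>

```python
def dxy_from_frequencies(freqs1, freqs2, seq_length=None):
    """Expected differences between one sequence from each of two groups.

    freqs1[i] and freqs2[i] are the frequencies of mutation i in group 1
    and group 2. Each mutation is treated independently (by mutation, not
    by site). If seq_length is given, the result is divided by it to give
    a per-site value (an assumed analogue of normalize=T).
    """
    if len(freqs1) != len(freqs2):
        raise ValueError("frequency vectors must have the same length")
    # Two sequences differ at a mutation when exactly one of them carries it.
    total = sum(p * (1.0 - q) + q * (1.0 - p) for p, q in zip(freqs1, freqs2))
    return total if seq_length is None else total / seq_length
```

<p class="p3">For example, a mutation fixed in one group and absent from the other contributes 1.0 to the unnormalized value, while a mutation at frequency 0.5 in both groups contributes 0.5.</p>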
| 210 | +<p class="p3">This function was written by Vitor Sudbrack (currently affiliated with University of Lausanne).</p> |
204 | 211 | <p class="p4">(float$)calcFST(object<Haplosome> haplosomes1, object<Haplosome> haplosomes2, [No<Mutation> muts = NULL], [Ni$ start = NULL], [Ni$ end = NULL])</p> |
205 | 212 | <p class="p3">Calculates the <i>F</i><span class="s4"><sub>ST</sub></span> between two <span class="s3">Haplosome</span> vectors – typically, but not necessarily, the haplosomes that constitute two different subpopulations (which we will assume for the purposes of this discussion).<span class="Apple-converted-space"> </span>In general, higher <i>F</i><span class="s4"><sub>ST</sub></span> indicates greater genetic divergence between subpopulations.</p> |
206 | 213 | <p class="p3">The calculation is done using only the mutations in <span class="s3">muts</span>; if <span class="s3">muts</span> is <span class="s3">NULL</span>, all mutations are used.<span class="Apple-converted-space"> </span>The <span class="s3">muts</span> parameter can therefore be used to calculate the <i>F</i><span class="s4"><sub>ST</sub></span> only for a particular mutation type (by passing only mutations of that type).</p> |
|
221 | 228 | <p class="p11"><i>B</i> = sum(<i>qs</i>) − sum(<i>q</i><span class="s12"><sup>2</sup></span><i>s</i>) − 2sum(<i>q</i>(1−<i>q</i>)<i>sh</i>)</p> |
222 | 229 | <p class="p3">where <i>q</i> is the frequency of a given deleterious allele, <i>s</i> is the absolute value of the selection coefficient, and <i>h</i> is its dominance coefficient.<span class="Apple-converted-space"> </span>Note that the implementation, viewable with <span class="s3">functionSource()</span>, sets a maximum |<i>s</i>| of <span class="s3">1.0</span> (i.e., a lethal allele); |<i>s</i>| can sometimes be greater than <span class="s3">1.0</span> when <i>s</i> is drawn from a distribution, but in practice an allele with <i>s</i> < <span class="s3">-1.0</span> has the same lethal effect as when <i>s</i> = <span class="s3">-1.0</span>.<span class="Apple-converted-space"> </span>Also note that this implementation will not work when the model changes the dominance coefficients of mutations using <span class="s3">mutationEffect()</span> callbacks, since it relies on the <span class="s3">dominanceCoeff</span> property of <span class="s3">MutationType</span>. Finally, note that, to estimate the diploid number of lethal equivalents (2<i>B</i>), the result from this function can simply be multiplied by two.</p> |
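<p class="p3">The formula above can be evaluated directly, as in this minimal Python sketch. The function name <span class="s3">lethal_equivalents</span> is hypothetical, and the inputs are plain lists rather than SLiM objects; the sketch only applies the stated equation, including the cap of |<i>s</i>| at <span class="s3">1.0</span>, and is not the Eidos implementation.</p>

```python
def lethal_equivalents(q, s, h):
    """Haploid number of lethal equivalents B for deleterious alleles.

    q: allele frequencies; s: selection coefficients (sign is ignored);
    h: dominance coefficients. Implements
    B = sum(q*s) - sum(q^2*s) - 2*sum(q*(1-q)*s*h), with |s| capped at 1.0.
    """
    if not (len(q) == len(s) == len(h)):
        raise ValueError("q, s, and h must have the same length")
    s_abs = [min(abs(x), 1.0) for x in s]  # |s| > 1 is treated as lethal (1.0)
    term_qs = sum(qi * si for qi, si in zip(q, s_abs))
    term_q2s = sum(qi * qi * si for qi, si in zip(q, s_abs))
    term_het = 2.0 * sum(qi * (1.0 - qi) * si * hi for qi, si, hi in zip(q, s_abs, h))
    return term_qs - term_q2s - term_het
```

<p class="p3">Multiplying the returned value by two gives the diploid number of lethal equivalents (2<i>B</i>), as noted above.</p>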
223 | 230 | <p class="p3">This function was contributed by Chris Kyriazis; thanks, Chris!</p> |
224 | | -<p class="p4">(float)calcLD_D(object<Mutation>$ mut1, object<Mutation> mut2, [No<Haplosome> haplosomes = NULL])</p> |
| 231 | +<p class="p4">(float)calcLD_D(object<Mutation>$ mut1, [No<Mutation> mut2 = NULL], [No<Haplosome> haplosomes = NULL])</p> |
225 | 232 | <p class="p3">Calculates the linkage disequilibrium (LD) coefficient <i>D</i> between a focal mutation <span class="s3">mut1</span> and one or more mutations in <span class="s3">mut2</span>, evaluated across a set of haplosomes given by <span class="s3">haplosomes</span>.<span class="Apple-converted-space"> </span>The result is a <span class="s3">float</span> vector that matches the size and order of <span class="s3">mut2</span>.<span class="Apple-converted-space"> </span>The implementation of this function, viewable with <span class="s3">functionSource()</span>, calculates <i>D</i> as defined by Hill and Robertson (1968, p. 226).<span class="Apple-converted-space"> </span>The coefficient <i>D</i> is within [−<i>p</i>(1−<i>p</i>), <i>p</i>(1−<i>p</i>)], where <i>p</i> is the frequency of the more common mutation (that is, <i>p</i> = max(<i>f</i><span class="s4"><sub>1</sub></span>, <i>f</i><span class="s4"><sub>2</sub></span>) where <i>f</i><span class="s4"><sub>1</sub></span> and <i>f</i><span class="s4"><sub>2</sub></span> are the frequencies of the two mutations for which <i>D</i> is being calculated); for the normalized LD metric <i>r</i><span class="s4"><sup>2</sup></span>, which is within [0, 1], see <span class="s3">calcLD_Rsquared()</span>.<span class="Apple-converted-space"> </span>Departures of <i>D</i> from zero indicate LD; more specifically, <i>D</i> > 0 indicates that the mutations occur together more often than expected by chance (positive linkage), whereas <i>D</i> < 0 indicates they occur together less often than expected by chance (negative linkage).</p> |
226 | 233 | <p class="p3">All mutations in <span class="s3">mut2</span> must be associated with the same chromosome as <span class="s3">mut1</span>; this function does not currently calculate LD between mutations associated with different chromosomes.<span class="Apple-converted-space"> </span>If <span class="s3">mut2</span> is <span class="s3">NULL</span> (the default), all such mutations in the population (including <span class="s3">mut1</span> itself) will be used.<span class="Apple-converted-space"> </span>Similarly, all haplosomes must be associated with the same chromosome as <span class="s3">mut1</span>.<span class="Apple-converted-space"> </span>If the <span class="s3">haplosomes</span> parameter is <span class="s3">NULL</span> (the default), all such haplosomes in the population will be used.</p> |
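<p class="p3">To make the definition of <i>D</i> concrete, here is a small Python sketch that computes it for a single pair of mutations from per-haplosome presence/absence vectors. The function name <span class="s3">ld_d</span> and the list-based inputs are assumptions for illustration; the sketch uses the standard form <i>D</i> = <i>f</i><span class="s4"><sub>12</sub></span> − <i>f</i><span class="s4"><sub>1</sub></span><i>f</i><span class="s4"><sub>2</sub></span>, and the Eidos implementation may be organized differently.</p>

```python
def ld_d(has_mut1, has_mut2):
    """LD coefficient D for two mutations across a set of haplosomes.

    has_mut1[k] and has_mut2[k] are truthy if haplosome k carries the
    first and second mutation, respectively.
    """
    n = len(has_mut1)
    if n == 0 or n != len(has_mut2):
        raise ValueError("need two non-empty vectors of equal length")
    f1 = sum(1 for a in has_mut1 if a) / n    # frequency of mutation 1
    f2 = sum(1 for b in has_mut2 if b) / n    # frequency of mutation 2
    f12 = sum(1 for a, b in zip(has_mut1, has_mut2) if a and b) / n  # both present
    # Positive when the mutations co-occur more often than expected by chance.
    return f12 - f1 * f2
```

<p class="p3">With four haplosomes, two mutations that always occur together at frequency 0.5 give <i>D</i> = 0.25, and two that never occur together give <i>D</i> = −0.25.</p>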
227 | 234 | <p class="p3">This function was written by Vitor Sudbrack (currently affiliated with University of Lausanne).</p> |
228 | | -<p class="p4">(float)calcLD_Rsquared(object<Mutation>$ mut1, object<Mutation> mut2, [No<Haplosome> haplosomes = NULL], [logical$ squared = T])</p> |
| 235 | +<p class="p4">(float)calcLD_Rsquared(object<Mutation>$ mut1, [No<Mutation> mut2 = NULL], [No<Haplosome> haplosomes = NULL], [logical$ squared = T])</p> |
229 | 236 | <p class="p3">Calculates the linkage disequilibrium (LD) squared correlation coefficient <i>r</i><span class="s4"><sup>2</sup></span> between a focal mutation <span class="s3">mut1</span> and one or more mutations in <span class="s3">mut2</span>, evaluated across a set of haplosomes given by <span class="s3">haplosomes</span>.<span class="Apple-converted-space"> </span>The result is a <span class="s3">float</span> vector that matches the size and order of <span class="s3">mut2</span>.<span class="Apple-converted-space"> </span>The implementation of this function, viewable with <span class="s3">functionSource()</span>, calculates <i>r</i><span class="s4"><sup>2</sup></span> as defined by Hill and Robertson (1968, p. 227).<span class="Apple-converted-space"> </span>The squared correlation coefficient <i>r</i><span class="s4"><sup>2</sup></span> is a normalized measure of LD within [0, 1] (for the unnormalized LD coefficient <i>D</i>, see <span class="s3">calcLD_D()</span>).<span class="Apple-converted-space"> </span>When <i>r</i><span class="s4"><sup>2</sup></span> = 0, there is no statistical association between the mutations; they co-occur as expected by chance.<span class="Apple-converted-space"> </span>A value of <i>r</i><span class="s4"><sup>2</sup></span> = 1 indicates complete correlation: the mutations either always appear together or never appear together, depending on the sign of the underlying correlation coefficient <i>r</i>.<span class="Apple-converted-space"> </span>To obtain the raw (signed) <i>r</i> value instead of <i>r</i><span class="s4"><sup>2</sup></span>, you can pass <span class="s3">squared=F</span> instead of the default of <span class="s3">T</span>.</p> |
230 | 237 | <p class="p3">All mutations in <span class="s3">mut2</span> must be associated with the same chromosome as <span class="s3">mut1</span>; this function does not currently calculate LD between mutations associated with different chromosomes.<span class="Apple-converted-space"> </span>If <span class="s3">mut2</span> is <span class="s3">NULL</span> (the default), all such mutations in the population (including <span class="s3">mut1</span> itself) will be used.<span class="Apple-converted-space"> </span>Similarly, all haplosomes must be associated with the same chromosome as <span class="s3">mut1</span>.<span class="Apple-converted-space"> </span>If the <span class="s3">haplosomes</span> parameter is <span class="s3">NULL</span> (the default), all such haplosomes in the population will be used.</p> |
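<p class="p3">The relationship between <i>r</i>, <i>r</i><span class="s4"><sup>2</sup></span>, and the mutation frequencies can be sketched in Python as follows. The function name <span class="s3">ld_r</span> is hypothetical, and returning NaN when either mutation is absent or fixed in the sample is an assumption of this example, not documented behavior of <span class="s3">calcLD_Rsquared()</span>.</p>

```python
from math import sqrt, isnan


def ld_r(has_mut1, has_mut2, squared=True):
    """Correlation-based LD (r or r^2) for two mutations across haplosomes.

    has_mut1[k] and has_mut2[k] are truthy if haplosome k carries the
    first and second mutation, respectively.
    """
    n = len(has_mut1)
    if n == 0 or n != len(has_mut2):
        raise ValueError("need two non-empty vectors of equal length")
    f1 = sum(1 for a in has_mut1 if a) / n
    f2 = sum(1 for b in has_mut2 if b) / n
    f12 = sum(1 for a, b in zip(has_mut1, has_mut2) if a and b) / n
    d = f12 - f1 * f2
    denom = f1 * (1.0 - f1) * f2 * (1.0 - f2)
    if denom <= 0.0:
        return float("nan")  # assumption: undefined when a mutation is absent or fixed
    r = d / sqrt(denom)
    return r * r if squared else r
```

<p class="p3">Passing <span class="s3">squared=False</span> in this sketch plays the same role as <span class="s3">squared=F</span> above: it keeps the sign of <i>r</i>, which distinguishes mutations that always appear together from those that never do.</p>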
231 | 238 | <p class="p3">This function was written by Vitor Sudbrack (currently affiliated with University of Lausanne).</p> |
|