Skip to content

Commit 043dd54

Browse files
committed
fix #527, add calcLD_D() and calcLD_Rsquared()
1 parent 1b7c724 commit 043dd54

4 files changed

Lines changed: 281 additions & 7 deletions

File tree

QtSLiM/help/SLiMHelpFunctions.html

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
<meta http-equiv="Content-Style-Type" content="text/css">
66
<title></title>
77
<meta name="Generator" content="Cocoa HTML Writer">
8-
<meta name="CocoaVersion" content="2299.77">
8+
<meta name="CocoaVersion" content="2487.7">
99
<style type="text/css">
1010
p.p1 {margin: 18.0px 0.0px 3.0px 0.0px; font: 11.0px Optima}
1111
p.p2 {margin: 9.0px 0.0px 3.0px 36.0px; text-indent: -22.3px; font: 9.0px Menlo}
@@ -221,6 +221,14 @@
221221
<p class="p11"><i>B</i> = sum(<i>qs</i>) − sum(<i>q</i><span class="s12"><sup>2</sup></span><i>s</i>) − 2sum(<i>q</i>(1−<i>q</i>)<i>sh</i>)</p>
222222
<p class="p3">where <i>q</i> is the frequency of a given deleterious allele, <i>s</i> is the absolute value of the selection coefficient, and <i>h</i> is its dominance coefficient.<span class="Apple-converted-space">  </span>Note that the implementation, viewable with <span class="s3">functionSource()</span>, sets a maximum |<i>s</i>| of <span class="s3">1.0</span> (i.e., a lethal allele); |<i>s</i>| can sometimes be greater than <span class="s3">1.0</span> when <i>s</i> is drawn from a distribution, but in practice an allele with <i>s</i> &lt; <span class="s3">-1.0</span> has the same lethal effect as when <i>s</i> = <span class="s3">-1.0</span>.<span class="Apple-converted-space">  </span>Also note that this implementation will not work when the model changes the dominance coefficients of mutations using <span class="s3">mutationEffect()</span> callbacks, since it relies on the <span class="s3">dominanceCoeff</span> property of <span class="s3">MutationType</span>. Finally, note that, to estimate the diploid number of lethal equivalents (2<i>B</i>), the result from this function can simply be multiplied by two.</p>
223223
<p class="p3">This function was contributed by Chris Kyriazis; thanks, Chris!</p>
224+
<p class="p4">(float)calcLD_D(object&lt;Mutation&gt;$ mut1, object&lt;Mutation&gt; mut2, [No&lt;Haplosome&gt; haplosomes = NULL])</p>
225+
<p class="p3">Calculates the linkage disequilibrium (LD) coefficient <i>D</i> between a focal mutation <span class="s3">mut1</span> and one or more mutations in <span class="s3">mut2</span>, evaluated across a set of haplosomes given by <span class="s3">haplosomes</span>.<span class="Apple-converted-space">  </span>The result is a <span class="s3">float</span> vector that matches the size and order of <span class="s3">mut2</span>.<span class="Apple-converted-space">  </span>The implementation of this function, viewable with <span class="s3">functionSource()</span>, calculates <i>D</i> as defined by Hill and Robertson (1968, p. 226).<span class="Apple-converted-space">  </span>The coefficient <i>D</i> is within [−<i>p</i>(1−<i>p</i>), <i>p</i>(1−<i>p</i>)], where <i>p</i> is the frequency of the more common mutation (that is, <i>p</i> = max(<i>f</i><span class="s4"><sub>1</sub></span>, <i>f</i><span class="s4"><sub>2</sub></span>) where <i>f</i><span class="s4"><sub>1</sub></span> and <i>f</i><span class="s4"><sub>2</sub></span> are the frequencies of the two mutations for which <i>D</i> is being calculated); for the normalized LD metric <i>r</i><span class="s4"><sup>2</sup></span>, which is within [0, 1], see <span class="s3">calcLD_Rsquared()</span>.<span class="Apple-converted-space">  </span>Departures of <i>D</i> from zero indicate LD; more specifically, <i>D</i> &gt; 0 indicates that the mutations occur together more often than expected by chance (positive linkage), whereas <i>D</i> &lt; 0 indicates they occur together less often than expected by chance (negative linkage).</p>
226+
<p class="p3">All mutations in <span class="s3">mut2</span> must be associated with the same chromosome as <span class="s3">mut1</span>; this function does not currently calculate LD between mutations associated with different chromosomes.<span class="Apple-converted-space">  </span>If <span class="s3">mut2</span> is <span class="s3">NULL</span> (the default), all such mutations in the population (including <span class="s3">mut1</span> itself) will be used.<span class="Apple-converted-space">  </span>Similarly, all haplosomes must be associated with the same chromosome as <span class="s3">mut1</span>.<span class="Apple-converted-space">  </span>If the <span class="s3">haplosomes</span> parameter is <span class="s3">NULL</span> (the default), all such haplosomes in the population will be used.</p>
227+
<p class="p3">This function was written by Vitor Sudbrack (currently affiliated with University of Lausanne).</p>
228+
<p class="p4">(float)calcLD_Rsquared(object&lt;Mutation&gt;$ mut1, object&lt;Mutation&gt; mut2, [No&lt;Haplosome&gt; haplosomes = NULL], [logical$ squared = T])</p>
229+
<p class="p3">Calculates the linkage disequilibrium (LD) squared correlation coefficient <i>r</i><span class="s4"><sup>2</sup></span> between a focal mutation <span class="s3">mut1</span> and one or more mutations in <span class="s3">mut2</span>, evaluated across a set of haplosomes given by <span class="s3">haplosomes</span>.<span class="Apple-converted-space">  </span>The result is a <span class="s3">float</span> vector that matches the size and order of <span class="s3">mut2</span>.<span class="Apple-converted-space">  </span>The implementation of this function, viewable with <span class="s3">functionSource()</span>, calculates <i>r</i><span class="s4"><sup>2</sup></span> as defined by Hill and Robertson (1968, p. 227).<span class="Apple-converted-space">  </span>The squared correlation coefficient <i>r</i><span class="s4"><sup>2</sup></span> is a normalized measure of LD within [0, 1] (for the unnormalized LD coefficient <i>D</i>, see <span class="s3">calcLD_D()</span>).<span class="Apple-converted-space">  </span>When <i>r</i><span class="s4"><sup>2</sup></span> = 0, there is no statistical association between the mutations; they co-occur as expected by chance.<span class="Apple-converted-space">  </span>A value of <i>r</i><span class="s4"><sup>2</sup></span> = 1 indicates complete correlation: the mutations either always appear together or never appear together, depending on the sign of the underlying correlation coefficient <i>r</i>.<span class="Apple-converted-space">  </span>To obtain the raw (signed) <i>r</i> value instead of <i>r</i><span class="s4"><sup>2</sup></span>, you can pass <span class="s3">squared=F</span> instead of the default of <span class="s3">T</span>.</p>
230+
<p class="p3">All mutations in <span class="s3">mut2</span> must be associated with the same chromosome as <span class="s3">mut1</span>; this function does not currently calculate LD between mutations associated with different chromosomes.<span class="Apple-converted-space">  </span>If <span class="s3">mut2</span> is <span class="s3">NULL</span> (the default), all such mutations in the population (including <span class="s3">mut1</span> itself) will be used.<span class="Apple-converted-space">  </span>Similarly, all haplosomes must be associated with the same chromosome as <span class="s3">mut1</span>.<span class="Apple-converted-space">  </span>If the <span class="s3">haplosomes</span> parameter is <span class="s3">NULL</span> (the default), all such haplosomes in the population will be used.</p>
231+
<p class="p3">This function was written by Vitor Sudbrack (currently affiliated with University of Lausanne).</p>
224232
<p class="p4">(float$)calcMeanFroh(object&lt;Individual&gt; individuals, [integer$ minimumLength = 1000000], [Niso&lt;Chromosome&gt;$ chromosome = NULL])</p>
225233
<p class="p3">Calculates the mean value of the <i>F</i><span class="s4"><sub>roh</sub></span> statistic across the individuals passed in <span class="s3">individuals</span>.<span class="Apple-converted-space">  </span>This statistic is a measure of individual autozygosity, likely resulting from inbreeding, and is calculated based upon “runs of homozygosity”, or ROH, in the genome of an individual.<span class="Apple-converted-space">  </span>Broadly speaking, <i>F</i><span class="s4"><sub>roh</sub></span> is the proportion of an individual’s genome that is spanned by ROH longer than a given threshold length.<span class="Apple-converted-space">  </span>However, it should be noted that there are many different ways of calculating <i>F</i><span class="s4"><sub>roh</sub></span>, producing different results.<span class="Apple-converted-space">  </span>For example, the threshold length might be a given constant, or might be determined statistically from the characteristics of the population.<span class="Apple-converted-space">  </span>Furthermore, some heterozygous sites might be discarded (to compensate for genotyping errors), a minimum SNP density might be required within a sliding window for an ROH to be diagnosed, and so forth – it can get quite complex, as seen in the software PLINK (Purcell et al., 2007) and GARLIC (Szpiech, Blant and Pemberton, 2017).<span class="Apple-converted-space">  </span>The method used by <span class="s3">calcMeanFroh()</span> is the simplest possible method, assessing ROH for each individual directly from the simulated mutations without filtering or modification, and applying a given constant threshold length.<span class="Apple-converted-space">  </span>If a more sophisticated <i>F</i><span class="s4"><sub>roh</sub></span> algorithm is desired, one could modify the implementation of <span class="s3">calcMeanFroh()</span>, which is viewable with <span class="s3">functionSource()</span>, or one could output VCF data from SLiM and analyze it with other tools, perhaps calling out from the running SLiM script with <span class="s3">system()</span>.</p>
226234
<p class="p3">The threshold ROH length used by <span class="s3">calcMeanFroh()</span> is supplied by the parameter <span class="s3">minimumLength</span>.<span class="Apple-converted-space">  </span>It defaults to <span class="s3">1e6</span>, or 1 Mbp, since that is a length commonly used in the literature, but can be adjusted as desired.</p>

SLiMgui/SLiMHelpFunctions.rtf

Lines changed: 155 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
{\rtf1\ansi\ansicpg1252\cocoartf2709
1+
{\rtf1\ansi\ansicpg1252\cocoartf2761
22
\cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fswiss\fcharset0 Optima-Bold;\f1\fnil\fcharset0 Menlo-Regular;\f2\fswiss\fcharset0 Optima-Regular;
33
\f3\fswiss\fcharset0 Optima-Italic;\f4\froman\fcharset0 TimesNewRomanPSMT;\f5\fnil\fcharset0 Menlo-Bold;
44
\f6\fnil\fcharset0 AppleColorEmoji;\f7\froman\fcharset0 TimesNewRomanPS-ItalicMT;}
@@ -931,7 +931,8 @@ This function is written in Eidos, and its source code can be viewed with
931931
\f2\fs20 , in which case the ellipsis should supply a
932932
\f1\fs18 string$
933933
\f2\fs20 Eidos script parameter. The global symbol for the new mutation type is immediately available; the return value also provides the new object.\
934-
\expnd0\expndtw0\kerning0
934+
\pard\pardeftab543\li547\ri720\sb60\sa60\partightenfactor0
935+
\cf2 \expnd0\expndtw0\kerning0
935936
Note that by default in WF models, all mutations of a given mutation type will be converted into
936937
\f1\fs18 Substitution
937938
\f2\fs20 objects when they reach fixation, for efficiency reasons. If you need to disable this conversion, to keep mutations of a given type active in the simulation even after they have fixed, you can do so by setting the
@@ -1155,9 +1156,8 @@ There is no way to disable sex once it has been enabled; if you don\'92t want to
11551156
\f2\fs20 property of
11561157
\f1\fs18 Individual
11571158
\f2\fs20 , for example), and manage the consequences of that in your script yourself, in terms of which individuals can mate with which, and exactly how the offspring is produced.\
1158-
\pard\pardeftab397\li547\ri720\sb60\sa60\partightenfactor0
11591159

1160-
\f0\b \cf2 The
1160+
\f0\b The
11611161
\f5\fs18 xDominanceCoeff
11621162
\f0\fs20 parameter has been deprecated and removed.
11631163
\f2\b0 In SLiM 5 and later, use the
@@ -1327,6 +1327,7 @@ If
13271327
\f1\fs18 initializeChromosome()
13281328
\f2\fs20 , allowing a different mutation run count to be specified for each chromosome in multi-chromosome models.\expnd0\expndtw0\kerning0
13291329
\
1330+
\pard\pardeftab720\li547\ri720\sb60\sa60\partightenfactor0
13301331
\cf0 \kerning1\expnd0\expndtw0 If
13311332
\f1\fs18 preventIncidentalSelfing
13321333
\f2\fs20 is
@@ -1406,6 +1407,7 @@ If
14061407
\f2\fs20 for
14071408
\f1\fs18 checkInfiniteLoops
14081409
\f2\fs20 to disable these checks. There is no way to turn these checks on or off for individual loops; it is a global setting.\
1410+
\pard\pardeftab720\li547\ri720\sb60\sa60\partightenfactor0
14091411
\cf0 This function will likely be extended with further options in the future, added on to the end of the argument list. Using named arguments with this call is recommended for readability. Note that turning on optional features may increase the runtime and memory footprint of SLiM.\
14101412
\pard\pardeftab720\li720\fi-446\ri720\sb180\sa60\partightenfactor0
14111413

@@ -2147,8 +2149,9 @@ The code for
21472149
\f3\i F
21482150
\f2\i0\fs13\fsmilli6667 \sub ST
21492151
\fs20 \nosupersub (but see below for further discussion and clarification):\
2152+
\pard\pardeftab397\li547\ri720\sb60\sa60\partightenfactor0
21502153

2151-
\f3\i F
2154+
\f3\i \cf2 F
21522155
\f2\i0\fs13\fsmilli6667 \sub ST
21532156
\fs20 \expnd0\expndtw0\kerning0
21542157
\nosupersub = 1 -
@@ -2160,7 +2163,8 @@ The code for
21602163
\f2\i0\fs13\fsmilli6667 \kerning1\expnd0\expndtw0 \sub T
21612164
\fs20 \expnd0\expndtw0\kerning0
21622165
\nosupersub \
2163-
\kerning1\expnd0\expndtw0 where
2166+
\pard\pardeftab397\li547\ri720\sb60\sa60\partightenfactor0
2167+
\cf2 \kerning1\expnd0\expndtw0 where
21642168
\f3\i H
21652169
\fs13\fsmilli6667 \sub S
21662170
\f2\i0\fs20 \nosupersub is the average heterozygosity in the two subpopulations, and
@@ -2334,6 +2338,151 @@ The inbreeding load is a measure of the quantity of recessive deleterious variat
23342338
This function was contributed by Chris Kyriazis; thanks, Chris!\
23352339
\pard\pardeftab720\li720\fi-446\ri720\sb180\sa60\partightenfactor0
23362340

2341+
\f1\fs18 \cf2 (float)calcLD_D(object<Mutation>$\'a0mut1, [No<Mutation>\'a0mut2\'a0=\'a0NULL], [No<Haplosome>\'a0haplosomes\'a0=\'a0NULL])\
2342+
\pard\pardeftab397\li547\ri720\sb60\sa60\partightenfactor0
2343+
2344+
\f2\fs20 \cf2 Calculates the linkage disequilibrium (LD) coefficient
2345+
\f3\i D
2346+
\f2\i0 between a focal mutation
2347+
\f1\fs18 mut1
2348+
\f2\fs20 and one or more mutations in
2349+
\f1\fs18 mut2
2350+
\f2\fs20 , evaluated across a set of haplosomes given by
2351+
\f1\fs18 haplosomes
2352+
\f2\fs20 . The result is a
2353+
\f1\fs18 float
2354+
\f2\fs20 vector that matches the size and order of
2355+
\f1\fs18 mut2
2356+
\f2\fs20 . The implementation of this function, viewable with
2357+
\f1\fs18 functionSource()
2358+
\f2\fs20 , calculates
2359+
\f3\i D
2360+
\f2\i0 as defined by Hill and Robertson (1968, p. 226). The coefficient
2361+
\f3\i D
2362+
\f2\i0 is within [\uc0\u8722
2363+
\f3\i p
2364+
\f2\i0 (1\uc0\u8722
2365+
\f3\i p
2366+
\f2\i0 ),
2367+
\f3\i p
2368+
\f2\i0 (1\uc0\u8722
2369+
\f3\i p
2370+
\f2\i0 )], where
2371+
\f3\i p
2372+
\f2\i0 is the frequency of the more common mutation (that is,
2373+
\f3\i p
2374+
\f2\i0 \'a0=\'a0max(
2375+
\f3\i f
2376+
\f2\i0\fs13\fsmilli6667 \sub 1
2377+
\fs20 \nosupersub ,
2378+
\f3\i f
2379+
\f2\i0\fs13\fsmilli6667 \sub 2
2380+
\fs20 \nosupersub ) where
2381+
\f3\i f
2382+
\f2\i0\fs13\fsmilli6667 \sub 1
2383+
\fs20 \nosupersub and
2384+
\f3\i f
2385+
\f2\i0\fs13\fsmilli6667 \sub 2
2386+
\fs20 \nosupersub are the frequencies of the two mutations for which
2387+
\f3\i D
2388+
\f2\i0 is being calculated); for the normalized LD metric
2389+
\f3\i r
2390+
\f2\i0\fs13\fsmilli6667 \super 2
2391+
\fs20 \nosupersub , which is within [0, 1], see
2392+
\f1\fs18 calcLD_Rsquared()
2393+
\f2\fs20 . Departures of
2394+
\f3\i D
2395+
\f2\i0 from zero indicate LD; more specifically,
2396+
\f3\i D
2397+
\f2\i0 \'a0>\'a00 indicates that the mutations occur together more often than expected by chance (positive linkage), whereas
2398+
\f3\i D
2399+
\f2\i0 \'a0<\'a00 indicates they occur together less often than expected by chance (negative linkage).\
2400+
All mutations in
2401+
\f1\fs18 mut2
2402+
\f2\fs20 must be associated with the same chromosome as
2403+
\f1\fs18 mut1
2404+
\f2\fs20 ; this function does not currently calculate LD between mutations associated with different chromosomes. If
2405+
\f1\fs18 mut2
2406+
\f2\fs20 is
2407+
\f1\fs18 NULL
2408+
\f2\fs20 (the default), all such mutations in the population (including
2409+
\f1\fs18 mut1
2410+
\f2\fs20 itself) will be used. Similarly, all haplosomes must be associated with the same chromosome as
2411+
\f1\fs18 mut1
2412+
\f2\fs20 . If the
2413+
\f1\fs18 haplosomes
2414+
\f2\fs20 parameter is
2415+
\f1\fs18 NULL
2416+
\f2\fs20 (the default), all such haplosomes in the population will be used.\
2417+
This function was written by Vitor Sudbrack (currently affiliated with University of Lausanne).\
2418+
\pard\pardeftab720\li720\fi-446\ri720\sb180\sa60\partightenfactor0
2419+
2420+
\f1\fs18 \cf2 (float)calcLD_Rsquared(object<Mutation>$\'a0mut1, [No<Mutation>\'a0mut2\'a0=\'a0NULL], [No<Haplosome>\'a0haplosomes\'a0=\'a0NULL], [logical$\'a0squared\'a0=\'a0T])\
2421+
\pard\pardeftab397\li547\ri720\sb60\sa60\partightenfactor0
2422+
2423+
\f2\fs20 \cf2 Calculates the linkage disequilibrium (LD) squared correlation coefficient
2424+
\f3\i r
2425+
\f2\i0\fs13\fsmilli6667 \super 2
2426+
\fs20 \nosupersub between a focal mutation
2427+
\f1\fs18 mut1
2428+
\f2\fs20 and one or more mutations in
2429+
\f1\fs18 mut2
2430+
\f2\fs20 , evaluated across a set of haplosomes given by
2431+
\f1\fs18 haplosomes
2432+
\f2\fs20 . The result is a
2433+
\f1\fs18 float
2434+
\f2\fs20 vector that matches the size and order of
2435+
\f1\fs18 mut2
2436+
\f2\fs20 . The implementation of this function, viewable with
2437+
\f1\fs18 functionSource()
2438+
\f2\fs20 , calculates
2439+
\f3\i r
2440+
\f2\i0\fs13\fsmilli6667 \super 2
2441+
\fs20 \nosupersub as defined by Hill and Robertson (1968, p. 227). The squared correlation coefficient
2442+
\f3\i r
2443+
\f2\i0\fs13\fsmilli6667 \super 2
2444+
\fs20 \nosupersub is a normalized measure of LD within [0, 1] (for the unnormalized LD coefficient
2445+
\f3\i D
2446+
\f2\i0 , see
2447+
\f1\fs18 calcLD_D()
2448+
\f2\fs20 ). When
2449+
\f3\i r
2450+
\f2\i0\fs13\fsmilli6667 \super 2
2451+
\fs20 \nosupersub \'a0=\'a00, there is no statistical association between the mutations; they co-occur as expected by chance. A value of
2452+
\f3\i r
2453+
\f2\i0\fs13\fsmilli6667 \super 2
2454+
\fs20 \nosupersub \'a0=\'a01 indicates complete correlation: the mutations either always appear together or never appear together, depending on the sign of the underlying correlation coefficient
2455+
\f3\i r
2456+
\f2\i0 . To obtain the raw (signed)
2457+
\f3\i r
2458+
\f2\i0 value instead of
2459+
\f3\i r
2460+
\f2\i0\fs13\fsmilli6667 \super 2
2461+
\fs20 \nosupersub , you can pass
2462+
\f1\fs18 squared=F
2463+
\f2\fs20 instead of the default of
2464+
\f1\fs18 T
2465+
\f2\fs20 .\
2466+
All mutations in
2467+
\f1\fs18 mut2
2468+
\f2\fs20 must be associated with the same chromosome as
2469+
\f1\fs18 mut1
2470+
\f2\fs20 ; this function does not currently calculate LD between mutations associated with different chromosomes. If
2471+
\f1\fs18 mut2
2472+
\f2\fs20 is
2473+
\f1\fs18 NULL
2474+
\f2\fs20 (the default), all such mutations in the population (including
2475+
\f1\fs18 mut1
2476+
\f2\fs20 itself) will be used. Similarly, all haplosomes must be associated with the same chromosome as
2477+
\f1\fs18 mut1
2478+
\f2\fs20 . If the
2479+
\f1\fs18 haplosomes
2480+
\f2\fs20 parameter is
2481+
\f1\fs18 NULL
2482+
\f2\fs20 (the default), all such haplosomes in the population will be used.\
2483+
This function was written by Vitor Sudbrack (currently affiliated with University of Lausanne).\
2484+
\pard\pardeftab720\li720\fi-446\ri720\sb180\sa60\partightenfactor0
2485+
23372486
\f1\fs18 \cf2 (float$)calcMeanFroh(object<Individual>\'a0individuals, [integer$\'a0minimumLength\'a0=\'a01000000], [Niso<Chromosome>$\'a0chromosome\'a0=\'a0NULL])\
23382487
\pard\pardeftab397\li547\ri720\sb60\sa60\partightenfactor0
23392488

VERSIONS

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,7 @@ development head (in the master branch):
4242
add matrix() method to Plot to add a matrix-based image to a plot: (void)matrix(numeric matrix, numeric$ x1, numeric$ x2, numeric$ y1, numeric$ y2, [logical$ flipped = F], [Nif valueRange = NULL], [Ns$ colors = NULL], [float$ alpha = 1.0])
4343
fix #538, check for unique SLiM ids only for alive individuals when loading a tree sequence
4444
fix #533, memory smasher in Haplosome method nucleotides(format="char"); besides causing random crashes, this also caused this method to return incorrect results
45+
fix #527, add calcLD_D() and calcLD_Rsquared() popgen functions, contributed by Vitor Sudbrack
4546

4647

4748
version 5.0 (Eidos version 4.0):

0 commit comments

Comments
 (0)