Skip to content

Commit fa7b3ab

Browse files
committed
example of subsetting causing nan values
1 parent b5efa32 commit fa7b3ab

1 file changed

Lines changed: 39 additions & 0 deletions

File tree

docs/stats.md

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -849,6 +849,45 @@ ts.ld_matrix(sample_sets=[[0, 1, 2, 3], [4, 5, 6, 7]], indexes=(0, 1)) -> 2 dime
849849
ts.ld_matrix(sample_sets=[[0, 1, 2, 3], [4, 5, 6, 7]], indexes=[(0, 1)]) -> 3 dimensions
850850
```
851851

852+
By computing the LD statistic for a subset of samples in the tree sequence, it
853+
is possible that statistics are computed over pairs of sites that are not
854+
variable within the sample set. This can result in zero- or nan-valued statistics
855+
as outputs. To avoid this, the tree sequence may first be simplified to the
856+
sample subset, which will remove sites that are invariant within that subset:
857+
858+
```{code-cell} ipython3
859+
ts = msprime.sim_ancestry(
860+
20,
861+
population_size=10000,
862+
sequence_length=1000,
863+
recombination_rate=2e-8,
864+
random_seed=12)
865+
ts = msprime.sim_mutations(ts, rate=2e-8, random_seed=12)
866+
867+
ld_all = ts.ld_matrix(mode="site")
868+
print(ld_all)
869+
```
870+
871+
Considering a subset of samples:
872+
873+
```{code-cell} ipython3
874+
ld_sub = ts.ld_matrix(mode="site", sample_sets=range(8))
875+
print(ld_sub)
876+
```
877+
878+
Simplifying and computing the LD matrix avoids computing {math}`r^2` for
879+
invariant sites in the subsample:
880+
881+
```{code-cell} ipython3
882+
ts_simp = ts.simplify(samples=range(8), filter_sites=True)
883+
ld_sub = ts_simp.ld_matrix(mode="site", sample_sets=range(8))
884+
print(ld_sub)
885+
```
886+
887+
Any remaining sites that produce {math}`r^2` with a nan values are those with
888+
a mutation that occurs above the root of the tree, so that all samples in the
889+
subset carry the derived allele (so that its allele frequency is 1).
890+
852891
(sec_stats_two_locus_sample_one_way_stats)=
853892

854893
##### One-way Statistics

0 commit comments

Comments
 (0)