Commit b118e2c
Two-way two-locus statistics (python proto)
This PR implements two-way LD statistics, specified between sample sets.
During the development of this functionality, a number of issues with
the designation of state_dims/result_dims were discovered. These have
been resolved, testing clean for existing code and providing the proper
behavior for this new code.
The mechanism by which users will specify a multi-population (or
two-way) statistic is by providing the `indexes` argument. This helps us
avoid creating another `ld_matrix` method for the TreeSequence object.
In other words, for a one-way statistic, a user would specify:
```python
ts.ld_matrix(stat="D2", sample_sets=[ss1, ss2])
```
This outputs a 3D array containing one LD matrix per sample set. For a
two-way statistic, the user also provides `indexes`:
```python
ts.ld_matrix(stat="D2", sample_sets=[ss1, ss2], indexes=(0, 1))
```
This outputs a 2D array containing a single LD matrix for the index
pair. It uses our `D2_ij_summary_func` instead of `D2_summary_func`.
Finally,
```python
ts.ld_matrix(stat="D2", sample_sets=[ss1, ss2], indexes=[(0, 1), (1, 1)])
```
will output a 3D array containing one LD matrix _per_ index pair
provided.
Tests have been added to validate the result dimension for all of the
possible input combinations.
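The dimension rules above can be summarised in a small sketch. This is not the code in this commit: `expected_ld_shape` is a hypothetical helper, and the range check on the index values is an assumption about how invalid input would be rejected.

```python
def expected_ld_shape(num_sample_sets, n_rows, n_cols, indexes=None):
    """Hypothetical helper: predicted shape of the LD result array."""
    if indexes is None:
        # One-way statistic: one LD matrix per sample set.
        return (num_sample_sets, n_rows, n_cols)
    # A flat pair such as (0, 1) is a single two-way comparison.
    single = all(isinstance(i, int) for i in indexes)
    pairs = [tuple(indexes)] if single else [tuple(p) for p in indexes]
    for pair in pairs:
        if len(pair) != 2:
            raise ValueError("two-way statistics need index tuples of length 2")
        # Assumption: each index must refer to an existing sample set.
        if any(not 0 <= i < num_sample_sets for i in pair):
            raise ValueError("index out of range for the given sample sets")
    # A single pair drops the leading dimension; a list of pairs keeps it.
    return (n_rows, n_cols) if single else (len(pairs), n_rows, n_cols)


print(expected_ld_shape(2, 5, 5))                            # (2, 5, 5)
print(expected_ld_shape(2, 5, 5, indexes=(0, 1)))            # (5, 5)
print(expected_ld_shape(2, 5, 5, indexes=[(0, 1), (1, 1)]))  # (2, 5, 5)
```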
Since these are two-way statistics, each index tuple must have length 2.
We plan on enabling users to implement k-way statistics via a
"general_stat" API. We did not implement anything beyond two-way
statistics here because of the combinatorial explosion of logic required
for index tuples longer than 2.
I added some basic tests to demonstrate that things were working
properly. If we compute two-way statistics on identical sample sets,
they should be equal to the one-way statistics. This equivalence does not
hold for unbiased statistics unless both indexes in the pair refer to the
same sample set. I've added another test for two-way unbiased statistics.
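The identity behind the first test can be illustrated on haplotype counts for a single pair of sites. This is a minimal sketch, not the summary functions from this commit: `d_stat`, `d2_one_way` and `d2_two_way` are hypothetical names, and it assumes the biased forms D = p_AB - p_A * p_B, D2 = D * D (one-way) and D2 = D_i * D_j (two-way). It does not cover the unbiased statistics.

```python
import math


def d_stat(n_AB, n_Ab, n_aB, n):
    """Biased D = p_AB - p_A * p_B from haplotype counts in n samples."""
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    return p_AB - p_A * p_B


def d2_one_way(counts):
    # One-way D2: square of D within a single sample set.
    return d_stat(*counts) ** 2


def d2_two_way(counts_i, counts_j):
    # Two-way D2: product of D computed in sample sets i and j.
    return d_stat(*counts_i) * d_stat(*counts_j)


# (n_AB, n_Ab, n_aB, n) for one sample set; the values are made up.
counts = (3, 1, 2, 10)
# With identical sample sets the two-way value matches the one-way value.
print(math.isclose(d2_two_way(counts, counts), d2_one_way(counts)))
```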
I've also cleaned up the docstrings a bit and fixed a bug with the
D_prime statistic, which should not be weighted by haplotype frequency.
1 parent c998bfb commit b118e2c
3 files changed
Lines changed: 456 additions & 82 deletions