Commit b77bc1b
Two-way two-locus statistics (python proto)
This PR implements two-way LD statistics, specified between sample sets.
During the development of this functionality, a number of issues with
the designation of state_dims/result_dims were discovered. These have
been resolved, testing clean for existing code and providing the proper
behavior for this new code.
Users specify a multi-population (or two-way) statistic by providing
the `indexes` argument. This avoids adding another `ld_matrix` method
to the TreeSequence object. For example, for a one-way statistic, a
user would specify:
```python
ts.ld_matrix(stat="D2", sample_sets=[ss1, ss2])
```
This outputs a 3D ndarray containing one LD matrix per sample set. For a
two-way statistic, a user would specify:
```python
ts.ld_matrix(stat="D2", sample_sets=[ss1, ss2], indexes=[(0, 1)])
```
This outputs a 2D ndarray containing one LD matrix for the index pair,
computed with our `D2_ij_summary_func` instead of `D2_summary_func`.
Finally, if a user provided
```python
ts.ld_matrix(stat="D2", sample_sets=[ss1, ss2], indexes=[(0, 1), (1, 1)])
```
we would output a 3D ndarray containing one LD matrix _per_ index pair
provided.
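To summarize the three cases above, here is a minimal sketch of the
output shapes. `ld_output_shape` is a hypothetical helper written for
this description, not part of the API, and it assumes the per-set or
per-pair dimension comes first in the 3D result.
```python
def ld_output_shape(num_sites, num_sample_sets, indexes=None):
    """Hypothetical helper: predicted shape of the ld_matrix result."""
    if indexes is None:
        # One-way: one (sites x sites) matrix per sample set.
        return (num_sample_sets, num_sites, num_sites)
    if len(indexes) == 1:
        # A single index pair: one 2D matrix, no leading dimension.
        return (num_sites, num_sites)
    # Several index pairs: one matrix per pair (assumed leading axis).
    return (len(indexes), num_sites, num_sites)


print(ld_output_shape(5, 2))
print(ld_output_shape(5, 2, indexes=[(0, 1)]))
print(ld_output_shape(5, 2, indexes=[(0, 1), (1, 1)]))
```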
Since these are two-way statistics, each index tuple must have length 2.
We plan on enabling users to implement k-way statistics via a
"general_stat" api. We did not implement anything more than two-way
statistics here because of the combinatoric explosion of logic required
for index tuples longer than 2.
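As an illustration of the length-2 constraint, a check along these lines
could reject malformed input early. This is only a sketch:
`check_two_way_indexes` and its error messages are hypothetical and do
not reflect the validation actually performed in this PR.
```python
def check_two_way_indexes(indexes, num_sample_sets):
    """Hypothetical check: every index must be a pair of valid set ids."""
    checked = []
    for pair in indexes:
        if len(pair) != 2:
            raise ValueError(f"index {pair!r} must have length 2")
        for k in pair:
            if not 0 <= k < num_sample_sets:
                raise ValueError(f"sample set id {k} out of bounds")
        checked.append(tuple(pair))
    return checked


# Valid for two sample sets; (0, 1, 1) would raise ValueError.
check_two_way_indexes([(0, 1), (1, 1)], num_sample_sets=2)
```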
I added some basic tests to demonstrate that things were working
properly. If we compute two-way statistics on identical sample sets,
they should be equal to the one-way statistics. Unfortunately, this does
not apply to unbiased statistics, which I've validated manually.
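The idea behind the test can be sketched outside of tskit. The example
below assumes the biased two-way D2 is the product of D computed in each
sample set, so that passing the same set twice recovers the one-way D
squared. The function names and the toy haplotypes are made up for
illustration and are not the summary functions from this PR.
```python
import math


def D(haplotypes):
    """D = p_AB - p_A * p_B for a list of (a, b) 0/1 allele pairs."""
    n = len(haplotypes)
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    return p_ab - p_a * p_b


def D2_one_way(haplotypes):
    return D(haplotypes) ** 2


def D2_two_way(haps_i, haps_j):
    # Assumed definition: product of D from the two sample sets.
    return D(haps_i) * D(haps_j)


# Toy two-locus haplotypes for one sample set (illustrative data only).
ss1 = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]

# Identical sample sets: the two-way value matches the one-way value.
print(math.isclose(D2_two_way(ss1, ss1), D2_one_way(ss1)))
```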
I've also cleaned up the docstrings a bit and fixed a bug with the
D_prime statistic, which should not be weighted by haplotype frequency.
1 parent c86884d commit b77bc1b
2 files changed
Lines changed: 410 additions & 72 deletions