Commit 0c0fb29

Add description of weighting schemes
1 parent af8a725 commit 0c0fb29

1 file changed

Lines changed: 27 additions & 5 deletions

docs/stats.md

Lines changed: 27 additions & 5 deletions
@@ -740,11 +740,33 @@ ld = ts.ld_matrix(sites=[[0, 1], [1, 2, 3]])
740740
print(ld)
741741
```
742742

743-
Because we implement two-locus statistics for multi-allelic data, we require
744-
a method for combining the statistical results from each pair of alleles into
745-
one summary for a pair of sites. There two methods for combining results from
746-
multiple alleles, `hap_weighted` and `total_weighted`, which are
747-
statistic-specific and not chosen by the user:
743+
Because we allow for two-locus statistics to be computed for multi-allelic
744+
data, we need to be able to combine statistical results from each pair of
745+
alleles into one summary for a pair of sites. We use two implementations for
746+
combining results from multiple alleles: `hap_weighted` and `total_weighted`.
747+
These are statistic-specific and not chosen by the user.
748+
749+
Briefly, consider a pair of sites with {math}`n` alleles at the first locus and
750+
{math}`m` alleles at the second. Write {math}`f_{ij}` for the statistic
751+
computed for focal alleles {math}`A_i` and {math}`B_j`, with haplotype weights
752+
{math}`(A_i B_j, A_i b_j, a_i B_j)`, where {math}`a_i` and {math}`b_j` are the
753+
collections of alleles that are not the focal alleles {math}`A_i` or
754+
{math}`B_j`, respectively. Then the weighting schemes are defined as:
755+
756+
- `hap_weighted`: {math}`\sum_{i=1}^{n}\sum_{j=1}^{m}p(A_{i}B_{j})f_{ij}`,
757+
where {math}`p(A_{i}B_{j})` is the frequency of haplotype {math}`A_{i}B_{j}`.
758+
This method was first introduced in [Karlin
759+
(1981)](https://doi.org/10.1111/j.1469-1809.1981.tb00308.x) and reviewed in
760+
[Zhao (2007)](https://doi.org/10.1017/S0016672307008634).
761+
762+
- `total_weighted`: {math}`\frac{1}{n m}\sum_{i=1}^{n}\sum_{j=1}^{m}f_{ij}`.
763+
This method assigns equal weight to each of the possible pairs of focal
764+
alleles at the two sites, taking the arithmetic mean of statistics over
765+
focal haplotypes.
766+
767+
Out of all of the available summary functions, only {math}`r^2` uses
768+
the `hap_weighted` scheme, with the remainder using uniform weighting
769+
(`total_weighted`).
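
The two weighting schemes defined above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper functions `hap_weighted` and `total_weighted` below are hypothetical stand-ins named after the schemes, not tskit functions, and the values in `f` and `p` are made up rather than computed from a tree sequence.

```python
# Illustrative sketch only: these helpers are hypothetical and not part of tskit.

def hap_weighted(f, p):
    """Sum of f[i][j] weighted by the haplotype frequency p[i][j]."""
    return sum(
        p_ij * f_ij
        for f_row, p_row in zip(f, p)
        for f_ij, p_ij in zip(f_row, p_row)
    )


def total_weighted(f):
    """Arithmetic mean of f[i][j] over all n * m pairs of focal alleles."""
    n, m = len(f), len(f[0])
    return sum(sum(row) for row in f) / (n * m)


# Made-up inputs for n = 2 alleles at the first site and m = 2 at the second.
# f[i][j] is the statistic for focal alleles A_i and B_j;
# p[i][j] is the frequency of haplotype A_i B_j (the entries sum to 1).
f = [[0.5, 0.25], [0.25, 0.5]]
p = [[0.5, 0.0], [0.25, 0.25]]

print(hap_weighted(f, p))
print(total_weighted(f))
```

With these inputs the two schemes give different summaries, because `hap_weighted` ignores the allele pair whose haplotype frequency is zero while `total_weighted` counts every pair equally.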
748770

749771
(sec_stats_two_locus_branch)=
750772
