@@ -705,9 +705,9 @@ Two-locus statistics can be computed using two modes, either `site` or
705705single-site statistics. Within this framework, statistics may be either
706706polarised or unpolarised. For statistics that are polarised, we compute
707707statistic values for pairs of derived alleles. Unpolarised statistics compute
708- statistics over all possible alleles, derived and ancestral, and the result in
709- averaged over statistics computed for all pairs of alleles (see weighting
710- schemes below) . The option for polarisation is not exposed to the user.
708+ statistics over all pairs of alleles, derived and ancestral. In either case,
709+ the result is averaged over these values, using a weighting
710+ scheme descrbed below. The option for polarisation is not exposed to the user.
711711Instead, we implement statistics that are polarised where appropriate.
712712
713713(sec_stats_two_locus_site)=
@@ -746,16 +746,17 @@ print(ld)
746746
747747Because we allow for two-locus statistics to be computed for multi-allelic
748748data, we need to be able to combine statistical results from each pair of
749- alleles into one summary for a pair of sites. We use two implementations for
749+ alleles into one summary for a pair of sites. This does not affect biallelic
750+ data (and so this section can be skipped on first reading).
751+ We use two implementations for
750752combining results from multiple alleles: ` hap_weighted ` and ` total_weighted ` .
751753These are statistic-specific and not chosen by the user.
752754
753755Briefly, consider a pair of sites with {math}` n ` alleles at the first locus and
754- {math}` m ` alleles at the second. Write {math}` f_{i,j} ` as the statistic
755- computed for focal alleles {math}` A_i ` and {math}` B_j ` , with haplotype weights
756- {math}` (A_i B_j, A_i b_j, a_i B_j) ` , where {math}` a_i ` and {math}` b_j ` are the
757- collection of alleles that are not the focal alleles {math}` A_i ` or
758- {math}` B_j ` , respectively. Then the weighting schemes are defined as:
756+ {math}` m ` alleles at the second. (Whether this includes the ancestral allele
757+ depends on whether the statistic is polarised.) Write {math}` f_{i,j} ` as the statistic
758+ computed for focal alleles {math}` A_i ` and {math}` B_j ` .
759+ Then the weighting schemes are defined as:
759760
760761- ` hap_weighted ` : {math}` \sum_{i=1}^{n}\sum_{j=1}^{m}p(A_{i}B_{j})f_{ij} ` ,
761762 where {math}` p(A_{i}B_{j}) ` is the frequency of haplotype {math}` A_{i}B_{j} ` .
@@ -780,7 +781,7 @@ The `branch` mode computes expected two-locus statistics between pairs of
780781trees, conditioned on the marginal topologies and branch lengths of those
781782trees. The trees for which we compute statistics are specified by positions,
782783and for a pair of positions we consider all possible haplotypes that could be
783- generated by a single mutation occurring at the two trees.
784+ generated by a single mutation occurring on each of the two trees.
784785
785786For two trees, one with {math}` n ` branches and the other with {math}` m `
786787branches, there are {math}` nm ` possible pairs of branches that may carry the
@@ -827,13 +828,13 @@ in the same manner as the rest of the stats API (see
827828the stats API in that we handle one-way and two-way statistics in the same
828829function call.
829830
830- To compute a two-way two-locus statistic, the ` index ` argument must be
831+ To compute a two-way two-locus statistic, the ` indexes ` argument must be
831832provided. The statistics are selected in the same way (with the ` stat `
832833argument), but we provide a restricted set of two-way statistics (see
833834{ref}` sec_stats_two_locus_summary_functions_two_way ` ). The dimension-dropping
834835rules for the result follow the rest of the tskit stats API in that a single list
835836or tuple will produce a single two-dimensional matrix, while list of these
836- will produce a three-dimensional array, with the outer dimension of length
837+ will produce a three-dimensional array, with the first dimension of length
837838equal to the length of the list.
838839
839840For example, to compute the {math}` r^2 ` LD matrix over a subset of samples in
@@ -868,8 +869,8 @@ ts.ld_matrix(sample_sets=[[0, 1, 2, 3], [4, 5, 6, 7]], indexes=[(0, 1)]) -> 3 di
868869
869870#### Why are there nan values in the LD matrix?
870871
871- For some statistics, it is possible to observe nan entries in the LD matrix,
872- which can be surprising or numerically impact downstream analyses. A nan entry
872+ For some statistics, it is possible to observe ` nan ` entries in the LD matrix,
873+ which can be surprising or numerically impact downstream analyses. A ` nan ` entry
873874may occur when computing a ratio statistic (such as {math}` r ` or {math}` r^2 ` )
874875with a denominator of zero, indicating that one or both sites in the pair are
875876monomorphic. This can happen for a number of reasons:
0 commit comments