Commit ba98ec2

Small fixes and inconsistencies
1 parent 547bdc8

2 files changed

Lines changed: 35 additions & 24 deletions

docs/stats.md

Lines changed: 6 additions & 0 deletions
@@ -969,6 +969,12 @@ as input. Each of our summary functions has the signature
 and {math}`n_B = n_{AB} + n_{aB}`, with frequencies {math}`p` found by dividing
 by {math}`n`.
 
+Our convention is to use {math}`A,B` to denote derived alleles, and {math}`a,b`
+ancestral alleles (or other alleles, if the site is multi-allelic). For
+polarised statistics, we average statistics over all non-ancestral alleles. For
+unpolarised statistics, the labeling is arbitrary as we average over all
+alleles (derived and ancestral).
+
 `D`
 : {math}`f(n_{AB}, n_{Ab}, n_{aB}, n) = p_{AB}p_{ab} - p_{Ab}p_{aB} \, (=p_{AB} - p_A p_B)`
 
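The two forms of `D` in the definition above agree because the four haplotype frequencies sum to one. A minimal sketch in Python that checks this for one biallelic pair of sites; it is not tskit's implementation, the function names are hypothetical, and exact fractions are used so the two forms compare exactly. The `r_squared` helper assumes the standard normalisation D² / (p_A p_a p_B p_b).

```python
from fractions import Fraction


def haplotype_frequencies(n_AB, n_Ab, n_aB, n):
    """Return (p_AB, p_Ab, p_aB, p_ab) as exact fractions; n_ab is implied."""
    n_ab = n - n_AB - n_Ab - n_aB
    return tuple(Fraction(c, n) for c in (n_AB, n_Ab, n_aB, n_ab))


def d_statistic(n_AB, n_Ab, n_aB, n):
    """D = p_AB * p_ab - p_Ab * p_aB (first form of the definition)."""
    p_AB, p_Ab, p_aB, p_ab = haplotype_frequencies(n_AB, n_Ab, n_aB, n)
    return p_AB * p_ab - p_Ab * p_aB


def d_statistic_marginal(n_AB, n_Ab, n_aB, n):
    """D = p_AB - p_A * p_B, with n_A = n_AB + n_Ab and n_B = n_AB + n_aB."""
    p_AB = Fraction(n_AB, n)
    p_A = Fraction(n_AB + n_Ab, n)
    p_B = Fraction(n_AB + n_aB, n)
    return p_AB - p_A * p_B


def r_squared(n_AB, n_Ab, n_aB, n):
    """Assumed standard r^2 = D^2 / (p_A p_a p_B p_b).

    Raises ZeroDivisionError if either site is fixed (r^2 is undefined).
    """
    p_A = Fraction(n_AB + n_Ab, n)
    p_B = Fraction(n_AB + n_aB, n)
    d = d_statistic(n_AB, n_Ab, n_aB, n)
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))


# 10 samples: 4 AB, 1 Ab, 1 aB and (implicitly) 4 ab haplotypes.
print(d_statistic(4, 1, 1, 10))           # 3/20
print(d_statistic_marginal(4, 1, 1, 10))  # 3/20
print(r_squared(4, 1, 1, 10))             # 9/25
```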
python/tskit/trees.py

Lines changed: 29 additions & 24 deletions
@@ -10942,36 +10942,39 @@ def ld_matrix(
 :math:`r^2`) computed from sample allelic states or branch lengths.
 The resulting linkage disequilibrium (LD) matrix represents either the
 two-locus statistic as computed between all pairs of specified
-``sites`` ("site" mode, producing a num_sites-by-num_sites sized
-matrix), or as computed from the branch structures at marginal trees
-between pairs of trees at all specified ``positions`` ("branch" mode,
-producing a num_positions-by-num_positions sized matrix).
+``sites`` (``"site"`` mode, producing a
+``len(sites)``-by-``len(sites)`` sized matrix), or as computed from the
+branch structures at marginal trees between pairs of trees at all
+specified ``positions`` (``"branch"`` mode, producing a
+``len(positions)``-by-``len(positions)`` sized matrix).
 
-The sites considered for "site" mode defaults to all sites (which may
+The sites considered for ``"site"`` mode defaults to all sites (which may
 result in a very large matrix!), but can be restricted using
-the ``sites`` argument. Sites can be passed as a list of lists,
-specifying the ``[[row_sites], [col_sites]]``, resulting in a
+the ``sites`` argument. Sites must be passed as a list of lists,
+specifying the ``[row_sites, col_sites]``, resulting in a
 rectangular matrix, or by specifying a single list of ``[sites]``, in
 which a square matrix will be produced (see
-:ref:`sec_stats_two_locus_site` for examples).
+:ref:`sec_stats_two_locus_site` for examples). Here, ``sites``,
+``row_sites``, and ``col_sites`` are each lists of site indexes.
 
-Similarly, in the branch mode, the ``positions`` argument specifies
+Similarly, in the ``"branch"`` mode, the ``positions`` argument specifies
 genomic coordinates at which the expectation for the two-locus statistic
 is computed, given the local tree structure.
 (See :ref:`sec_stats_two_locus_branch` for explanation of in what sense
 this is an expectation.) This defaults to computing
-the LD for each pair of distinct trees, which is equivalent to passing in
-the leftmost coordinates of each tree's span (since intervals are closed on
+the LD for each pair of distinct trees (this is equivalent to passing in
+the leftmost coordinates of each tree's span, since intervals are closed on
 the left and open on the right). Similar to the site mode, a nested list
 of row and column positions can be specified separately (resulting in a
 rectangular matrix) or a single list of a specified positions results
 in a square matrix (see :ref:`sec_stats_two_locus_branch` for
-examples).
+examples). Like ``sites``, the ``positions`` must be specified as a list
+of lists.
 
-Some LD statistics are defined for two sample sets as well as within a
-single set of samples. If the ``indexes`` argument is specified, then
+Some LD statistics are defined for both within a single set of samples
+and for two sample sets. If the ``indexes`` argument is specified, then
 ``indexes`` specifies the indexes of the sample sets in the
-``sample_sets`` list between which to compute LD. For example, this
+``sample_sets`` list between which to compute LD. For instance, this
 results in a 3D array whose ``[k,:,:]``-th slice contains LD values
 between ``sample_sets[i]`` and ``sample_sets[j]``, where ``(i, j)`` is
 the ``k``-th element of ``indexes``.
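The argument conventions in this docstring (a list of lists, either `[row_sites, col_sites]` or `[all_sites]`, plus an optional list of sample-set index pairs that adds a leading axis) fix the shape of the returned matrix. The sketch below is a hypothetical helper, not part of tskit: it only computes the shape the docstring implies, does not compute LD, and does not reflect how `ld_matrix` itself validates its arguments.

```python
def ld_matrix_shape(sites, num_indexes=None):
    """Hypothetical helper (not part of tskit): the output shape implied by
    a ``sites``-style argument.

    ``sites`` must be a list of lists of site indexes, either
    ``[all_sites]`` (square matrix) or ``[row_sites, col_sites]``
    (rectangular matrix). If ``num_indexes`` is given (the number of
    sample-set index pairs in ``indexes``), a leading axis of that length
    is added so that slice ``[k, :, :]`` belongs to the k-th pair.
    """
    # A flat list such as [0, 1, 2] is rejected: sites must be nested.
    if not all(isinstance(s, (list, tuple)) for s in sites):
        raise TypeError("sites must be a list of lists of site indexes")
    if len(sites) == 1:
        row_sites = col_sites = sites[0]
    elif len(sites) == 2:
        row_sites, col_sites = sites
    else:
        raise ValueError("expected [all_sites] or [row_sites, col_sites]")
    shape = (len(row_sites), len(col_sites))
    if num_indexes is not None:
        shape = (num_indexes,) + shape
    return shape


print(ld_matrix_shape([[0, 1, 2]]))                         # (3, 3)
print(ld_matrix_shape([[0, 1], [2, 3, 4]]))                 # (2, 3)
print(ld_matrix_shape([[0, 1], [2, 3, 4]], num_indexes=2))  # (2, 2, 3)
```

The same two forms apply to `positions` in `"branch"` mode, with genomic coordinates in place of site indexes.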
@@ -11008,16 +11011,18 @@ def ld_matrix(
 computed. Defaults to "site", can be "site" or "branch".
 :param str stat: A string giving the selected two-locus statistic to
 compute. Defaults to "r2".
-:param list sites: A list of sites over which to compute LD. Can be
-specified as a list of lists to control the row and column sites.
-Only applicable in site mode. Specify as
-``[[row_sites], [col_sites]]`` or ``[all_sites]``.
+:param list sites: A list of lists of sites over which to compute an
+LD matrix. Can be specified as a list of lists to control the row
+and column sites. Only available in "site" mode. Specify as
+``[row_sites, col_sites]`` or ``[all_sites]``.
 Defaults to all sites.
-:param list positions: A list of genomic positions where expected LD is
-computed. Only applicable in branch mode. Can be specified as a list
-of lists to control the row and column positions. Specify as
-``[[row_positions], [col_positions]]`` or ``[all_positions]``.
-Defaults to the leftmost coordinates of all trees.
+:param list positions: A list of lists of genomic positions where
+expected LD is computed based on tree topologies and branch
+lengths. Only applicable in "branch" mode. Specify as a list of
+two lists to control the row and column positions, as
+``[row_positions, col_positions]``, or ``[all_positions]``.
+Defaults to the leftmost coordinates of all trees and computes
+LD between all pairs of trees.
 :param list indexes: A list of 2-tuples or a single 2-tuple, specifying
 the indexes of two sample sets over which to compute a two-way LD
 statistic. Only :math:`r^2`, :math:`D^2`, and :math:`\widehat{D^2}`
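The default for `positions` relies on each tree covering a half-open interval `[left, right)`: a tree's leftmost coordinate belongs to that tree and to no other, so one position per tree addresses every distinct tree exactly once. A minimal sketch of that reasoning in plain Python, assuming a hypothetical sorted list of tree boundaries; the function names are illustrative and are not tskit API.

```python
import bisect


def tree_index_at(breakpoints, position):
    """Index of the tree whose interval [left, right) contains ``position``.

    ``breakpoints`` is a sorted list [0, b_1, ..., L] of tree boundaries,
    so tree i spans [breakpoints[i], breakpoints[i + 1]).
    """
    if not breakpoints[0] <= position < breakpoints[-1]:
        raise ValueError("position outside the sequence")
    # bisect_right assigns a position equal to a boundary to the tree on its
    # right, matching intervals closed on the left and open on the right.
    return bisect.bisect_right(breakpoints, position) - 1


def default_positions(breakpoints):
    """Leftmost coordinate of each tree: one position per distinct tree."""
    return breakpoints[:-1]


example_breakpoints = [0, 10, 25, 100]  # hypothetical: three trees
positions = default_positions(example_breakpoints)
print(positions)                                                # [0, 10, 25]
print([tree_index_at(example_breakpoints, x) for x in positions])  # [0, 1, 2]
print(tree_index_at(example_breakpoints, 24.5))                 # 1
```

With three trees, the default positions therefore give a 3-by-3 matrix of LD between all pairs of trees.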
