@@ -19,6 +19,9 @@ kernelspec:
1919# _Advanced simplification_
2020% remove underscores in title when tutorial is complete or near-complete
2121
22+ :::{todo}
23+ This tutorial is only partly complete, and a number of sections contain TODO items.
24+ :::
2225
2326This is a companion to the basic {ref}` sec_simplification ` tutorial.
2427It focuses on details of ` simplify ` behavior that are useful when you need precise
@@ -55,6 +58,8 @@ tables to be {meth}`sorted <TableCollection.sort>`). Simplifying tables in place
5558is often useful for {ref}` forward-time simulations <sec_tskit_forward_simulations> ` .
5659:::
5760
61+ (sec_advanced_simplification_map_nodes)=
62+
5863## 1) Tracking node ID changes
5964
6065With default settings, simplification compacts tables and therefore reassigns node
@@ -74,6 +79,8 @@ Note that when simplifying tables in-place using {meth}`TableCollection.simplify
7479is always returned. To avoid compacting the node table, and leave node IDs unchanged, use
7580` filter_nodes=False ` .
7681
82+ (sec_advanced_simplification_map_nodes_reverse)=
83+
7784### Obtaining the reverse map
7885
7986Often you might want a reverse map, mapping the new node IDs to the old ones. Here's
@@ -94,21 +101,52 @@ print("New sample ID 0", "maps to old ID", int(reverse_map[0]))
94101## 2) Keeping input roots
95102
96103:::{todo}
97- This is easy to illustrate, and useful for forward sims / census approaches
104+ The ` keep_input_roots=True ` argument is easy to illustrate, and useful for
105+ forward sims / census approaches.
106+ :::
107+
108+ ## 3) Keeping ancestral individuals
109+
110+ In some cases, a tree sequence contains historical individuals that are associated
111+ with nodes that are not samples, and you want to retain information on the individuals that
112+ remain ancestral after simplifying. For example, a forward-time simulation could
113+ define individuals for all nodes in the past, including the
114+ {ref}` pedigree links <msprime:sec_pedigrees_encoding> ` between parents and children,
115+ and you might want to retain the chain of individuals that defines the portion of the pedigree
116+ that is relevant to the genetic ancestry (see also the discussion in the SLiM manual, and in
117+ [SLiM issue #139](https://github.com/MesserLab/SLiM/issues/139)).
118+
119+ To keep all the individuals associated with genetic ancestry, you can use
120+ ` keep_unary_in_individuals=True ` . In particular, this means
121+ that ancestral nodes which are not coalescent anywhere along the genome,
122+ but which are associated with an individual, will be retained (and
123+ so the referenced individuals will be retained too).
124+
125+ :::{todo}
126+ Should we have a demonstration here? {ref}` sec_tskit_forward_simulations ` could be used to
127+ create a simulator that saves pedigree information into each individual, and we could distill
128+ some of the discussion from https://github.com/MesserLab/SLiM/issues/139 into an example
129+ of storing a coherent pedigree.
98130:::
99131
100- ## 3) Setting sample flags
132+ The ` keep_unary_in_individuals ` argument is a specific example of keeping some, but not all,
133+ non-coalescent ancestry in the tree sequence. If you need to retain a known set of
134+ non-coalescent nodes, it can be helpful to treat them as focal samples and use the
135+ ` update_sample_flags=False ` option, as described next.
136+
137+
138+ ## 4) Setting sample flags
101139
102140Normally the nodes that are provided to the ` simplify() ` function are marked as sample
103141nodes in the output (by setting the ` NODE_IS_SAMPLE ` flag), and other nodes have that flag unset.
104- If you provide the ` update_sample_flags=False ` option , all node flags are left unchanged.
142+ If you provide the ` update_sample_flags=False ` argument , all node flags are left unchanged.
105143Here are some cases where that can be useful.
106144
107145### Parallel simplification
108146
109147One use for the ` update_sample_flags=False ` option combines it with ` filter_nodes=False ` ,
110148to ensure that the node table remains untouched during simplification.
111- This is primarily a use-case targetted at developers of forward simulators, and allows
149+ This is primarily a use-case targeted at developers of forward simulators, and allows
112150logically disjunct parts of the edge table to be simplified in parallel, without
113151risking two parallel processes trying to alter the same data.
114152
@@ -220,24 +258,6 @@ d3arg = argviz.D3ARG.from_ts(ts=subset_arg)
220258d3arg.draw(title=f"A full ARG, subset to {subset_arg.num_samples} samples");
221259```
222260
223- ## 4) Keeping individuals
224-
225- In some cases, a tree sequence might contain historical individuals which are associated
226- with nodes that are not samples, and you wish to retain information on individuals which are
227- ancestral to the sample nodes. For example a forward-time simulation could
228- define individuals for all nodes in the past, including the pedigree links between parents
229- and children (see also discussion in the SLiM manual, and at
230- https://github.com/MesserLab/SLiM/issues/139 ).
231-
232- To keep all the individuals associated with genetic ancestry, you can use
233- ` keep_unary_in_individuals=True ` .
234-
235- :::{todo}
236- Should we have a demonstration here? {ref}` sec_tskit_forward_simulations ` could be used to
237- create a simulator that saves pedigree information into each individual, and we could distill
238- some of the discussion from https://github.com/MesserLab/SLiM/issues/139 into that.
239- :::
240-
241261## 5) reduce_to_site_topology
242262
243263:::{todo}