Commit c24f11f

complete fix #579 with internal documentation
1 parent ac19636 commit c24f11f

2 files changed

Lines changed: 45 additions & 0 deletions

VERSIONS

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ development head (in the master branch):
  add the PCG random number generator, switch to pcg32_fast and pcg64_fast, remove all use of the old taus2 and MT19937-64 generators; note this completely breaks backward reproducibility
  fix various bugs involving conflicts between defined constants and other symbols, including #573 and #574; this sets new definition rules that could break some existing scripts (but is unlikely to)
  fix #575, QtSLiM terminates early when single-stepping with a rescheduled script block
+ fix #579, crash in models with (a) tree-seq recording, (b) multiple chromosomes, AND (c) rejection of proposed offspring in modifyChild()


version 5.1 (Eidos version 4.1):

treerec/implementation.md

Lines changed: 44 additions & 0 deletions
@@ -222,3 +222,47 @@ To read in a tree sequence, SLiM requires that:
This is because of how `__TabulateSubpopulationsFromTreeSequence` works;
probably it could be made more general, but it isn't.

## Multiple chromosomes and multiple tree sequences

In SLiM 5.0 we added support for simulating multiple chromosomes, and that added some
wrinkles to how tree-sequence recording is implemented. In particular, we now keep one
tree sequence per simulated chromosome. In `species.h` we define a struct named
`TreeSeqInfo` that holds the chromosome's `tsk_table_collection_t` along with a couple of
other bits of information. Each `Species` then keeps its own `std::vector` of `TreeSeqInfo`
structs, named `treeseq_`. Operations that used to act on a single table collection now
typically loop over the elements of `treeseq_` and perform the operation on each table
collection in turn.

The main complication here is that all of these table collections share three tskit tables:
the node, individual, and population tables. This means that the table collections all
have a shared structure, and that structure needs to be preserved across operations like
simplify. The shared tables are kept by the first chromosome's table collection, in
`treeseq_[0]`. The other table collections keep their node, individual, and population
tables zero-filled most of the time. In that state they do not comply with tskit's
requirements, and trying to use them will often produce a segfault from a NULL pointer
dereference. That is deliberate and useful; it makes it easy to debug situations where the
table collections are being used when they should not be. Sometimes, though, we need those
table collections to actually be usable. For that, `Species::CopySharedTablesIn()` does a
bitwise, shallow copy of the shared tables into a given table collection; it should be
matched by a call to `DisconnectCopiedSharedTables()` as soon as the operation is done,
which restores the zero-filled table state. See https://github.com/tskit-dev/tskit/pull/2665
for Jerome's original multi-chromosome parallel simplification example, from which this
design was derived.
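
The copy-in / disconnect pairing described above can be illustrated with a toy model. This
is only a sketch in Python, not SLiM's C++ code: `ToyTableCollection` and
`shared_tables_copied_in()` are hypothetical names invented for the illustration, plain
lists stand in for tskit tables, and `None` stands in for the zero-filled state.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional

SHARED = ("nodes", "individuals", "populations")

@dataclass
class ToyTableCollection:
    """Hypothetical stand-in for one chromosome's table collection (not tskit)."""
    edges: list = field(default_factory=list)   # per-chromosome table
    nodes: Optional[list] = None                # shared; None models "zero-filled"
    individuals: Optional[list] = None
    populations: Optional[list] = None

@contextmanager
def shared_tables_copied_in(treeseq, index):
    """Temporarily alias the shared tables of treeseq[0] into treeseq[index]."""
    owner, target = treeseq[0], treeseq[index]
    if index != 0:
        for name in SHARED:
            # Shallow copy: the target refers to the very same list objects.
            setattr(target, name, getattr(owner, name))
    try:
        yield target
    finally:
        if index != 0:
            for name in SHARED:
                # Disconnect: return to the deliberately unusable state.
                setattr(target, name, None)

# Three chromosomes; only the first one owns the shared tables.
treeseq = [ToyTableCollection(nodes=[], individuals=[], populations=[])]
treeseq += [ToyTableCollection() for _ in range(2)]
treeseq[0].nodes.append("node 0")

with shared_tables_copied_in(treeseq, 2) as tables:
    usable_inside = tables.nodes is treeseq[0].nodes
usable_after = treeseq[2].nodes is not None
```

Using a context manager here is just one way to guarantee that the disconnect step always
runs, even if the operation in between raises; in the toy model, touching
`treeseq[2].nodes` outside the `with` block fails fast, much as the zero-filled tables do.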

The end goal is to allow parallel simplification in SLiM. The code in
`Species::SimplifyAllTreeSequences()` now implements the extra bookkeeping needed to
maintain the shared table structure, so the design is ready to be parallelized when I
return to the parallelization project.

Single-chromosome simulations are still saved as a single .trees tree sequence file. With
multiple chromosomes, we now save a "trees archive" containing one .trees file per
chromosome. tskit does not directly support trees archives at the moment; you can simply
iterate over the files in the trees archive and process them as you wish in Python. An
example of this is provided in the SimHumanity paper, https://doi.org/10.47248/hpgg2505040006.
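
A minimal sketch of that iteration follows. It assumes the trees archive is a directory
whose per-chromosome files end in `.trees`. The helper name `iter_trees_archive()` and the
file names in the demonstration are hypothetical; `tskit.load()` is the only tskit call
used. It is imported lazily so the helper can be exercised with a stand-in loader.

```python
import tempfile
from pathlib import Path

def iter_trees_archive(archive_dir, load=None):
    """Yield (path, loaded object) for each .trees file in an archive directory."""
    if load is None:
        import tskit  # imported lazily; only needed for real tree sequences
        load = tskit.load
    for path in sorted(Path(archive_dir).glob("*.trees")):
        yield path, load(str(path))

# Demonstration with empty placeholder files and a stand-in loader, so that
# nothing here depends on real SLiM output or on tskit being installed.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("chr1.trees", "chr2.trees", "README.txt"):
        (Path(tmp) / name).write_bytes(b"")
    found = [path.name for path, _ in iter_trees_archive(tmp, load=lambda p: None)]
```

With real output one would drop the `load` argument and work with each returned
`tskit.TreeSequence`, for example by reading `ts.num_trees` or `ts.sequence_length` per
chromosome.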

There is some new top-level metadata associated with trees archives and multiple
chromosomes, detailed in section 29.1 of the SLiM manual. In particular, there are new
top-level keys `this_chromosome` and `chromosomes` that should be provided in every .trees
file. The `chromosomes` key provides a table of all of the chromosomes involved in the
trees archive. The `this_chromosome` key provides information about the particular
chromosome represented by that .trees file.
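
As a rough illustration, a consistency check over an archive's metadata might look like the
sketch below. It takes already-extracted metadata dictionaries, one per file, and leaves
the question of where they sit inside each file's metadata to the manual. The function name
`check_archive_metadata()` is hypothetical, the placeholder values are not the real schema,
and the check that every file carries the same `chromosomes` table is an assumption drawn
from the description above rather than a documented guarantee.

```python
def check_archive_metadata(per_file_metadata):
    """Return a list of problems found in {file name: metadata dict}."""
    problems = []
    tables = []
    for name, metadata in per_file_metadata.items():
        for key in ("this_chromosome", "chromosomes"):
            if key not in metadata:
                problems.append(f"{name}: missing '{key}'")
        if "chromosomes" in metadata:
            tables.append(metadata["chromosomes"])
    # Assumption: all files in one archive describe the same set of chromosomes.
    if any(table != tables[0] for table in tables[1:]):
        problems.append("'chromosomes' table differs between files")
    return problems

# Placeholder values only; see section 29.1 of the SLiM manual for the real fields.
shared_table = [{"placeholder": 1}, {"placeholder": 2}]
demo = {
    "chr1.trees": {"this_chromosome": {"placeholder": 1}, "chromosomes": shared_table},
    "chr2.trees": {"chromosomes": shared_table},
}
problems = check_archive_metadata(demo)
```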
