Feat:/simulate after local mrca#2396
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2396 +/- ##
=======================================
Coverage 91.21% 91.22%
=======================================
Files 20 20
Lines 11831 11847 +16
Branches 2296 2300 +4
=======================================
+ Hits 10792 10807 +15
Misses 569 569
- Partials 470 471 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Implementation looks good, it seems unlikely that it's a memory leak to me. Let's look at the size of the output tables I think. Probably this will be a function of your value of You need to run the C code through clang-format as a minimum to get CI passing. |
|
Note specific requirements for clang-format: https://tskit.dev/tskit/docs/stable/development.html#sec-development-c-requirements |
|
It is ready for review now. Please check:
|
jeromekelleher
left a comment
There was a problem hiding this comment.
Looks good!
Some minor comment above.
Also needed:
- Some C unit tests. We should be able to add a test or two to test_ancestry.c to cover the functionality. Nothing elaborate, just making sure the code is covered by valgrind.
- We also need to cover the other models. This will require updating msp_merge_ancestors in a similar way. Best to start with algorithms.py on this.
| for (avl_node = self->non_empty_populations.head; avl_node != NULL; | ||
| avl_node = avl_node->next) { | ||
| pop_id = (tsk_id_t)(intptr_t) avl_node->item; | ||
| pop_id = (tsk_id_t) (intptr_t) avl_node->item; |
There was a problem hiding this comment.
Seems to be a bunch of formatting noise here - which version of clang-format are you using?
There was a problem hiding this comment.
version 18.. I will remove this noise
|
I'm going to mute this thread @hossam26644 - please ping me when you'd like a review |
|
@jeromekelleher check now please, and please check the implementation of merge_ancestors. It works and passes the tests that I put.. but I am not sure if I am missing something |
jeromekelleher
left a comment
There was a problem hiding this comment.
Generally looking good. Needs more models though, in particular the pedigree model. You need to construct a pedigree in which full coalescence will happen and verify that you still go through the rest of the nodes (you'll need store_unary as well I guess, but that's true anyway?)
|
|
||
| if stop_at_local_mrca is None: | ||
| stop_at_local_mrca = True | ||
| if not stop_at_local_mrca: # when set to False |
| CU_ASSERT_EQUAL(ret, 0); | ||
|
|
||
| ret = msp_run(&msp, t, ULONG_MAX); | ||
| CU_ASSERT_EQUAL(ret, 2); // exit on max time |
There was a problem hiding this comment.
Use the symbolic constant, no need for comment then
26f42eb to
640ff6e
Compare
|
@jeromekelleher
One more thing: I added a condition to stop simulations when there are no events to happen and stop_at_local_mrca is False, i.e., no coalescence, GC, recombination, or migration events; it used to give an error before, since simulations should stop one step before that. This error was raised for smc models only, while for Hudson's model there are always recombination then coalescence events (unless the seq length is 1 base, then it behaves like the smc models). These recombination and coalescence events happen even when the recombination rate is zero (for Hudson). Now, if we can put a condition when simulations get to this infinite cycle of recombination and coalescence, we won't have to specify an end time; simulations can run until more time means nothing (no unary nodes no recombination no coalescence). I don't understand fully why we see this when recombination rate == zero, so maybe I am missing important details; but if think it will be useful, I can remove the requirement for an end time for simulations above the root. |
84de697 to
f9af0ec
Compare
jeromekelleher
left a comment
There was a problem hiding this comment.
The infinite waiting time problem is in general hard (there's some references to it in the documentation if you look for it). It would be great if we could keep simulating until there are no events for the SMC, but we'll have to be careful not to introduce any unexpected behaviours otherwise.
Re the pedigree model - I don't see why it's not relevant? We can construct a pedigree in which coalescence happens very quickly, but we still want to track how the ancestral material passes through the ancestors. What happens at the moment when partial or full coalescence happens within the pedigree?
|
@jeromekelleher we can go for another round of review:
|
jeromekelleher
left a comment
There was a problem hiding this comment.
I'm not following some of the logic, but I think I just need to check it out an play with it myself to get my head around it. I think we should just clean this much up, squash the commits and merge. We can register a few follow-ups issues to close before release.
| mig_dest = k | ||
| min_time = min(t_re, t_ca, t_gcin, t_gc_left, t_mig) | ||
| assert min_time != infinity | ||
| assert (min_time != infinity) or ( |
There was a problem hiding this comment.
Comparing with the False singleton is an antipattern. Use not self.stop_at_local_mrca
| else: | ||
| self.t += min_time | ||
| if min_time == t_re: | ||
| if (min_time == infinity) and self.stop_at_local_mrca is False: |
There was a problem hiding this comment.
| if (min_time == infinity) and self.stop_at_local_mrca is False: | |
| if (not self.stop_at_local_mrca) and (min_time == infinity): |
Put the constant first so it gets short-circuited. (Not that it matters here, but just as a general principle)
| """ | ||
|
|
||
|
|
||
| class PotentialInfiniteSimulationhWarning(UserWarning): |
There was a problem hiding this comment.
| class PotentialInfiniteSimulationhWarning(UserWarning): | |
| class PotentialInfiniteSimulationWarning(UserWarning): |
| out: | ||
| msp_safe_free(parents); | ||
| msp_safe_free(segment_mem); | ||
| if (coalescence_occurred == false && recombination_occurred == false && ret == 0) { |
There was a problem hiding this comment.
These out blocks are for error condition and cleanup handling only - move this block up to before the goto. Also you don't need to test for ret == 0 then.
Also, change to if ((!coalescence_occurred)) && (!recombination_occurred))
| goto out; | ||
| } | ||
| ret = msp_apply_demographic_events(self, self->next_demographic_event->time); | ||
| migration_occurred |
There was a problem hiding this comment.
Same point as before about comments on the same line
| self->time = cur_time; | ||
| ret = msp_dtwf_generation(self); | ||
| if (ret != 0) { | ||
| if (ret != 0 && ret != MSP_DTWF_DID_NOTHING) { |
There was a problem hiding this comment.
Better to change the condition to if (ret < 0) .
| self->next_sampling_event++; | ||
| } | ||
|
|
||
| if (ret == MSP_DTWF_DID_NOTHING && migration_occurred == false |
There was a problem hiding this comment.
Same style points as above
| msp_safe_free(node_trees); | ||
| msp_safe_free(n); | ||
| msp_safe_free(mig_tmp); | ||
| if (ret == MSP_DTWF_DID_NOTHING) { |
There was a problem hiding this comment.
Ditto on putting non cleanup logic in out block
2a250e9 to
ec37d9f
Compare
|
@hossam26644 just FYI - I started looking into this in detail and it seemed easier to me to fix the stopping condition than to explain why we needed the max_time thing. I've fixed it for Hudson and am filling out the rest of the details now. |
|
Hmm - looks like there's a basic flaw in the logic for the multi-merger case, which wasn't really being tested. I'll need to spend more time on this to get to the bottom of it. |
|
I can invest some time into it. Actually, I would like to. But I am prioritising the paper draft now. If this is not urgent, maybe it can sit for a while until after the new year. |
|
Leave it with me - I'm going to see how far I can get with it today. There will be some follow-up work to do all right though, don't worry! |
be1b404 to
2ef10be
Compare
|
OK, I think this is basically done. Can you review please @hossam26644? I'm going to log some follow-up issues. |
|
Something happened to the branch here @hossam26644 - it needs to be rebased |
c4f251f to
f1f7635
Compare
hossam26644
left a comment
There was a problem hiding this comment.
I think it is neat
| self.verify_segments() | ||
| if self.model not in ["fixed_pedigree", "dtwf"]: | ||
| # The fixed_pedigree model doesn't maintain a bunch of stuff. | ||
| # It would probably be simpler if it did. |
There was a problem hiding this comment.
why wasdtwf excluded from verify overlaps? why is it included in the new version?
There was a problem hiding this comment.
Good question - it was incorrectly omitted before, and it means that we didn't pick up the basic logic error in the multi-merger case (subtracting min_overlap, instead of len(h)). We also weren't testing this code path at all.
| ```{code-cell} | ||
| ts = msprime.sim_ancestry(2, sequence_length=10, recombination_rate=0.1, random_seed=42) | ||
| SVG(ts.draw_svg(y_axis=True, y_ticks=[0, 1, 2, 3, 4, 5])) | ||
| SVG(ts.draw_svg(y_axis=True, y_ticks=[0, 1, 2, 3, 4])) |
There was a problem hiding this comment.
should this be in a separate commit or is fine here?
There was a problem hiding this comment.
Yes, good point, this should be a separate commit.
| with flags should be more intuitive now that we are relying on the | ||
| These flags have been deprecated in favour of using | ||
| :class:`NodeType<NodeType>`. This is not a breaking change. The | ||
| constant associated with each flag remains the same. However, working |
There was a problem hiding this comment.
there is a bunch of formatting changes in this file and nothing more, do you want to include them in the PR?
There was a problem hiding this comment.
Ah yes, good point. It's just whitespace trimming which my editor does automatically and should probably be enforced, so I think it's fine to include.
jeromekelleher
left a comment
There was a problem hiding this comment.
OK, thanks for the review. I'll push an update and merge.
| with flags should be more intuitive now that we are relying on the | ||
| These flags have been deprecated in favour of using | ||
| :class:`NodeType<NodeType>`. This is not a breaking change. The | ||
| constant associated with each flag remains the same. However, working |
There was a problem hiding this comment.
Ah yes, good point. It's just whitespace trimming which my editor does automatically and should probably be enforced, so I think it's fine to include.
| self.verify_segments() | ||
| if self.model not in ["fixed_pedigree", "dtwf"]: | ||
| # The fixed_pedigree model doesn't maintain a bunch of stuff. | ||
| # It would probably be simpler if it did. |
There was a problem hiding this comment.
Good question - it was incorrectly omitted before, and it means that we didn't pick up the basic logic error in the multi-merger case (subtracting min_overlap, instead of len(h)). We also weren't testing this code path at all.
| ```{code-cell} | ||
| ts = msprime.sim_ancestry(2, sequence_length=10, recombination_rate=0.1, random_seed=42) | ||
| SVG(ts.draw_svg(y_axis=True, y_ticks=[0, 1, 2, 3, 4, 5])) | ||
| SVG(ts.draw_svg(y_axis=True, y_ticks=[0, 1, 2, 3, 4])) |
There was a problem hiding this comment.
Yes, good point, this should be a separate commit.
f1f7635 to
88fc6b9
Compare
|
Going to let #2432 go through first, then resolve merge conflicts |
88fc6b9 to
19dafa7
Compare
Here you go: @jeromekelleher
Still missing some tests, and I want to change the flag name of the python algorithm to match the msprime implementation.
As a reminder, this implementation consumes more memory than the when stopping at the youngest root. It could be the node table getting much larger, and could be a memory leak.