Add sample_nodes_by_ploidy function#3157
Conversation
7bedfae to
e8fa19a
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3157 +/- ##
=======================================
Coverage 89.58% 89.59%
=======================================
Files 28 28
Lines 31885 31895 +10
Branches 5855 5857 +2
=======================================
+ Hits 28565 28575 +10
Misses 1888 1888
Partials 1432 1432
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
jeromekelleher
left a comment
There was a problem hiding this comment.
Nice! Can we do a follow-up commit where we use the new functions in the vcf() method? (Just to see if we're implementing the right interface?)
| def sample_nodes_by_ploidy(self, ploidy): | ||
| """ | ||
| Returns an 2D array of node IDs, where each row has length `ploidy`. | ||
| This is useful when inidividuals are not defined in the tree sequence |
There was a problem hiding this comment.
| This is useful when inidividuals are not defined in the tree sequence | |
| This is useful when individuals are not defined in the tree sequence |
There was a problem hiding this comment.
I sprinkle these in to prove it's not AI. Fixed.
|
I've used the new functions in the VCF code - I've tried to keep the tests unchanged with a couple of exceptions:
|
|
Huh, |
|
Hmm, some of the error semantics in the existing code are a bit odd. I've replicated them for compatibility. For example an individual with no nodes is fine, but not if that individual was specifically asked for. This was only tested in the CLI code. |
|
I think it's OK to relax the input requirements a bit now. Making things that used to raise an error work as expected now seems like forward progress to me, and it seems unlikely we'll break much code by doing that (people don't tend to rely on errors much). We just need to document the changes. |
|
Ok, I've simplified things so that the error is only triggered when all the individuals specified have no nodes. |
|
What's the output for an individual with no nodes then, missing data? We should document these corner cases now, in the |
|
I've added some changes to the docs and a changelog. |
jeromekelleher
left a comment
There was a problem hiding this comment.
I don't think we've got the semantics quite right here - it would be surprising if I specified individuals=[0, 1] and I got back a VCF with zero samples in it. In this case, I think we should raise an error.
| that is a sample and another that is not) in the data model will | ||
| result in an error by default. However, such individuals can be | ||
| excluded using the ``individuals`` argument. | ||
| using the ``individuals`` argument. And individual specified that |
|
|
||
| **Changes** | ||
|
|
||
| - Allow mixed sample and non-sample nodes to be output by ```write_vcf``, |
There was a problem hiding this comment.
| - Allow mixed sample and non-sample nodes to be output by ```write_vcf``, | |
| - Allow mixed sample and non-sample nodes to be output by ```write_vcf``. |
| and for individuals without nodes to be specified for output, although | ||
| they will not appear in the VCF. |
There was a problem hiding this comment.
| and for individuals without nodes to be specified for output, although | |
| they will not appear in the VCF. | |
| Individuals without nodes no longer raise an error and are instead ignored, | |
| except in the case where they are explicitly specified in the individuals argument. |
7d64905 to
754a765
Compare
|
Fixed up. |
| result in an error by default. However, such individuals can be | ||
| excluded using the ``individuals`` argument. | ||
| using the ``individuals`` argument. An individual specified that | ||
| is not associated with any nodes will be ignored and not included in the |
There was a problem hiding this comment.
I thought it was - I added a test in 266f0cb to check.
|
I'm confused. I thought the semantics was:
That's what the release notes say to me, but the docstring is not. |
|
The code and docstring are "An individual specified that is not associated with any nodes will be ignored and not included in the output." The changelog before revisions was "allow individuals without nodes to be specified for output, although they will not appear in the VCF" which is the same thing. I didn't read your revision close enough to realise it meant the opposite. Happy to change the code if you think the erroring on nodeless, specified, individual is better. |
|
Yes please |
| This is useful when individuals are not defined in the tree sequence | ||
| so `TreeSequence.individuals_nodes` cannot be used. The samples are | ||
| placed in the array in the order which they are found in the node | ||
| table. Ploidy must be a multiple of the number of sample nodes. |
There was a problem hiding this comment.
| table. Ploidy must be a multiple of the number of sample nodes. | |
| table. The number of sample nodes must be a multiple of ploidy. |
9a8eaec to
7bcf8a6
Compare
|
Ok, have tried to be really careful this time! It is now an error again to specify nodeless individuals. I've also realised that it needs to be stated in the changelog that it is now allowed to have sample nodes that have no individual. |
jeromekelleher
left a comment
There was a problem hiding this comment.
I think the semantics are still pretty weird - two of the tests look like the same thing to me, with different errors. See suggested new semantics.
| with pytest.raises(ValueError, match="is not associated with any nodes"): | ||
| ts.as_vcf( | ||
| allow_position_zero=True, | ||
| individuals=list(range(ts.num_individuals)) + [ts.num_individuals - 1], |
There was a problem hiding this comment.
The last bit is redundant isn't it?
| "Sample nodes must either all be associated with individuals " | ||
| "or not associated with any individuals" | ||
| ) | ||
| # If there are no individuals, or all the individuals are not associated with |
There was a problem hiding this comment.
I think we should make this simpler - just
if ploidy is not None and individuals is not None:
raise ValueError("...")
if individuals is None:
if ts.num_individuals > 0:
individuals_nodes = ts.individuals_nodes
else:
ploidy = 1 if ploidy is None else ploidy
individuals_nodes = ts.sample_nodes_by_ploidy(ploidy)
else:
individuals_nodes = ts.individuals_nodes[individuals]
|
As discussed, I've removed the VCF code changes here to reduce this to solely the |
No description provided.