Fix probability_rates mismatch with local_mut_regions in generate_var… #262
Merged
joshfactorial merged 2 commits on Apr 18, 2026
…iants

probability_rates was computed from the full mutation_rate_regions array before that array was intersected with the current block. intersect_regions can return a different number of entries (e.g., a sub-block falling entirely within one region returns 1 entry regardless of how many regions exist), so passing the original probability_rates to rng.choice raised a NumPy ValueError whenever the two lengths differed.

Fix: compute the probability weights from local_mut_regions after intersection, replacing any None rates (no mutation BED supplied) with the model's average mutation rate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
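The failure mode and the fix can be sketched in isolation with NumPy. This is a minimal illustration, not the project's code: the helper name local_probability_rates, the (start, end, rate) tuple layout, the avg_mut_rate argument, and the choice to weight each region by rate × length are all assumptions made for the example.

```python
import numpy as np


def local_probability_rates(local_mut_regions, avg_mut_rate):
    """Build rng.choice weights from the *intersected* regions only.

    Hypothetical helper: regions are (start, end, rate) tuples, and a rate of
    None (no mutation BED supplied) falls back to avg_mut_rate.
    """
    rates = np.array(
        [avg_mut_rate if rate is None else rate for _, _, rate in local_mut_regions],
        dtype=float,
    )
    lengths = np.array(
        [end - start for start, end, _ in local_mut_regions], dtype=float
    )
    # Assumption: weight each region by rate * length, then normalize to sum to 1.
    weights = rates * lengths
    return weights / weights.sum()


rng = np.random.default_rng(0)
local_mut_regions = [(100, 200, 0.002)]      # sub-block inside a single region
stale_rates = np.array([0.25, 0.5, 0.25])    # computed from 3 global regions

try:
    # The old pattern: p has 3 entries but there is only 1 local region.
    rng.choice(len(local_mut_regions), p=stale_rates)
except ValueError as err:
    print("length mismatch rejected:", err)

# The fixed pattern: p always matches local_mut_regions in length.
p = local_probability_rates(local_mut_regions, avg_mut_rate=0.001)
idx = rng.choice(len(local_mut_regions), p=p)
```

Deriving the weights from the same list that rng.choice indexes keeps the two lengths equal by construction, whatever intersect_regions returns.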
…n tests

The probability_rates fix (previous commit) was incomplete because intersect_regions had two related bugs that caused crashes even with that fix in place:

1. Middle regions dropped: the original algorithm only matched regions containing block_start or block_end, so any region lying fully inside the block was silently skipped.
2. Zero-length fallback: when block_end == last_region_end (always the case in practice after recalibrate_mutation_regions), the condition `block_end < region[1]` was False, so a zero-length entry (last_end, block_end, default) was appended. That entry triggered "Invalid Mutation Region" and an immediate ValueError.

Fix: rewrite intersect_regions using overlap arithmetic (max/min clipping) so that every region overlapping the block is included and the tail fallback fires only when the block genuinely extends past all regions.

Also substitutes None rates in the pre-loop factors calculation (generate_variants.py line 81) to prevent a TypeError when mutation_rate_regions contains None entries from recalibrate_mutation_regions.

Adds tests/test_read_simulator/test_generate_variants.py with 15 regression tests covering:
- intersect_regions: 3/4 regions, block_end == last_region_end, partial overlaps, contiguity, full coverage, outside-all-regions fallback;
- generate_variants: 2/3/4 regions, None rates, correctness checks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
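The overlap-arithmetic approach can be sketched as follows. This is an illustration under stated assumptions, not the merged implementation: regions are taken to be sorted, non-overlapping, half-open (start, end, rate) tuples, and the signature, including the `default` parameter name, is hypothetical. Gaps before or between regions are not filled in this sketch.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (start, end, rate), half-open, sorted, non-overlapping.
Region = Tuple[int, int, Optional[float]]


def intersect_regions(
    regions: List[Region], block_start: int, block_end: int, default: Optional[float]
) -> List[Region]:
    """Clip every region that overlaps [block_start, block_end)."""
    out: List[Region] = []
    for start, end, rate in regions:
        # Overlap arithmetic: clip the region to the block boundaries.
        lo = max(start, block_start)
        hi = min(end, block_end)
        if lo < hi:  # keep only strictly positive overlaps, so middle regions survive
            out.append((lo, hi, rate))
    if not out:
        # Block lies outside every region: cover it with the default rate.
        return [(block_start, block_end, default)]
    last_end = out[-1][1]
    if last_end < block_end:
        # Tail fallback only when the block genuinely extends past all regions;
        # block_end == last_end therefore never yields a zero-length entry.
        out.append((last_end, block_end, default))
    return out


regions = [(0, 100, 0.001), (100, 200, 0.002), (200, 300, 0.003), (300, 400, None)]
clipped = intersect_regions(regions, 50, 350, default=0.0005)
```

Because each region is tested independently with max/min, a region fully inside the block is kept, and the strict `<` checks are what rule out zero-length entries at either edge.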