Fix probability_rates mismatch with local_mut_regions in generate_var… #262

Merged
joshfactorial merged 2 commits into develop from fix/generate-variants-probability-rates-mismatch on Apr 18, 2026

@joshfactorial
Collaborator

…iants

probability_rates was computed from the full mutation_rate_regions array before intersecting with the current block. intersect_regions can return a different number of entries (e.g., a sub-block falling entirely within one region returns 1 entry regardless of how many regions exist). rng.choice requires its p argument to have the same length as the set it samples from, so passing the original probability_rates raised a NumPy ValueError whenever the lengths differed.

Fix: compute probability weights from local_mut_regions after intersection, replacing any None rates (no mutation BED supplied) with the model's average mutation rate.
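
For illustration only, the idea can be sketched as follows. This is not the code from the PR: the helper name `region_weights`, the `(start, end, rate)` tuple layout, and the choice to normalize raw rates (rather than, say, rate times region length) are all assumptions made for the example. The point it shows is that the weights are derived from the already-intersected list, so their length always matches what `rng.choice` samples from.

```python
import numpy as np


def region_weights(local_mut_regions, avg_mutation_rate):
    """Return one normalized weight per intersected region (hypothetical helper).

    Assumes each region is a (start, end, rate) tuple and that rate is None
    when no mutation BED was supplied for that region.
    """
    rates = np.array(
        [avg_mutation_rate if rate is None else rate
         for _, _, rate in local_mut_regions],
        dtype=float,
    )
    total = rates.sum()
    if total == 0:
        # All rates are zero: fall back to a uniform choice (an assumption).
        return np.full(len(rates), 1.0 / len(rates))
    return rates / total


rng = np.random.default_rng(0)
local_mut_regions = [(0, 50, 0.01), (50, 120, None)]
weights = region_weights(local_mut_regions, avg_mutation_rate=0.03)

# len(weights) == len(local_mut_regions) by construction, so p is always valid.
idx = rng.choice(len(local_mut_regions), p=weights)
```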

joshfactorial and others added 2 commits April 4, 2026 20:07
…iants

probability_rates was computed from the full mutation_rate_regions array
before intersecting with the current block. intersect_regions can return
a different number of entries (e.g., a sub-block falling entirely within
one region returns 1 entry regardless of how many regions exist), so
passing the original probability_rates to rng.choice caused a numpy
ValueError when lengths differed.

Fix: compute probability weights from local_mut_regions after
intersection, replacing any None rates (no mutation BED supplied) with
the model's average mutation rate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n tests

The probability_rates fix (previous commit) was incomplete because
intersect_regions had two related bugs that caused crashes regardless:

1. Middle regions dropped: the original algorithm only matched regions
   containing block_start or block_end, so any region fully inside the
   block was silently skipped.

2. Zero-length fallback: when block_end == last_region_end (always the
   case in practice after recalibrate_mutation_regions), the condition
   `block_end < region[1]` was False, so a zero-length entry
   (last_end, block_end, default) was appended, triggering
   "Invalid Mutation Region" → ValueError immediately after.

Fix: rewrite intersect_regions using overlap arithmetic (max/min clipping)
so every region that overlaps the block is included and the tail fallback
only fires when the block genuinely extends past all regions.
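
A minimal sketch of the max/min clipping approach, for illustration. The function name `clip_regions_to_block`, its signature, and the `(start, end, rate)` tuple layout are assumptions for this example and may not match the real intersect_regions. It also assumes the regions are sorted and contiguous, with half-open `[start, end)` coordinates.

```python
def clip_regions_to_block(regions, block_start, block_end, default_rate):
    """Clip sorted (start, end, rate) regions to [block_start, block_end).

    Hypothetical stand-in for intersect_regions, not the project's code.
    """
    clipped = []
    last_end = block_start
    for start, end, rate in regions:
        lo = max(start, block_start)
        hi = min(end, block_end)
        if lo < hi:
            # Any non-empty overlap is kept, including regions that lie
            # entirely inside the block.
            clipped.append((lo, hi, rate))
            last_end = hi
    if last_end < block_end:
        # Tail fallback: only when the block extends past the last region,
        # so a zero-length entry is never produced.
        clipped.append((last_end, block_end, default_rate))
    return clipped


regions = [(0, 10, 0.1), (10, 20, 0.2), (20, 30, 0.3), (30, 40, 0.4)]
middle_kept = clip_regions_to_block(regions, 5, 35, 0.0)
ends_on_last = clip_regions_to_block(regions, 25, 40, 0.0)
past_all = clip_regions_to_block(regions, 35, 50, 0.05)
```

Here `middle_kept` retains the two regions fully inside the block, `ends_on_last` has no trailing zero-length entry when block_end equals the last region's end, and `past_all` gets a single default-rate tail.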

Also substitutes None rates in the pre-loop factors calculation
(generate_variants.py line 81) to prevent TypeError when mutation_rate_regions
contains None entries from recalibrate_mutation_regions.

Adds tests/test_read_simulator/test_generate_variants.py with 15 regression
tests covering: intersect_regions with 3/4 regions, block_end == last_region_end,
partial overlaps, contiguity, full coverage, outside-all-regions fallback;
generate_variants with 2/3/4 regions, None rates, correctness checks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@joshfactorial joshfactorial merged commit 3d4a7c8 into develop Apr 18, 2026
1 check passed
@joshfactorial joshfactorial deleted the fix/generate-variants-probability-rates-mismatch branch April 18, 2026 05:32