150 commits
1f39b97
Add bio-function-vep crate with lookup_variants table function
mwiewior Feb 22, 2026
12759aa
fix: Interval join parser now handles extra non-range predicates in j…
mwiewior Feb 22, 2026
0ecc6e9
feat: LEFT JOIN support in IntervalJoinExec + swap build/probe sides …
mwiewior Feb 22, 2026
85c50f3
fix: Downgrade DataFusion back to 50.3.0
mwiewior Feb 22, 2026
9ae456b
fix: Accept Utf8View columns in variation cache schema validation
mwiewior Feb 22, 2026
a388282
feat: Normalize chromosome "chr" prefix for VEP cache join
mwiewior Feb 22, 2026
492082d
feat: Default to all annotation columns in lookup_variants
mwiewior Feb 22, 2026
a609a2c
feat: Apply JoinFilter in IntervalJoinExec for non-range predicates
mwiewior Feb 22, 2026
e65d984
fix: Correct VEP lookup false positives from coordinate mismatch and …
mwiewior Feb 22, 2026
1a27a1c
perf: Skip redundant JoinFilter evaluation for pure range interval joins
mwiewior Feb 22, 2026
3840e79
feat: Deduplicate VEP cache entries with STRING_AGG for variation_name
mwiewior Feb 22, 2026
a61f9e6
fix: Move cache dedup GROUP BY after the join, not before
mwiewior Feb 22, 2026
a255805
perf: Remove blocking GROUP BY, restore streaming join pipeline
mwiewior Feb 22, 2026
c911d44
Optimize lookup_variants scan pruning and add partitioned LEFT join o…
mwiewior Feb 22, 2026
af4c6de
Use coordinate metadata for lookup overlap
mwiewior Feb 22, 2026
b472ed8
fix: support pipe-ALT matching and insertion-style cache joins
mwiewior Feb 23, 2026
bd8d360
feat: add colocated-id lookup mode and robust multi-alt matching
mwiewior Feb 23, 2026
4e0951a
feat: add non-consequence vep-existing lookup fallback mode
mwiewior Feb 23, 2026
b7394be
vep lookup: propagate somatic in fallback match modes
mwiewior Feb 23, 2026
866517e
feat: add fjall KV cache backend for VEP variant lookup
mwiewior Feb 23, 2026
025d037
feat: columnar v1 cache format with position index and fjall tuning APIs
mwiewior Feb 23, 2026
882de3c
feat: parallel cache loading via DataFusion partitions
mwiewior Feb 23, 2026
a9599a6
Fix kv-cache lookup type mismatch and v1 column index overflow
mwiewior Feb 23, 2026
3f1cf4f
Fix parallel cache loading hang: two-phase read/write approach
mwiewior Feb 24, 2026
e180a97
Optimize cache loading: parallel serialize + fjall batch writes
mwiewior Feb 24, 2026
8d1cc6e
Add RepartitionExec for even partition distribution after filtering
mwiewior Feb 24, 2026
595ea2b
Remove RepartitionExec from cache loader
mwiewior Feb 24, 2026
c1989bd
Stream cache loading batch-by-batch with position-order flushing
mwiewior Feb 24, 2026
996dc55
Improve loader flush concurrency without enforcing sort
mwiewior Feb 24, 2026
1b6703f
vep-cache: remove v0 paths and enforce v1 output typing
mwiewior Feb 24, 2026
0e683b0
Optimize KV lookup hot path and add profiling
mwiewior Feb 24, 2026
7b049b5
V5 zstd dictionary compression + hot-path optimizations
mwiewior Feb 24, 2026
8b532e7
Add bio.annotation config namespace and V5 zstd compression tuning
mwiewior Feb 25, 2026
7886b06
Consolidate vep-cache to single V0 format and add window_size config
mwiewior Feb 25, 2026
63724ca
Fix parallel dict training race and remove serial contig example
mwiewior Feb 25, 2026
992ff25
chore: update vep cache config and benchmark tooling
mwiewior Feb 25, 2026
c1f28fe
Fix Fjall/parquet lookup parity and document cache API
mwiewior Feb 26, 2026
ab88d47
Document lookup_variants match_mode behaviors
mwiewior Feb 26, 2026
a11f673
Handle zstd decompression growth for large cache entries
mwiewior Feb 26, 2026
9e751c3
Add Vortex cache integration and cache export tooling
mwiewior Feb 27, 2026
213cddd
Revert "Add Vortex cache integration and cache export tooling"
mwiewior Feb 27, 2026
75444ae
Rewrite VEP lookup from interval join to equi-join with extended_prob…
mwiewior Feb 28, 2026
6908018
Merge vep cache into bio-function-vep and fix workspace tests
mwiewior Mar 4, 2026
66a5438
Add annotate_vep golden benchmark and crate docs
mwiewior Mar 4, 2026
e548081
feat(vep): wire transcript/exon consequence baseline and update porti…
mwiewior Mar 4, 2026
085138f
feat(vep): extend consequence coverage with context-aware SO handlers
mwiewior Mar 4, 2026
6979148
vep: add translation-aware codon CSQ path and backend-consistent cont…
mwiewior Mar 4, 2026
5cba566
vep: extend codon-aware consequence classification for coding edits
mwiewior Mar 4, 2026
736f297
vep: add term-level golden parity fixtures and edge-case coverage
mwiewior Mar 5, 2026
694b935
vep: filter context loading by input chroms for lookup mode
mwiewior Mar 8, 2026
2fe3e79
vep: strip parent SO terms and filter non-VEP transcripts for golden …
mwiewior Mar 8, 2026
afb98e8
vep: fix frameshift/stop_gained ranking and indel splice-site normali…
mwiewior Mar 8, 2026
96c24ce
vep: remove false missense heuristic for transcripts without translat…
mwiewior Mar 8, 2026
9ddb255
vep: use coitrees for transcript overlap, fix CDS N-padding and cross…
mwiewior Mar 8, 2026
28f1d6e
vep: strip splice_donor_region with 5th_base and suppress stop_gained…
mwiewior Mar 8, 2026
6f8a82b
vep: fix insertion exon boundary overlap and trim VCF anchor in annot…
mwiewior Mar 9, 2026
62d1ef5
vep: detect stop_gained in CDS with pre-existing internal stops
mwiewior Mar 9, 2026
b27a37c
vep: fix insertion upstream/downstream detection at transcript bounda…
mwiewior Mar 9, 2026
2ee1737
vep: intron-based splice detection matching VEP Perl algorithm
mwiewior Mar 9, 2026
d74b730
chore: cargo fmt
mwiewior Mar 9, 2026
8aad9ce
vep: frameshift intron skip, VEP insertion coords, cross-transcript c…
mwiewior Mar 9, 2026
89ebc6b
vep: fix insertion polypyrimidine range using VEP dual-loop coord model
mwiewior Mar 9, 2026
2e1f721
vep: frameshift intron boundary splice_region, coding_sequence_varian…
mwiewior Mar 9, 2026
99115a3
vep: 100% golden parity — mature_miRNA_variant, complex indel, intron…
mwiewior Mar 9, 2026
c51e8f0
vep: filter Ensembl-only transcripts by default, add merged flag
mwiewior Mar 9, 2026
bccc660
vep: suppress non_coding_transcript_{exon_,}variant inside mature miR…
mwiewior Mar 9, 2026
af0131c
vep: populate all 29 CSQ fields — EXON, INTRON, cDNA/CDS/protein posi…
mwiewior Mar 9, 2026
9134b37
vep: VEP-style allele minimization, per-feature regulatory entries, f…
mwiewior Mar 9, 2026
48f9c72
vep: add unit tests for CSQ field helpers and regulatory per-feature …
mwiewior Mar 9, 2026
7d720c7
vep: fix BIOTYPE, FLAGS parsing, allele suffix trimming, indel codons
mwiewior Mar 9, 2026
26b86ea
vep: fix FLAGS nesting, MNV suffix trimming, indel codons/positions/a…
mwiewior Mar 10, 2026
b2b3250
vep: fix last 11 CSQ mismatches — amino acids preserved-AA + cDNA exo…
mwiewior Mar 10, 2026
002d80b
vep: add unit tests for data fixes — codons, amino acids, cDNA, FLAGS…
mwiewior Mar 10, 2026
99a12a9
vep: populate all 29 CSQ fields, add HGVS module, enrich cache structs
mwiewior Mar 10, 2026
4cedfc8
vep: replace raw_object_json parsing with promoted parquet columns
mwiewior Mar 10, 2026
5f969bd
vep: add 12 new CSQ fields (41 total) — VARIANT_CLASS, CANONICAL, TSL…
mwiewior Mar 10, 2026
b86d591
vep: add Batch 3 — allele frequencies, co-located variants, MAX_AF, P…
mwiewior Mar 10, 2026
4310b93
vep: fix AF precision, CSQ comma escaping, allele-aware co-located fa…
mwiewior Mar 10, 2026
77bfa81
vep: add position-based co-located variant aggregation for Existing_v…
mwiewior Mar 10, 2026
7f49e3d
vep: add VEP coordinate normalization UDFs for indel cache lookup
mwiewior Mar 11, 2026
e583c54
vep: fix cache-hit CSQ field alignment, allele-filtered co-located da…
mwiewior Mar 11, 2026
6d2e14e
vep: add unit tests for CDS/protein position ?-N format, frameshift c…
mwiewior Mar 11, 2026
3fbb6dc
vep: fix star allele filtering, regulatory insertion overlap, incompl…
mwiewior Mar 11, 2026
798778a
vep: fix CDS ?-N over-application, boundary insertion amino acids, in…
mwiewior Mar 11, 2026
ee3d794
vep: fix 5 remaining term-set discrepancies, add 8 unit tests
mwiewior Mar 11, 2026
8ea5268
vep: add VariantLookupExec hash+COITree dual-index join, 53/74 CSQ fi…
mwiewior Mar 11, 2026
d1e44dd
vep: piggybacked co-located collection via COITree sink, fix Int8/Int…
mwiewior Mar 11, 2026
f4d3908
vep: strip PPT with splice_acceptor, store orig_start in BuildRow
mwiewior Mar 11, 2026
bee18d6
vep: fix insertion PPT detection range, VEP trim_sequences co-located…
mwiewior Mar 12, 2026
72c841c
docs: update VEP README with chr1 benchmark results and usage guide
mwiewior Mar 12, 2026
565876c
fix: derive benchmark output filenames from source VCF stem
mwiewior Mar 12, 2026
a9a274f
Port VEP matched-alleles flow for existing variant parity
mwiewior Mar 12, 2026
3edec40
Refine VEP colocation parity diagnostics
mwiewior Mar 12, 2026
fdb4126
Advance strict VEP parity slice
mwiewior Mar 12, 2026
183daef
Improve VEP parity coverage
mwiewior Mar 12, 2026
193d50f
Add VEP shift-state plumbing for strict parity work
mwiewior Mar 12, 2026
4916c98
Match VEP variation-tabix candidate window
mwiewior Mar 12, 2026
08a1a53
Align VEP amino acid formatting with codon output
mwiewior Mar 12, 2026
b10ebf5
Add coverage for prepared feature indexes
mwiewior Mar 13, 2026
d749ac5
Align VEP intron and colocated output parity
mwiewior Mar 13, 2026
e453748
Add VEP traceability refs and helper coverage
mwiewior Mar 13, 2026
8d4db93
Finish strict chr1 VEP parity
mwiewior Mar 13, 2026
5262e57
Add missing regulatory traceability refs
mwiewior Mar 13, 2026
b9f6cf4
Add VEP distance and regulatory integration coverage
mwiewior Mar 13, 2026
d9c0c1d
Add HGVS harness and transcript numbering parity
mwiewior Mar 13, 2026
e10976e
Port HGVS transcript coordinate semantics
mwiewior Mar 13, 2026
92d11c2
Port transcript HGVS notation normalization
mwiewior Mar 13, 2026
f56368b
Align HGVSp output-layer semantics
mwiewior Mar 13, 2026
5ef269e
Wire HGVSp prediction-format output
mwiewior Mar 13, 2026
298c13c
Add HGVS shift flag and flag parity audit
mwiewior Mar 13, 2026
8668c9b
Wire HGVS coding mapper bounds
mwiewior Mar 14, 2026
054cca6
Improve HGVS parity: 860→187 HGVSc, 114→125 HGVSp mismatches
mwiewior Mar 14, 2026
67e315b
Fix HGVSp stop-loss/frameshift extension distance with 3' UTR transla…
mwiewior Mar 14, 2026
ff01878
Fix HGVSp protein dup detection order and stop-loss extension
mwiewior Mar 14, 2026
3c2c9e0
Fix genomic shift seq_strand to match VEP _genomic_shift semantics
mwiewior Mar 14, 2026
92894a1
Update STRICT_VEP_PARITY_PLAN with Phase 4 HGVS status and root cause…
mwiewior Mar 14, 2026
693b17c
Fix HGVSp codon-boundary insertion position to match VEP genomic2pep
mwiewior Mar 14, 2026
b1b7a57
Fix HGVSp protein dup detection with boundary insertion position alig…
mwiewior Mar 14, 2026
a67bfb1
Add cdna_seq fallback for 3' UTR extraction in stop-loss/frameshift HGVS
mwiewior Mar 14, 2026
7689401
Hydrate full spliced cDNA from FASTA for 3' UTR in stop-loss/frameshift
mwiewior Mar 14, 2026
8fd92e9
Update plan with non-merged HGVS benchmark: 6 mismatches out of 2,997…
mwiewior Mar 14, 2026
7bc7cae
Revert HGVSc ? suppression that caused regressions
mwiewior Mar 14, 2026
88b241b
Fix spurious HGVSc for boundary-spanning deletions and LoF extTer
mwiewior Mar 14, 2026
e75147e
Fix stop_loss_extra_aa to use ref_translation.len() matching VEP leng…
mwiewior Mar 14, 2026
e5346ac
Fix stop_loss_extra_aa ref_len to match VEP cached peptide (no termin…
mwiewior Mar 14, 2026
84a24e9
Port VEP genomic2pep for insertion protein position — 74/74 zero mism…
mwiewior Mar 14, 2026
543df5b
Add comprehensive HGVS unit tests for edge cases and VEP parity
mwiewior Mar 14, 2026
7225090
Optimize cDNA hydration: batch FASTA reads per transcript span
mwiewior Mar 14, 2026
7f7446b
Optimize cDNA hydration: filter by indel CDS overlap + stop codon vic…
mwiewior Mar 14, 2026
f4db20b
Skip genomic shift computation for SNVs/MNVs
mwiewior Mar 14, 2026
f3fda0e
Read promoted transcript columns instead of parsing raw_object_json
mwiewior Mar 14, 2026
e3e2d28
Skip raw_object_json entirely when promoted columns are available
mwiewior Mar 14, 2026
5bc6b15
Remove dead JSON extraction functions and tests
mwiewior Mar 14, 2026
23190ab
Update parity plan: Phase 4 HGVS achieved — 74/74 zero mismatches
mwiewior Mar 14, 2026
6de9329
Implement --everything flag with 80-field CSQ schema, wire APPRIS/SIF…
mwiewior Mar 15, 2026
b614bb9
Wire DOMAINS, fix MANE/HGVS_OFFSET/gnomAD sub-pops — 80/80 zero misma…
mwiewior Mar 15, 2026
da2c7aa
Add unit tests for Phase 5 --everything helpers and update parity plan
mwiewior Mar 15, 2026
6a2d3e6
Sliding window SIFT/PolyPhen loading, wire miRNA structure field
mwiewior Mar 15, 2026
03c000b
Lazy sliding window SIFT/PolyPhen loading — 79/80 zero mismatches on …
mwiewior Mar 15, 2026
720b9a9
Fix DOMAINS: coding gate + insertion boundary swap — 80/80 on full chr1
mwiewior Mar 15, 2026
0fd887c
Add missing example files referenced in Cargo.toml
mwiewior Mar 15, 2026
891fca4
Bump CI Rust toolchain to 1.91.0
mwiewior Mar 15, 2026
e50c96a
Fix elided lifetime warnings in superintervals for Rust 1.91
mwiewior Mar 15, 2026
07c2017
Fix all clippy warnings for Rust 1.91 CI
mwiewior Mar 15, 2026
e32db42
Fix review issues: optional failed column, fallback NULL, KV key coll…
mwiewior Mar 15, 2026
5801197
Fix coitrees metadata access to be portable across SIMD backends
mwiewior Mar 15, 2026
9a487b3
Fix rustfmt formatting for long GenericInterval lines
mwiewior Mar 15, 2026
dddb9e7
Wrap bare URLs in doc comments with angle brackets for rustdoc
mwiewior Mar 15, 2026
47bd42c
openspec for fjall
mwiewior Mar 16, 2026
fe2730d
Openspec for fjall
mwiewior Mar 16, 2026
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -20,7 +20,7 @@ jobs:
   - name: Setup Rust
     uses: actions-rust-lang/setup-rust-toolchain@v1
     with:
-      toolchain: '1.88.0'
+      toolchain: '1.91.0'
       components: 'clippy, rustfmt'

   - name: Cache Cargo registry and build
4 changes: 4 additions & 0 deletions .gitignore
@@ -19,3 +19,7 @@ target
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 .idea/
+/vep-benchmark/data/output/
+/vep-benchmark/data/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
+/vep-benchmark/data/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi
+vep-benchmark/data/*