Contributor Guide

This repository compares Ensembl VEP output against local-checkout vepyr output using Python, Polars, and semantic CSQ normalization.

Mandatory Working Rules

Read .agent/CONTINUITY.md at the start of each turn.
Keep the repo container-first. Do not install system packages on the host unless explicitly requested.
The default runtime path is docker compose plus bind mounts for inputs, caches, and output directories.
Do not mutate a vepyr checkout automatically. Treat it as an input mount.
Use semantic comparison of parsed INFO/CSQ as the canonical equality rule. Raw CSQ strings are supporting evidence only.
Preserve deterministic output structure under the selected run directory.
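
The semantic-equality rule above can be illustrated with a minimal sketch. It is not the project's actual normalizer: the function names are hypothetical, and it assumes that the order of CSQ entries and the order of `&`-joined values inside a field carry no meaning. The real normalization rules may be stricter or looser.

```python
from collections import Counter


def parse_csq(info_csq: str, fields: list[str]) -> list[dict[str, str]]:
    """Split a raw CSQ value into one dict per annotation entry."""
    entries = []
    for raw_entry in info_csq.split(","):
        values = raw_entry.split("|")
        # Pad short entries so every field name maps to a value.
        values += [""] * (len(fields) - len(values))
        entries.append(dict(zip(fields, values)))
    return entries


def normalize_entry(entry: dict[str, str]) -> tuple[tuple[str, str], ...]:
    """Canonical form: fields sorted by name, '&'-joined values sorted.

    Assumption: the order of '&'-joined values is not significant.
    """
    return tuple(
        (name, "&".join(sorted(value.split("&"))))
        for name, value in sorted(entry.items())
    )


def csq_semantically_equal(left: str, right: str, fields: list[str]) -> bool:
    """Compare two CSQ values as multisets of normalized entries."""
    left_entries = Counter(normalize_entry(e) for e in parse_csq(left, fields))
    right_entries = Counter(normalize_entry(e) for e in parse_csq(right, fields))
    return left_entries == right_entries
```

Two CSQ strings that differ only in entry order or in the order of `&`-joined terms compare equal here, while the raw strings do not. That gap is why the raw text is kept as supporting evidence only.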

Run these from the repo root when code changes:

uv run ruff format .
uv run ruff check .
uv run pytest

If runtime code changes substantially, also run:

uv run python -m vepyr_diffly.cli list-presets

Prefer small deterministic fixture VCFs for unit tests.
Keep one integration-style sampled workflow test that exercises artifact generation end to end without requiring external VEP infrastructure.
Validate normalized table schemas explicitly.
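
One way to make the schema check explicit is sketched below. The column names and dtypes in `EXPECTED_SCHEMA` are hypothetical placeholders, and the helper accepts any name-to-dtype mapping so it runs without Polars installed. Comparing dtypes by their string form is an assumption that should be checked against the Polars version in use.

```python
from collections.abc import Mapping

# Hypothetical expected schema; the real columns and dtypes are defined by the project.
EXPECTED_SCHEMA = {
    "chrom": "String",
    "pos": "Int64",
    "ref": "String",
    "alt": "String",
    "consequence": "String",
}


def schema_problems(actual: Mapping[str, object], expected: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable schema differences (empty when they match)."""
    problems = []
    for name, dtype in expected.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif str(actual[name]) != dtype:
            problems.append(f"{name}: expected {dtype}, got {actual[name]}")
    for name in actual:
        if name not in expected:
            problems.append(f"unexpected column: {name}")
    # Only report ordering once names and dtypes already agree.
    if not problems and list(actual) != list(expected):
        problems.append("column order differs from expected")
    return problems
```

In a test this could be used as `assert schema_problems(df.schema, EXPECTED_SCHEMA) == []`, so a failure lists every mismatching column instead of stopping at the first one.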

Console output must remain concise and operator-friendly.
File outputs must include a machine-readable summary plus inspectable mismatch tables.
When introducing a new comparison rule, add fixture coverage that demonstrates the failure mode it prevents.
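
The output rules above could look roughly like the following sketch. The file names `summary.json` and `mismatches.tsv`, the column list, and the function name are illustrative assumptions, not the project's actual layout. Keys and rows are sorted so repeated runs on the same input produce identical files.

```python
import csv
import json
from pathlib import Path

# Hypothetical mismatch table columns.
MISMATCH_COLUMNS = ["chrom", "pos", "field", "vep_value", "vepyr_value"]


def write_run_outputs(run_dir: Path, summary: dict, mismatches: list[dict]) -> None:
    """Write a JSON summary and a TSV mismatch table under run_dir."""
    run_dir.mkdir(parents=True, exist_ok=True)

    # sort_keys keeps the summary byte-stable across runs.
    summary_text = json.dumps(summary, indent=2, sort_keys=True) + "\n"
    (run_dir / "summary.json").write_text(summary_text, encoding="utf-8")

    # Sort rows so the table order does not depend on processing order.
    rows = sorted(mismatches, key=lambda r: (r["chrom"], r["pos"], r["field"]))
    with (run_dir / "mismatches.tsv").open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=MISMATCH_COLUMNS, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)

    # One concise console line; the details live in the files.
    print(f"{len(rows)} mismatches; details in {run_dir}")
```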

After every smoke test or benchmark-style non-golden run, clean its artifacts from runs/ once the result has been captured.
Keep long-lived artifacts in runs/ only for golden tests or explicitly requested retained runs.
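
The cleanup rule could be automated along the lines of the sketch below. The `.keep` marker file is a hypothetical convention for flagging golden or explicitly retained runs; the repository may track retention differently.

```python
import shutil
from pathlib import Path

# Hypothetical marker: a run directory containing this file is retained.
KEEP_MARKER = ".keep"


def clean_runs(runs_root: Path) -> list[str]:
    """Delete run directories without a keep marker; return the removed names."""
    removed = []
    if not runs_root.is_dir():
        return removed
    for run_dir in sorted(runs_root.iterdir()):
        if run_dir.is_dir() and not (run_dir / KEEP_MARKER).exists():
            shutil.rmtree(run_dir)
            removed.append(run_dir.name)
    return removed
```

Calling `clean_runs(Path("runs"))` after capturing a smoke-test result would leave only the marked directories in place.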