This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
ccxt_extract is an Elixir library that serializes everything the CCXT JS library knows about 110+ cryptocurrency exchanges into language-agnostic JSON, so consumers in any language (Elixir, Rust, Go, Python) can call exchanges without walking AST. See README.md for user-facing setup and ROADMAP.md for the active work plan — ROADMAP.md is generated by rmap (the roadmap CLI) from roadmap/tasks.toml; edit the TOML, not the Markdown (see § Documentation invariants).
This repo is in greenfield mode until further notice. No backward compatibility. Removing complexity is the priority. When in doubt: delete the old path, don't wrap it.
- Old schema versions are deleted, not retained alongside the new one.
- Do not add compatibility shims, migration aliases, dual-version dispatch, or "one release" retention windows without explicit user direction.
- Breaking changes do not require a deprecation period: the sole consumer (../ccxt_client/) takes one coordinated migration.
v4 schema cut: ✅ shipped 2026-05-18. The active milestone is now feature_complete — run rmap milestones for the live count. Definition: every pending task in the contract-defining phases (9 override infra, 11 request, 12 response, 13 error, 15 WS, 16 currency) + the method-descriptors bundle + the infrastructure tasks that gate them.
Why this milestone is the goal. Closing feature_complete ends ccxt_extract's mission for ../ccxt_client/. After it lands, ccxt_client can freeze the v4 JSON it consumes and treat further ccxt_extract releases as optional regenerations against new CCXT versions, not a live dependency. The architecture supports this — ccxt_extract is a generator, not an ingestion runtime; the JSON it emits is a static, schema-pinned contract that ccxt_client can fork, vendor, or own outright once the milestone closes.
How to pick the next task. Use rmap next --milestone feature_complete. It filters dep-blocked tasks, so the visible list is "pickable now" not "the whole milestone" — rmap list --status pending --milestone feature_complete is the full inventory, and rmap next --marker parallel surfaces the worktree-dispatchable subset. Do not name specific task numbers, Eff scores, or a "current queue" in this file — rmap already computes what has shipped, what is unblocked, and what is pickable; any snapshot here rots on the next merge. Ask rmap at runtime.
Scope discipline. feature_complete's membership is exclusion-driven — blocked / superseded / pure-hygiene tasks stay out by design. Push back on proposals to add new contract-surface tasks unless a priority consumer (typically ../ccxt_client/) has surfaced a concrete need; the milestone is a ceiling, not a backlog. Infrastructure / hygiene tasks join only when they gate an existing milestone task (the pattern Task 144 set, where downstream tasks gained a depends_on edge + an enforcement acceptance criterion).
Grouped per ~/.claude/setup-guide.md (Elixir Library + Volt + Reach template). Each include earns its token cost — niche Hex packages and behavioral rules the model can't recall reliably from training.
@/.claude/includes/across-instances.md
@/.claude/includes/critical-rules.md
@~/.claude/includes/worktree-workflow.md
@/.claude/includes/task-prioritization.md
@/.claude/includes/task-writing.md
@/.claude/includes/rmap.md
@/.claude/includes/workflow-philosophy.md
@~/.claude/includes/web-command.md
@/.claude/includes/elixir-setup.md
@/.claude/includes/ex-unit-json.md
@/.claude/includes/dialyzer-json.md
@/.claude/includes/code-style.md
@/.claude/includes/development-commands.md
@/.claude/includes/development-philosophy.md
@/.claude/includes/elixir-volt.md
@/.claude/includes/oxc.md
@/.claude/includes/quickbeam.md
@/.claude/includes/npm-ci-verify.md
@~/.claude/includes/reach.md
Project-scope plugins (.claude/settings.json, committed — visible to anyone cloning the repo):
| Plugin | Purpose |
|---|---|
| `elixir@deltahedge` | Elixir skills + agents (hex-docs-search, integration-testing, dialyzer-json, ex-unit-json, usage-rules, npm-* suite, reach, etc.) |
| `elixir-workflows@deltahedge` | Mix / ExUnit / dev workflow commands; workflow-generator skill |
Universal-core plugins (code-simplifier, feature-dev, claude-md-management, hookify, remember, git-commit, staged-review, task-driver, cloud-delegation, dev-lifecycle, codex) load at user scope and apply here implicitly — don't re-declare. New stack-specific plugins go in .claude/settings.json. See ~/.claude/plugin-catalog.md for the picker.
MCP servers (.mcp.json, committed):
| Server | Endpoint | Purpose |
|---|---|---|
| `tidewave` | `http://localhost:4002/tidewave/mcp` | Runtime exploration via `mcp__tidewave__*`: project_eval, get_logs, get_source_location, get_docs, search_package_docs. Started by `mix tidewave` (or `iex -S mix tidewave`). |
Tidewave port for this repo is 4002 (see ~/.claude/tidewave-ports.md registry). Restart Claude Code if .mcp.json changes.
Branch-worthy work lives in a git worktree at ~/_DATA/worktrees/ccxt_extract/<id>/, not on a branch in the main checkout (~/_DATA/code/ccxt_extract/). The worktree IS the scope authorization for git commit / git push / gh pr create on that branch — full rules in ~/.claude/includes/worktree-workflow.md.
This repo's tracking-ID convention: <id> is the ROADMAP task number when the work tracks a roadmap entry (e.g. task-105, task-119), or a short feature name for unscheduled work (e.g. fix-aggregate-merge). With cloud-agent delegation retired (see ROADMAP.md § Notes), Linear issue IDs are no longer in scope as worktree IDs.
Cleanup: after PR merge or branch deletion, run git worktree remove ~/_DATA/worktrees/ccxt_extract/<id> and git worktree prune in the same session — completion of a task includes worktree teardown.
Corpus in fresh worktrees: the gitignored extraction corpus (priv/output/, priv/discoveries/<not class_hierarchy.json>, priv/ccxt/, priv/ccxt_bundle.js) is filesystem-isolated per worktree — git only materializes tracked content when adding a worktree. Run mix ccxt_extract.link_corpus to symlink the existing corpus from the main checkout instead of regenerating via mix ccxt_extract.update. Run mix ccxt_extract.unlink_corpus before regenerating in-worktree — directory symlinks are write-transparent, so corpus regeneration without unlinking writes back into the main checkout.
[P] parallel marker in ROADMAP.md — independent tasks tagged [P] are explicitly safe to dispatch into separate worktrees concurrently. They predate cloud delegation and are unaffected by the [CSR] retirement.
Every output field is produced by exactly one of two complementary passes. Understanding which pass owns which field is critical before editing.
| Tool | Input | Output scope | Speed | Used in |
|---|---|---|---|---|
| OXC (Rust NIF) | CCXT TS source at `priv/ccxt/ts/src/` | Structural: method ASTs, class hierarchy, type annotations, section membership, sign-method bodies | ~43ms per file | oxc_extractor, oxc_batch, method_ast, parse_methods (discovery files only; not emitted to per-exchange JSON since schema 3.0.0 / Task 117; Phase 12 consumes from `priv/discoveries/parse_methods.json`), sign_method, sign_recipe (scaffold, Task 64; populated by Tasks 65–69), handle_errors, throw_dispatches, error_class_hierarchy (Task 87: corpus-global tree from `errorHierarchy.ts`, copied into every per-exchange JSON), interface_signatures, request_defaults, ws_methods (discovery files only; same Phase 15 treatment), ws_heartbeat (Task 93: emits the per-exchange `websocket.heartbeat` section), ws_auth (Task 92: emits the per-exchange `websocket.auth` section) |
| QuickBEAM (Zig NIF) | `priv/ccxt_bundle.js` (the browser bundle copied during `ccxt_extract.setup`) | Resolved runtime: full `describe()` after inheritance, URL templates, rate limits, nonce defaults, request headers | ~13s for all exchanges | quickbeam_runtime, describe, load_markets, url_templates, signing_fixtures, request_headers |
Neither tool alone is sufficient. contract_test cross-validates the two (e.g., every method named in resolved describe().api must exist in the parsed class AST or an ancestor). Divergence means a silent regression — fix the extractor, not the test.
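The example invariant above (every method in the resolved API must exist in the class AST or an ancestor) can be sketched as follows. This is a minimal illustration, not the real `contract_test` code: the module name, the data shapes (`hierarchy` as child-to-parent map, `parsed` as class-to-method-set map) and the method names are all assumptions made for the example.

```elixir
defmodule ContractSketch do
  # Hypothetical shapes: `hierarchy` maps class id => parent id (nil at the root),
  # `parsed` maps class id => MapSet of method names found in that class's AST.
  def missing_methods(id, api_methods, hierarchy, parsed) do
    known =
      id
      |> ancestry(hierarchy)
      |> Enum.reduce(MapSet.new(), fn class, acc ->
        MapSet.union(acc, Map.get(parsed, class, MapSet.new()))
      end)

    # Anything the runtime names that no class in the chain defines is a divergence.
    Enum.reject(api_methods, &MapSet.member?(known, &1))
  end

  # The class itself followed by every ancestor up to the root.
  defp ancestry(nil, _hierarchy), do: []
  defp ancestry(id, hierarchy), do: [id | ancestry(Map.get(hierarchy, id), hierarchy)]
end

# Illustrative data only; method names are made up for the example.
hierarchy = %{"binanceusdm" => "binance", "binance" => "Exchange", "Exchange" => nil}

parsed = %{
  "binance" => MapSet.new(["publicGetTicker"]),
  "Exchange" => MapSet.new(["fetch"])
}

missing =
  ContractSketch.missing_methods(
    "binanceusdm",
    ["publicGetTicker", "privatePostOrder"],
    hierarchy,
    parsed
  )

IO.inspect(missing)
# => ["privatePostOrder"]
```

A non-empty result is the signal to fix the extractor that lost the method, not to relax the check.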
Raw extractors write to priv/discoveries/*.json (and subdirs like describe/<id>.json, load_markets/<id>.json). CcxtExtract.Pipeline then assembles those into per-exchange files under priv/output/<id>.json validated against priv/schema/exchange_v4.json. The v4 top-level groups are endpoints, auth, errors, rate_limits, normalization, websocket, markets, testnet, and raw (consumer-shaped, not producer-shaped — see SCHEMA.md for the full path-migration table). Provenance is explicit — every emitted JSON carries a flat top-level _provenance map keying each section (by RFC 6901 JSON Pointer) to raw/derived/override. Override reasons live in the priv/overrides/<id>.json entry, not inline in the emitted payload.
Both paths are gitignored derived state. priv/output/ and priv/discoveries/* are not tracked in git — they're regenerated per CCXT release and would otherwise bloat the repo (~1GB of JSON per full-universe run, already accumulated 827MB in .git). The one exception is priv/discoveries/class_hierarchy.json, which lib/ccxt_extract/tiers.ex reads at compile time via @external_resource and must remain committed. Fresh clones materialize the rest via mix setup; external consumers via mix ccxt_extract.update --output DIR.
The stages are, in order:
1. `mix ccxt_extract.exchanges`: discover the universe of exchange IDs.
2. Per-extractor mix tasks: each writes a slice to `priv/discoveries/`.
3. `mix ccxt_extract.pipeline`: merges slices into `priv/output/<id>.json`.
4. `mix ccxt_extract.update`: orchestrator, runs 1+2+3 as one scoped transaction.
5. `mix ccxt_extract.validate`: JSV-validates every output against the schema.
6. `mix ccxt_extract.contract_test`: runs cross-extractor invariants.
Extraction is byte-deterministic for a fixed CCXT version + bundle + scope: two consecutive runs of the same scope produce byte-identical output (Task 114). Two mechanisms enforce this:
- `mix ccxt_extract.determinism_check` runs an extraction task twice into isolated tmp dirs and byte-diffs every `.json` file. It strips volatile timestamp keys and re-encodes both sides through sorted-key canonical JSON, so map-iteration order and wall-clock stamps can't masquerade as drift. It exits non-zero on any divergence. Run it after touching any extractor or the pipeline.
- `Pipeline.check_version_drift!/1` runs at the top of `Pipeline.extract/1` and aborts loudly when `priv/ccxt` HEAD or `priv/ccxt_bundle.js` no longer matches the baseline in `priv/ccxt_version.json`, so silent upstream drift can't regenerate the corpus against a different CCXT without a signal. Bypass with `--allow-version-drift` when the drift is intentional (a deliberate CCXT bump).
AstNormalize.to_encodable/1 deep-sorts object keys before encoding — the load-bearing fix that made determinism achievable at the source. The two remaining workarounds (timestamp-key stripping in the checker; no frozen-clock path through Pattern B writers) are tracked as Task 137.
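The comparison step described above (strip volatile keys, then compare a key-sorted form) might look roughly like this. It is a sketch under stated assumptions: the module name and the volatile key list are hypothetical, and key-sorted pair lists stand in for what a canonical JSON encoder would emit.

```elixir
defmodule CanonicalSketch do
  # Hypothetical list of volatile keys; the real checker's key set may differ.
  @volatile ["generated_at", "extracted_at"]

  # Maps become key-sorted lists of {key, value} pairs, the order a sorted-key
  # encoder would write them in; volatile keys are dropped on the way down.
  def canonical(%{} = map) do
    map
    |> Map.drop(@volatile)
    |> Enum.map(fn {k, v} -> {k, canonical(v)} end)
    |> Enum.sort_by(fn {k, _v} -> k end)
  end

  def canonical(list) when is_list(list), do: Enum.map(list, &canonical/1)
  def canonical(other), do: other

  # Two decoded documents count as identical when their canonical forms match.
  def same?(a, b), do: canonical(a) == canonical(b)
end

# Illustrative documents: same content, different timestamps.
run_a = %{"id" => "x", "generated_at" => "t1", "nested" => %{"b" => 2, "a" => 1}}
run_b = %{"nested" => %{"a" => 1, "b" => 2}, "generated_at" => "t2", "id" => "x"}
run_c = %{"id" => "x", "generated_at" => "t1", "nested" => %{"b" => 3, "a" => 1}}

IO.inspect(CanonicalSketch.same?(run_a, run_b))
# => true
IO.inspect(CanonicalSketch.same?(run_a, run_c))
# => false
```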
Every per-exchange extraction task takes the same flag set, parsed by CcxtExtract.Scope:
--tier1 --tier2 --tier3 --dex --all --exchange ID[,ID2] [--exchange ID3 ...]
--exchange is repeatable AND comma-split; unknown IDs abort with fuzzy suggestions via String.jaro_distance/2. Tier flags expand to the whole family (root + inheriting variants/aliases) by composing Tiers.members_for_tier/1 with class_hierarchy.json. Default = full universe.
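As a rough sketch of the `--exchange` handling described above: repeated values are comma-split into one flat list, and an unknown ID yields suggestions ranked by `String.jaro_distance/2`. The module name, return shapes and the 0.8 similarity cutoff are assumptions for illustration, not what `CcxtExtract.Scope` necessarily does.

```elixir
defmodule ScopeSketch do
  # Splits repeated, comma-separated --exchange values into one flat ID list.
  def split_ids(values) do
    values
    |> Enum.flat_map(&String.split(&1, ",", trim: true))
    |> Enum.map(&String.trim/1)
    |> Enum.uniq()
  end

  # Returns {:ok, ids} or {:error, unknown_id, suggestions}.
  def resolve(values, universe) do
    ids = split_ids(values)

    case Enum.find(ids, &(&1 not in universe)) do
      nil -> {:ok, ids}
      unknown -> {:error, unknown, suggest(unknown, universe)}
    end
  end

  # Ranks known IDs by Jaro similarity; the 0.8 cutoff is an assumed value.
  defp suggest(unknown, universe) do
    universe
    |> Enum.map(&{&1, String.jaro_distance(unknown, &1)})
    |> Enum.filter(fn {_id, score} -> score >= 0.8 end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
    |> Enum.map(fn {id, _score} -> id end)
  end
end

universe = ["binance", "bybit", "okx", "deribit"]

IO.inspect(ScopeSketch.resolve(["binance,deribit", "okx"], universe))
# => {:ok, ["binance", "deribit", "okx"]}

IO.inspect(ScopeSketch.resolve(["binanse"], universe))
```

Aborting on the first unknown ID, rather than silently dropping it, keeps a typo from shrinking the scope unnoticed.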
Corpus-level tasks (setup, exchanges, base_methods, top-level validate) run unscoped by design. classes.ex is a documented exception — flags only stamp tier_scope; the actual hierarchy load is always full-universe because family inheritance is load-bearing.
AggregateWriter merges scoped runs with existing on-disk aggregates and recomputes envelope totals from the final merged entries, so successive scoped runs accumulate without drift. Do not replace its merge logic with an overwrite.
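To illustrate why merge-then-recompute matters, here is a minimal sketch with a hypothetical envelope shape (`"total"` plus an `"entries"` map keyed by exchange ID). The real `AggregateWriter` envelope and function names are not shown in this file, so treat every name below as an assumption.

```elixir
defmodule AggregateSketch do
  # Hypothetical envelope: %{"total" => n, "entries" => %{id => entry}}.
  # New entries win on conflict; totals come from the merged result, never
  # from adding the scoped run's count to the old total.
  def merge(existing, scoped_entries) do
    entries = Map.merge(Map.get(existing, "entries", %{}), scoped_entries)
    %{"total" => map_size(entries), "entries" => entries}
  end
end

# First scoped run writes two exchanges.
on_disk = AggregateSketch.merge(%{}, %{"binance" => %{"v" => 1}, "okx" => %{"v" => 1}})

# Second scoped run re-extracts okx and adds deribit.
merged = AggregateSketch.merge(on_disk, %{"okx" => %{"v" => 2}, "deribit" => %{"v" => 1}})

IO.inspect(merged["total"])
# => 3
```

An overwrite would leave only the two exchanges from the second run, and a naive running sum would report four; recomputing from the merged entries avoids both.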
CcxtExtract.Paths splits read sites from write sites:
- `Paths.priv/1`, `Paths.priv_dir/0`, `Paths.discoveries/0`, `Paths.bundle/0`, `Paths.version_file/0`, `Paths.ts_src/0`: reads, honor `:priv_dir_override`.
- `Paths.out/1`, `Paths.out_priv_dir/0`, `Paths.out_bundle/0`, `Paths.out_version_file/0`: writes, honor `:priv_write_override` first, then fall through to `:priv_dir_override`.
Integration tests set the narrower :priv_write_override via CcxtExtract.PrivWriteCase (test/support/priv_write_case.ex) to redirect writes into a tmp dir while reads still hit the committed corpus. PrivWriteCase enforces async: false because the override is a VM-global app env. When adding a new write site, use Paths.out(...) / out_bundle/0 / out_version_file/0, not the read helpers. External consumers running mix ccxt_extract.update --output DIR get the broader :priv_dir_override so everything (reads + writes) lands under DIR.
The split is enforced by the paths_rw_split corpus-level invariant in mix ccxt_extract.contract_test — Reach.Project.taint_analysis/2 over lib/**/*.ex with a same-file filter. New direct-call leaks like File.write!(Paths.priv(...)) surface at the next contract-test run.
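The read/write fallthrough can be pictured with a small sketch. The app name `:paths_sketch`, the `"priv"` default root and the module itself are stand-ins for illustration; only the two override keys come from the description above.

```elixir
defmodule PathsSketch do
  # :paths_sketch and the "priv" default are assumptions for this example.
  @app :paths_sketch

  # Read sites honor only the broad override.
  def read_root, do: Application.get_env(@app, :priv_dir_override, "priv")

  # Write sites try the narrow override first, then fall through to the read root.
  def write_root, do: Application.get_env(@app, :priv_write_override) || read_root()

  def priv(rel), do: Path.join(read_root(), rel)
  def out(rel), do: Path.join(write_root(), rel)
end

# Test-style isolation: writes are redirected, reads still hit the default root.
Application.put_env(:paths_sketch, :priv_write_override, "/tmp/sketch")
isolated = {PathsSketch.priv("discoveries"), PathsSketch.out("output/okx.json")}
IO.inspect(isolated)
# => {"priv/discoveries", "/tmp/sketch/output/okx.json"}

# Consumer-style redirection: the broad override moves reads and writes together.
Application.delete_env(:paths_sketch, :priv_write_override)
Application.put_env(:paths_sketch, :priv_dir_override, "/data/ccxt")
redirected = {PathsSketch.priv("discoveries"), PathsSketch.out("output/okx.json")}
IO.inspect(redirected)
# => {"/data/ccxt/discoveries", "/data/ccxt/output/okx.json"}
```

Because application env is VM-global, any test that sets these keys has to run with `async: false`, which matches the constraint noted for `PrivWriteCase`.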
- `mix ccxt_extract.update` and `mix ccxt_extract.pipeline` abort if `priv/output/` or `priv/discoveries/` has uncommitted changes. Bypass with `--force`. The rail is skipped automatically under `--output DIR` (external target dirs aren't expected to be git repos).
- Safety-rail paths are computed through `Paths.out(...)` (not a compile-time `@attribute`) so the test overrides correctly isolate them.
- Post-untrack note: `priv/output/` and `priv/discoveries/*` (except `class_hierarchy.json`) are gitignored. The rail is effectively inert for ignored paths: `git status` doesn't see them, so no abort fires on regeneration. This is expected, not a regression. The rail still protects `priv/discoveries/class_hierarchy.json`, which is compile-time load-bearing and worth a manual pause when it drifts.
Raw extraction runs for every CCXT exchange regardless of tier. Derivation effort (signing recipes, fee schedules, error handlers) is scoped to the 7-exchange option-seller set — Tier 1 (binance + its binanceusdm variant, bybit, okx, deribit) plus priority DEX (hyperliquid, derive). Tier 2 is intentionally empty (see the frozen-curation note below). Tier 3 and unclassified exchanges receive null + reason for derived fields until a priority consumer surfaces a concrete need. Roots are hand-curated in priv/priority_tiers.json; variants inherit their root's tier via class_hierarchy.json. A tier task that exists only to handle Tier-3 quirks belongs in "Superseded / Deferred", not active phases.
The derivation scope is a movable slider. Re-add an exchange — or a matching family group — by lifting its root back into tier1 / tier2 / dex in priv/priority_tiers.json, then running a scoped mix ccxt_extract.update --exchange <id>. Raw discoveries are already on disk universe-wide, so a re-add only unlocks derivation — no catch-up extraction. tier2 is kept as the empty key precisely as the staging bucket for these re-additions.
Pre-narrow tier curation (frozen 2026-05-20). Before the narrowing to the 7-exchange option-seller set, the tiers were:
| Tier | Roots |
|---|---|
| tier1 | binance, bybit, okx, deribit, coinbaseexchange |
| tier2 | kraken, kucoin, gate, htx, bitmex, bitfinex |
| tier3 | bitget, bingx, bitmart, coinex, cryptocom, mexc, hashkey, woo, dydx, paradex, apex, woofipro, modetrade |
| dex | hyperliquid, aster, lighter, derive |
The narrowing demoted coinbaseexchange (tier1→tier3), all six tier2 roots (→tier3), and aster + lighter (dex→tier3) when the sole consumer (../ccxt_client/) scoped to 7 exchanges, retiring the speculative market-maker / options framing that justified the broader set. This table is the reference for re-adding exchanges in matching family groups.
priv/fixtures/signing/<id>.json is the handoff between CCXT truth and any port. They're generated by calling CCXT's real exchange.sign() under frozen credentials, timestamps, and nonces (Date.now() = 1700000000000, etc.). Output is byte-identical across runs except generated_at. Consumers replay frozen inputs against their own signing code and assert byte-equal url/method/headers/body. Case preservation inside input/output (apiKey, X-BAPI-SIGN) IS the wire contract — do not camelize/snake-case at extraction time.
priv/overrides/<id>.json uses RFC 6901 JSON-Pointer paths with a value payload, required reason, and verified_against/unverified flags. CcxtExtract.OverrideRegistry validates them and the override_registry_valid contract-test invariant gates them. Overrides are a last resort for fields that extraction can't prove — every override needs a reason.
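As an illustration of how an RFC 6901 pointer override could be applied to an emitted document, consider the sketch below. The entry keys (`"path"`, `"value"`, `"reason"`), the module name and the sample field are assumptions; the real override file layout and `CcxtExtract.OverrideRegistry` behavior may differ. Array indices are deliberately out of scope.

```elixir
defmodule PointerSketch do
  # Splits an RFC 6901 pointer into reference tokens, unescaping ~1 then ~0.
  def tokens(""), do: []

  def tokens("/" <> rest) do
    rest
    |> String.split("/")
    |> Enum.map(fn t -> t |> String.replace("~1", "/") |> String.replace("~0", "~") end)
  end

  # The empty pointer addresses the whole document.
  def put(_doc, "", value), do: value

  # Object keys only; missing intermediate objects are created as empty maps.
  def put(doc, pointer, value) do
    path = pointer |> tokens() |> Enum.map(&Access.key(&1, %{}))
    put_in(doc, path, value)
  end

  # Hypothetical entry shape. A missing or empty reason fails to match on purpose.
  def apply_override(doc, %{"path" => pointer, "value" => value, "reason" => reason})
      when is_binary(reason) and reason != "" do
    doc
    |> put(pointer, value)
    |> Map.update("_provenance", %{pointer => "override"}, &Map.put(&1, pointer, "override"))
  end
end

# Illustrative document and override; the field name is made up.
doc = %{
  "rate_limits" => %{"default_ms" => 100},
  "_provenance" => %{"/rate_limits/default_ms" => "raw"}
}

entry = %{"path" => "/rate_limits/default_ms", "value" => 50, "reason" => "example only"}
patched = PointerSketch.apply_override(doc, entry)

IO.inspect(patched["rate_limits"]["default_ms"])
# => 50
IO.inspect(patched["_provenance"]["/rate_limits/default_ms"])
# => "override"
```

The reason stays in the override entry rather than the patched document, in line with the rule that emitted payloads carry provenance but not override rationale.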
Extraction targets the CCXT JS source (OXC) + resolved runtime (QuickBEAM) — not exchange-vendor API docs. CCXT is a reconciliation layer: years of maintainer work reconcile published docs → real wire behavior → exchange bugs → undocumented quirks, and that reconciliation lives in method bodies and describe() maps. Docs lag reality (CCXT routinely ships wire fixes before vendor docs update); 110+ exchanges mean 110+ incompatible doc shapes (rare OpenAPI specs, hand-written markdown, PDFs, Postman collections, occasional non-English-only pages). CCXT has already normalized that surface into one schema — that normalization is the asset this library crystallizes into JSON.
Exchange docs enter the pipeline in three narrow roles only:
- Override justification: `verified_against` in `priv/overrides/<id>.json` cites a docs URL as evidence when an override corrects CCXT. Docs are evidence for a claim, not a primary source.
- Gap enrichment (Tier 1 only): fields CCXT doesn't model at all (leverage tiers, rebate schedules, sub-account limits). Track as a roadmap task before reading docs.
- Verification (future): a third-source check in `contract_test` would strengthen today's OXC-vs-QuickBEAM cross-check (still CCXT-vs-CCXT). Not yet built.
Do not propose redesigning the pipeline to read vendor docs as a primary source — that's 110× the work for less reliability. Narrow gaps go through overrides or a tracked task, not a refactor.
This repo gets worked from multiple Claude surfaces — Claude Code CLI (local clone), Claude macOS app (separate clone), occasionally the iOS app. Each surface has its own working copy; history on zenhive may have been rewritten by another surface between sessions. git fetch --prune zenhive early in every session.
When local is many commits ahead of remote, compare author dates against the remote tip before pushing. Locals authored before remote's last commit are usually duplicates of rewritten history from another surface (same message, different SHA), not new work — only commits authored after remote's tip are genuinely new.
To integrate after another surface's rewrite: git rebase --onto zenhive/development <last-duplicate-sha> development -X theirs replays only the genuinely-new commits and auto-resolves JSON conflicts toward local (regenerable via mix ccxt_extract.update). Never force-push.
# one-time setup (install CCXT, copy bundle, verify tools)
mix deps.get
mix ccxt_extract.setup
# full refresh, full universe
mix ccxt_extract.update
# scoped refresh — the 7-exchange derivation-scoped set
mix ccxt_extract.update --tier1 --dex
# single-exchange or mixed
mix ccxt_extract.update --exchange binance,deribit
mix ccxt_extract.update --tier1 --exchange hyperliquid
# write elsewhere (consumer's dir, no git rail)
mix ccxt_extract.update --tier1 --output /path/to/consumer/ccxt
# assemble only (discoveries → output/)
mix ccxt_extract.pipeline
# validate outputs against priv/schema/exchange_v4.json
mix ccxt_extract.validate
# cross-extractor invariants (QuickBEAM vs OXC)
mix ccxt_extract.contract_test
# verify extraction is byte-deterministic across consecutive runs
mix ccxt_extract.determinism_check
# regenerate port-contract signing vectors
mix ccxt_extract.signing_fixtures
# Tidewave MCP server (for runtime exploration via `mcp__tidewave__*`)
mix tidewave # listens on http://localhost:4002
# tests (see test section below for flags)
time mix compile --warnings-as-errors
mix test.json --exclude extraction
mix test.json # includes :extraction (slow, requires priv/ccxt)
mix dialyzer.json --quiet
mix credo --strict --format json
mix sobelow --mark-skip-all # re-mark skips after a scan

- `:extraction` tag is excluded by default (`test/test_helper.exs` sets `ExUnit.start(exclude: [:extraction, :tier3_corpus, :flaky])`). Tests tagged `:extraction` hit the real CCXT source + bundle and are slow. Run them explicitly with `mix test.json --include extraction` when touching extractor internals. The `:flaky` exclude is permanent infra (no tests carry the tag in the green state); Task 131 added it so `--exclude flaky` in `harness.yml:89` is operational the day a regression needs quarantine.
- `Mix.shell()` is VM-global. Tests that capture Mix output via `Mix.shell(Mix.Shell.Process)` MUST save `prior_shell = Mix.shell()` and restore in `try/after` or `on_exit`. Files that exercise code calling `Mix.shell().info(...)` and assert on `capture_io` should defensively pin `Mix.shell(Mix.Shell.IO)` in their parent `setup`; `setup_task_test.exs` is the reference pattern (Task 131). Without the pin, leaked `Mix.Shell.Process` from another file routes output via `:erlang.send` and `capture_io` returns `""`.
- `Reach.Project.from_glob/1` carries a 5s `Task.async_stream` default. Reach 2.2 doesn't thread a `:timeout` opt through `parse_files` / `build_module_sdgs`, so under async test pool contention even small fixture globs trip the timeout. Test files calling `ContractTest.check_paths_rw_split/1` (or `Reach.Project.from_glob/1` directly) should declare `async: false` until upstream Reach exposes a timeout knob. `contract_test_test.exs` is the reference (Task 131).
- `test/integration/cached/*_cached_test.exs`: assert against the already-committed `priv/discoveries/` corpus. Fast; they don't re-run extraction. These dispatch on observed counts, not envelope stamps, because committed fixtures may come from a scoped run (~34 exchanges) or a full run (~110). Use `CcxtExtract.Test.ScopeThresholds` (`test/support/scope_thresholds.ex`) with `min_count/3`, `min_total/4`, `proportional/2`, not ad-hoc `if count >= N` ladders. Cutoff must equal floor: a `>= 90` branch returning `>= 100` creates a dead zone for counts in `[90, 99]`.
- `test/integration/*_integration_test.exs` (non-cached): actually run extractors against `priv/ccxt`. Always tagged `:extraction`. Use `CcxtExtract.PrivWriteCase` to isolate writes, not ad-hoc rename/restore tricks.
- `test/support/*.ex`: only compiled when `MIX_ENV=test` (see `elixirc_paths(:test)` in `mix.exs`). Put test helpers here, not in `lib/`.
- Single-test runs: `mix test.json path/to/test.exs:LINE` or `mix test.json --failed` for fast iteration.
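The "cutoff must equal floor" rule for threshold ladders can be checked mechanically. The sketch below is not the `ScopeThresholds` API; it models a ladder as hypothetical `{cutoff, floor}` rungs, where a rung applies once the observed count reaches `cutoff` and then demands at least `floor`, and it lists the count ranges that would fail regardless of corpus health.

```elixir
defmodule ThresholdSketch do
  # Hypothetical ladder shape: a list of {cutoff, floor} rungs.
  # When floor > cutoff, counts in cutoff..(floor - 1) select the rung
  # but can never satisfy it: that range is the dead zone.
  def dead_zones(ladder) do
    for {cutoff, floor} <- ladder, floor > cutoff, do: cutoff..(floor - 1)
  end
end

zones = ThresholdSketch.dead_zones([{90, 100}, {30, 30}])
IO.inspect(zones)
# => [90..99]
```

A ladder with no dead zones returns an empty list, which is the state a helper-based threshold should always be in.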
Every task must update docs in lockstep with code — a task is incomplete until:
- roadmap/tasks.toml: the typed source of truth for the roadmap. Flip task status with `rmap status <id> <state>` (or hand-edit the TOML), then `rmap render` regenerates `ROADMAP.md` + `roadmap/data.json`. Do not hand-edit `ROADMAP.md`; it is a generated view, and `rmap` recomputes the focus block and Eff glyphs, so there is no separate "phase summary / Current Focus" sync step. `rmap validate --check-render` gates drift.
- CHANGELOG.md: `## [Unreleased]` entry with what shipped and key decisions.
- CLAUDE.md: if architecture, conventions, or invariants moved.
- SCHEMA.md: if the emitted JSON shape changed.
- CONSUMER_CONTRACT.md: if a checklist item moved between ⬜/🚧/✅.
- ../ccxt_client/ROADMAP.md (cross-repo rule): flip or unblock any dependent consumer task. A ccxt_extract task is not complete until its downstream ccxt_client impact is reflected.
Scope-refactor work lives in SCOPED-EXTRACTION-TASKS.md (Tasks 1–11 done; future envelope-stamping work tracked there as Task 13). The generic REFACTOR.md tracks remaining structural cleanups.
From AGENTS.md: review requests get one overall rating for the intended or current change set. If staged vs unstaged mismatch matters, flag it as a finding or blocker, not as a separate score.