Commit 9e9c925
feat: add --threads option to parallelize report data fetching
Add --threads CLI option (default 1) to `edr report` and `edr send-report`.
When set to >1, independent dbt run-operations are executed concurrently
using ThreadPoolExecutor with SubprocessDbtRunner.
dbt's Python API (APIDbtRunner) is not thread-safe due to global mutable
state (GLOBAL_FLAGS, adapter FACTORY, etc.), so parallel execution uses
SubprocessDbtRunner which spawns independent dbt processes per call.
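The process-per-call approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `run_command` and `run_concurrently` are hypothetical helpers, and the Python interpreter stands in for the dbt executable so the example runs without dbt installed. Each worker thread only waits on its own child process, so no interpreter-level global state is shared between calls.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_command(args: List[str]) -> str:
    """Run one command in its own process and return its stripped stdout.

    Hypothetical stand-in for a subprocess-based runner: every call gets a
    fresh process, so nothing is shared between concurrent calls.
    """
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout.strip()


def run_concurrently(commands: List[List[str]], threads: int) -> List[str]:
    """Run independent commands on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max(1, threads)) as pool:
        # pool.map yields results in the order the commands were submitted.
        return list(pool.map(run_command, commands))


# Assumption: the Python interpreter is used here in place of a dbt invocation.
commands = [[sys.executable, "-c", f"print({i} * 2)"] for i in range(4)]
results = run_concurrently(commands, threads=4)
```

Threads are sufficient here because the work is I/O-bound: each thread blocks on a child process rather than running Python code of its own.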
The fetching is split into phases:
- Phase 1: 14 independent operations run in parallel
- Phase 2: exposures + test_results (depend on Phase 1)
- Phase 3: lineage (depends on Phase 2)
- Phase 4: pure computation (no dbt calls)
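The phase ordering above might be orchestrated along these lines. This is a sketch under stated assumptions: `run_phase` and `fetch_report_data` are hypothetical names, and the operations are placeholder lambdas rather than the real dbt run-operations. Within a phase, operations run in parallel; a phase starts only after the previous one has finished, and `threads <= 1` falls back to sequential execution.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_phase(operations: Dict[str, Callable[[], object]], threads: int) -> Dict[str, object]:
    """Run the independent operations of one phase and collect results by name."""
    if threads <= 1:
        # Sequential fallback, matching the default of a single thread.
        return {name: op() for name, op in operations.items()}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {name: pool.submit(op) for name, op in operations.items()}
        # result() blocks, so the phase ends only when every operation is done.
        return {name: future.result() for name, future in futures.items()}


def fetch_report_data(threads: int = 1) -> Dict[str, object]:
    # Phase 1: 14 independent operations (placeholders for dbt run-operations).
    phase1 = run_phase(
        {f"op_{i}": (lambda i=i: f"result_{i}") for i in range(14)}, threads
    )
    # Phase 2: exposures and test_results both read Phase 1 output.
    phase2 = run_phase(
        {
            "exposures": lambda: sorted(phase1)[:2],
            "test_results": lambda: len(phase1),
        },
        threads,
    )
    # Phase 3: lineage depends on Phase 2 output.
    lineage = {"inputs": phase2["exposures"]}
    # Phase 4: pure computation, no external calls.
    summary = {"operations": len(phase1) + len(phase2) + 1}
    return {"phase1": phase1, "phase2": phase2, "lineage": lineage, "summary": summary}
```

Because results are keyed by operation name rather than by completion order, the output is the same regardless of how many threads are used.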
With --threads=14, edr report time is expected to drop from ~3m40s to
~30-40s on adapters with high query latency (e.g. Athena).
1 parent e5af7e7
3 files changed
Lines changed: 400 additions & 12 deletions