perf: cache parsed manifest in APIDbtRunner#2196
perf: cache parsed manifest in APIDbtRunner#2196dtaniwaki wants to merge 2 commits intoelementary-data:masterfrom
Conversation
Reuse the parsed manifest across sequential dbt run-operation calls instead of re-parsing it on every invocation. This reduces edr report overhead from ~18 manifest parses down to 1.
Cover three scenarios: - manifest is cached after the first successful invocation - manifest is not cached when invocation fails - cached manifest is reused on subsequent calls
|
👋 @dtaniwaki |
📝 WalkthroughWalkthroughThis change implements manifest caching in the Changes
Sequence DiagramsequenceDiagram
participant Caller
participant APIDbtRunner as APIDbtRunner<br/>(_manifest=None)
participant dbtRunner as dbtRunner
participant Manifest as Manifest
Caller->>APIDbtRunner: _inner_run_command() [1st call]
APIDbtRunner->>dbtRunner: dbtRunner(..., manifest=None)
dbtRunner->>Manifest: run & generate manifest
dbtRunner-->>APIDbtRunner: result + manifest
APIDbtRunner->>APIDbtRunner: self._manifest = result.manifest
APIDbtRunner-->>Caller: result
Caller->>APIDbtRunner: _inner_run_command() [2nd call]
APIDbtRunner->>dbtRunner: dbtRunner(..., manifest=self._manifest)
dbtRunner->>Manifest: run with cached manifest
dbtRunner-->>APIDbtRunner: result
APIDbtRunner-->>Caller: result
Estimated Code Review Effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/clients/dbt_runner/test_api_dbt_runner.py`:
- Around line 17-23: Replace the helper that bypasses APIDbtRunner.__init__ with
one that calls the real initializer (so _manifest is set by the class) and avoid
the hardcoded "/tmp" by using a temporary directory fixture; e.g., change
_make_runner to call APIDbtRunner(...) with project_dir=str(tmp_path) (or a
tempfile.TemporaryDirectory context) and monkeypatch/stub any heavy external
behavior if needed instead of assigning runner._manifest manually, ensuring
tests exercise the actual __init__ behavior and remove the Ruff S108 `/tmp`
literal.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: aa6ee93f-7cc2-43d8-865d-e4273823bcf5
📒 Files selected for processing (2)
elementary/clients/dbt/api_dbt_runner.pytests/unit/clients/dbt_runner/test_api_dbt_runner.py
| def _make_runner(): | ||
| runner = APIDbtRunner.__new__(APIDbtRunner) | ||
| runner._manifest = None | ||
| runner.project_dir = "/tmp/fake" | ||
| runner.env_vars = None | ||
| runner.raise_on_failure = False | ||
| return runner |
There was a problem hiding this comment.
Cover the real initializer and avoid the linted /tmp path.
_make_runner() is fine for isolating _inner_run_command, but it manually sets _manifest, so these tests would still pass if APIDbtRunner.__init__ stopped initializing _manifest before super().__init__(). Also, Line 20 is flagged by Ruff S108.
Proposed test/helper adjustment
def _make_runner():
runner = APIDbtRunner.__new__(APIDbtRunner)
runner._manifest = None
- runner.project_dir = "/tmp/fake"
+ runner.project_dir = "fake_project_dir"
runner.env_vars = None
runner.raise_on_failure = False
return runner
+
+
+def test_manifest_initialized_before_parent_init():
+ def assert_manifest_initialized(self, *args, **kwargs):
+ assert hasattr(self, "_manifest")
+ assert self._manifest is None
+
+ with mock.patch(
+ "elementary.clients.dbt.api_dbt_runner.CommandLineDbtRunner.__init__",
+ assert_manifest_initialized,
+ ):
+ APIDbtRunner(project_dir="fake_project_dir")🧰 Tools
🪛 Ruff (0.15.10)
[error] 20-20: Probable insecure usage of temporary file or directory: "/tmp/fake"
(S108)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/clients/dbt_runner/test_api_dbt_runner.py` around lines 17 - 23,
Replace the helper that bypasses APIDbtRunner.__init__ with one that calls the
real initializer (so _manifest is set by the class) and avoid the hardcoded
"/tmp" by using a temporary directory fixture; e.g., change _make_runner to call
APIDbtRunner(...) with project_dir=str(tmp_path) (or a
tempfile.TemporaryDirectory context) and monkeypatch/stub any heavy external
behavior if needed instead of assigning runner._manifest manually, ensuring
tests exercise the actual __init__ behavior and remove the Ruff S108 `/tmp`
literal.
|
Closing in favor of #2197 which should address this more effectively. |
Hi, thanks for building and maintaining Elementary! I'd love to get your review on this small performance improvement.
Problem
edr reportexecutes ~18 sequentialdbt run-operationcalls via the Python API. Each call creates a freshdbtRunner()without passing a manifest, which forces a full manifest re-parse every time. On a project with hundreds of models, sources, and macros, each parse can take ~10 seconds — adding up to ~3 minutes of overhead while the actual queries finish in seconds.Solution
Cache the parsed manifest on
APIDbtRunnerafter the first successful invocation and reuse it for all subsequentdbtRunner(manifest=...)calls. This brings the number of manifest parses from ~18 down to 1.Key details:
self._manifestis initialized beforesuper().__init__()because the parent constructor may call_inner_run_command()via_run_deps_if_needed().dbtRunner.manifestis a public API available since dbt-core 1.5+.APIDbtRunneris affected —SubprocessDbtRunnerandDbtFusionRunnerare unchanged.Summary by CodeRabbit