Skip to content

perf: cache parsed manifest in APIDbtRunner#2196

Closed
dtaniwaki wants to merge 2 commits intoelementary-data:masterfrom
dtaniwaki:worktree-cache-manifest
Closed

perf: cache parsed manifest in APIDbtRunner#2196
dtaniwaki wants to merge 2 commits intoelementary-data:masterfrom
dtaniwaki:worktree-cache-manifest

Conversation

@dtaniwaki
Copy link
Copy Markdown

@dtaniwaki dtaniwaki commented Apr 22, 2026

Hi, thanks for building and maintaining Elementary! I'd love to get your review on this small performance improvement.

Problem

edr report executes ~18 sequential dbt run-operation calls via the Python API. Each call creates a fresh dbtRunner() without passing a manifest, which forces a full manifest re-parse every time. On a project with hundreds of models, sources, and macros, each parse can take ~10 seconds — adding up to ~3 minutes of overhead while the actual queries finish in seconds.

Solution

Cache the parsed manifest on APIDbtRunner after the first successful invocation and reuse it for all subsequent dbtRunner(manifest=...) calls. This brings the number of manifest parses from ~18 down to 1.

Key details:

  • self._manifest is initialized before super().__init__() because the parent constructor may call _inner_run_command() via _run_deps_if_needed().
  • The manifest is only cached after a successful invocation to avoid storing a partial state.
  • dbtRunner.manifest is a public API available since dbt-core 1.5+.
  • Only APIDbtRunner is affected — SubprocessDbtRunner and DbtFusionRunner are unchanged.

Summary by CodeRabbit

  • Refactor
    • Enhanced dbt command execution to cache and reuse manifest data across multiple command invocations for improved efficiency.

Reuse the parsed manifest across sequential dbt run-operation calls
instead of re-parsing it on every invocation. This reduces edr report
overhead from ~18 manifest parses down to 1.
Cover three scenarios:
- manifest is cached after the first successful invocation
- manifest is not cached when invocation fails
- cached manifest is reused on subsequent calls
@github-actions
Copy link
Copy Markdown
Contributor

👋 @dtaniwaki
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in this pull request.

@dtaniwaki dtaniwaki requested a deployment to elementary_test_env April 22, 2026 04:15 — with GitHub Actions Waiting
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough

Walkthrough

This change implements manifest caching in the APIDbtRunner class. The runner now initializes an instance-level _manifest attribute, stores the dbt manifest after a successful first invocation, and reuses the cached manifest in subsequent command executions instead of regenerating it.

Changes

Cohort / File(s) Summary
Core manifest caching implementation
elementary/clients/dbt/api_dbt_runner.py
Added __init__ method to initialize _manifest attribute and modified _inner_run_command to pass the cached manifest to dbtRunner, storing the manifest after successful execution for reuse in subsequent calls.
Manifest caching tests
tests/unit/clients/dbt_runner/test_api_dbt_runner.py
New test module validating that APIDbtRunner correctly caches and reuses the dbt manifest across invocations, including tests for initial caching, failed runs, and subsequent reuse of cached manifests.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant APIDbtRunner as APIDbtRunner<br/>(_manifest=None)
    participant dbtRunner as dbtRunner
    participant Manifest as Manifest

    Caller->>APIDbtRunner: _inner_run_command() [1st call]
    APIDbtRunner->>dbtRunner: dbtRunner(..., manifest=None)
    dbtRunner->>Manifest: run & generate manifest
    dbtRunner-->>APIDbtRunner: result + manifest
    APIDbtRunner->>APIDbtRunner: self._manifest = result.manifest
    APIDbtRunner-->>Caller: result

    Caller->>APIDbtRunner: _inner_run_command() [2nd call]
    APIDbtRunner->>dbtRunner: dbtRunner(..., manifest=self._manifest)
    dbtRunner->>Manifest: run with cached manifest
    dbtRunner-->>APIDbtRunner: result
    APIDbtRunner-->>Caller: result
Loading

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Hop, hop, a manifest cached so fine,
No need to rebuild what once did shine,
Across each run, the memo stays put,
Swift and clever—a rabbit's salute! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'perf: cache parsed manifest in APIDbtRunner' directly summarizes the main change: adding manifest caching optimization to APIDbtRunner for performance improvements.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/clients/dbt_runner/test_api_dbt_runner.py`:
- Around line 17-23: Replace the helper that bypasses APIDbtRunner.__init__ with
one that calls the real initializer (so _manifest is set by the class) and avoid
the hardcoded "/tmp" by using a temporary directory fixture; e.g., change
_make_runner to call APIDbtRunner(...) with project_dir=str(tmp_path) (or a
tempfile.TemporaryDirectory context) and monkeypatch/stub any heavy external
behavior if needed instead of assigning runner._manifest manually, ensuring
tests exercise the actual __init__ behavior and remove the Ruff S108 `/tmp`
literal.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aa6ee93f-7cc2-43d8-865d-e4273823bcf5

📥 Commits

Reviewing files that changed from the base of the PR and between e5af7e7 and ccb3fe8.

📒 Files selected for processing (2)
  • elementary/clients/dbt/api_dbt_runner.py
  • tests/unit/clients/dbt_runner/test_api_dbt_runner.py

Comment on lines +17 to +23
def _make_runner():
runner = APIDbtRunner.__new__(APIDbtRunner)
runner._manifest = None
runner.project_dir = "/tmp/fake"
runner.env_vars = None
runner.raise_on_failure = False
return runner
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Cover the real initializer and avoid the linted /tmp path.

_make_runner() is fine for isolating _inner_run_command, but it manually sets _manifest, so these tests would still pass if APIDbtRunner.__init__ stopped initializing _manifest before super().__init__(). Also, Line 20 is flagged by Ruff S108.

Proposed test/helper adjustment
 def _make_runner():
     runner = APIDbtRunner.__new__(APIDbtRunner)
     runner._manifest = None
-    runner.project_dir = "/tmp/fake"
+    runner.project_dir = "fake_project_dir"
     runner.env_vars = None
     runner.raise_on_failure = False
     return runner
+
+
+def test_manifest_initialized_before_parent_init():
+    def assert_manifest_initialized(self, *args, **kwargs):
+        assert hasattr(self, "_manifest")
+        assert self._manifest is None
+
+    with mock.patch(
+        "elementary.clients.dbt.api_dbt_runner.CommandLineDbtRunner.__init__",
+        assert_manifest_initialized,
+    ):
+        APIDbtRunner(project_dir="fake_project_dir")
🧰 Tools
🪛 Ruff (0.15.10)

[error] 20-20: Probable insecure usage of temporary file or directory: "/tmp/fake"

(S108)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/clients/dbt_runner/test_api_dbt_runner.py` around lines 17 - 23,
Replace the helper that bypasses APIDbtRunner.__init__ with one that calls the
real initializer (so _manifest is set by the class) and avoid the hardcoded
"/tmp" by using a temporary directory fixture; e.g., change _make_runner to call
APIDbtRunner(...) with project_dir=str(tmp_path) (or a
tempfile.TemporaryDirectory context) and monkeypatch/stub any heavy external
behavior if needed instead of assigning runner._manifest manually, ensuring
tests exercise the actual __init__ behavior and remove the Ruff S108 `/tmp`
literal.

@dtaniwaki
Copy link
Copy Markdown
Author

dtaniwaki commented Apr 22, 2026

Closing in favor of #2197 which should address this more effectively.

@dtaniwaki dtaniwaki closed this Apr 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant