perf: cache parsed manifest in APIDbtRunner by dtaniwaki · Pull Request #2196 · elementary-data/elementary

dtaniwaki · 2026-04-22T04:15:30Z

Hi, thanks for building and maintaining Elementary! I'd love to get your review on this small performance improvement.

Problem

edr report executes ~18 sequential dbt run-operation calls via the Python API. Each call creates a fresh dbtRunner() without passing a manifest, which forces a full manifest re-parse every time. On a project with hundreds of models, sources, and macros, each parse can take ~10 seconds — adding up to ~3 minutes of overhead while the actual queries finish in seconds.

Solution

Cache the parsed manifest on APIDbtRunner after the first successful invocation and reuse it for all subsequent dbtRunner(manifest=...) calls. This brings the number of manifest parses from ~18 down to 1.

Key details:

self._manifest is initialized before super().__init__() because the parent constructor may call _inner_run_command() via _run_deps_if_needed().
The manifest is only cached after a successful invocation to avoid storing a partial state.
dbtRunner.manifest is a public API available since dbt-core 1.5+.
Only APIDbtRunner is affected — SubprocessDbtRunner and DbtFusionRunner are unchanged.

Summary by CodeRabbit

Refactor
- Enhanced dbt command execution to cache and reuse manifest data across multiple command invocations for improved efficiency.

Reuse the parsed manifest across sequential dbt run-operation calls instead of re-parsing it on every invocation. This reduces edr report overhead from ~18 manifest parses down to 1.

Cover three scenarios: - manifest is cached after the first successful invocation - manifest is not cached when invocation fails - cached manifest is reused on subsequent calls

github-actions · 2026-04-22T04:15:40Z

👋 @dtaniwaki
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in this pull request.

coderabbitai · 2026-04-22T04:15:44Z

📝 Walkthrough

Walkthrough

This change implements manifest caching in the APIDbtRunner class. The runner now initializes an instance-level _manifest attribute, stores the dbt manifest after a successful first invocation, and reuses the cached manifest in subsequent command executions instead of regenerating it.

Changes

Cohort / File(s)	Summary
Core manifest caching implementation `elementary/clients/dbt/api_dbt_runner.py`	Added `__init__` method to initialize `_manifest` attribute and modified `_inner_run_command` to pass the cached manifest to `dbtRunner`, storing the manifest after successful execution for reuse in subsequent calls.
Manifest caching tests `tests/unit/clients/dbt_runner/test_api_dbt_runner.py`	New test module validating that `APIDbtRunner` correctly caches and reuses the dbt manifest across invocations, including tests for initial caching, failed runs, and subsequent reuse of cached manifests.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant APIDbtRunner as APIDbtRunner<br/>(_manifest=None)
    participant dbtRunner as dbtRunner
    participant Manifest as Manifest

    Caller->>APIDbtRunner: _inner_run_command() [1st call]
    APIDbtRunner->>dbtRunner: dbtRunner(..., manifest=None)
    dbtRunner->>Manifest: run & generate manifest
    dbtRunner-->>APIDbtRunner: result + manifest
    APIDbtRunner->>APIDbtRunner: self._manifest = result.manifest
    APIDbtRunner-->>Caller: result

    Caller->>APIDbtRunner: _inner_run_command() [2nd call]
    APIDbtRunner->>dbtRunner: dbtRunner(..., manifest=self._manifest)
    dbtRunner->>Manifest: run with cached manifest
    dbtRunner-->>APIDbtRunner: result
    APIDbtRunner-->>Caller: result

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Hop, hop, a manifest cached so fine,
No need to rebuild what once did shine,
Across each run, the memo stays put,
Swift and clever—a rabbit's salute! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'perf: cache parsed manifest in APIDbtRunner' directly summarizes the main change: adding manifest caching optimization to APIDbtRunner for performance improvements.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/clients/dbt_runner/test_api_dbt_runner.py`:
- Around line 17-23: Replace the helper that bypasses APIDbtRunner.__init__ with
one that calls the real initializer (so _manifest is set by the class) and avoid
the hardcoded "/tmp" by using a temporary directory fixture; e.g., change
_make_runner to call APIDbtRunner(...) with project_dir=str(tmp_path) (or a
tempfile.TemporaryDirectory context) and monkeypatch/stub any heavy external
behavior if needed instead of assigning runner._manifest manually, ensuring
tests exercise the actual __init__ behavior and remove the Ruff S108 `/tmp`
literal.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aa6ee93f-7cc2-43d8-865d-e4273823bcf5

📥 Commits

Reviewing files that changed from the base of the PR and between e5af7e7 and ccb3fe8.

📒 Files selected for processing (2)

elementary/clients/dbt/api_dbt_runner.py
tests/unit/clients/dbt_runner/test_api_dbt_runner.py

coderabbitai · 2026-04-22T04:19:36Z

+def _make_runner():
+    runner = APIDbtRunner.__new__(APIDbtRunner)
+    runner._manifest = None
+    runner.project_dir = "/tmp/fake"
+    runner.env_vars = None
+    runner.raise_on_failure = False
+    return runner


⚠️ Potential issue | 🟡 Minor

Cover the real initializer and avoid the linted /tmp path.

_make_runner() is fine for isolating _inner_run_command, but it manually sets _manifest, so these tests would still pass if APIDbtRunner.__init__ stopped initializing _manifest before super().__init__(). Also, Line 20 is flagged by Ruff S108.

Proposed test/helper adjustment

def _make_runner(): runner = APIDbtRunner.__new__(APIDbtRunner) runner._manifest = None - runner.project_dir = "/tmp/fake" + runner.project_dir = "fake_project_dir" runner.env_vars = None runner.raise_on_failure = False return runner + + +def test_manifest_initialized_before_parent_init(): + def assert_manifest_initialized(self, *args, **kwargs): + assert hasattr(self, "_manifest") + assert self._manifest is None + + with mock.patch( + "elementary.clients.dbt.api_dbt_runner.CommandLineDbtRunner.__init__", + assert_manifest_initialized, + ): + APIDbtRunner(project_dir="fake_project_dir")

🧰 Tools

🪛 Ruff (0.15.10)

[error] 20-20: Probable insecure usage of temporary file or directory: "/tmp/fake"

(S108)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/unit/clients/dbt_runner/test_api_dbt_runner.py` around lines 17 - 23, Replace the helper that bypasses APIDbtRunner.__init__ with one that calls the real initializer (so _manifest is set by the class) and avoid the hardcoded "/tmp" by using a temporary directory fixture; e.g., change _make_runner to call APIDbtRunner(...) with project_dir=str(tmp_path) (or a tempfile.TemporaryDirectory context) and monkeypatch/stub any heavy external behavior if needed instead of assigning runner._manifest manually, ensuring tests exercise the actual __init__ behavior and remove the Ruff S108 `/tmp` literal.

dtaniwaki · 2026-04-22T13:16:58Z

Closing in favor of #2197 which should address this more effectively.

dtaniwaki added 2 commits April 22, 2026 12:41

perf: cache parsed manifest in APIDbtRunner

d6d6d06

Reuse the parsed manifest across sequential dbt run-operation calls instead of re-parsing it on every invocation. This reduces edr report overhead from ~18 manifest parses down to 1.

test: add unit tests for APIDbtRunner manifest caching

ccb3fe8

Cover three scenarios: - manifest is cached after the first successful invocation - manifest is not cached when invocation fails - cached manifest is reused on subsequent calls

dtaniwaki requested a deployment to elementary_test_env April 22, 2026 04:15 — with GitHub Actions Waiting

coderabbitai Bot reviewed Apr 22, 2026

View reviewed changes

dtaniwaki mentioned this pull request Apr 22, 2026

feat: add --threads option to parallelize report data fetching #2197

Open

dtaniwaki closed this Apr 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: cache parsed manifest in APIDbtRunner#2196

perf: cache parsed manifest in APIDbtRunner#2196
dtaniwaki wants to merge 2 commits intoelementary-data:masterfrom
dtaniwaki:worktree-cache-manifest

dtaniwaki commented Apr 22, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

github-actions Bot commented Apr 22, 2026

Uh oh!

coderabbitai Bot commented Apr 22, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Uh oh!

dtaniwaki commented Apr 22, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dtaniwaki commented Apr 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Solution

Summary by CodeRabbit

Uh oh!

github-actions Bot commented Apr 22, 2026

Uh oh!

coderabbitai Bot commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

dtaniwaki commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dtaniwaki commented Apr 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 22, 2026 •

edited

Loading

dtaniwaki commented Apr 22, 2026 •

edited

Loading