
feat: Jinja/dbt template preprocessing for SQL analysis tools #67

Closed
anandgupta42 wants to merge 5 commits into main from worktree-jinja-preprocessing

Conversation

@anandgupta42
Contributor

Summary

  • SQL analysis tools (sql_analyze, sql_format, sql_optimize, sql_translate) now automatically detect and preprocess Jinja-templated dbt SQL before analysis
  • Adds a new sql_preprocess_jinja tool for explicit preprocessing when needed
  • Adds a Python jinja_preprocessor module that handles all common dbt Jinja patterns
  • Results transparently indicate when Jinja preprocessing was applied

What changed

Python engine (packages/altimate-engine/)

New file: src/altimate_engine/sql/jinja_preprocessor.py
Core preprocessor that handles:

| Pattern | Transformation |
|---|---|
| {{ ref('model') }} | model |
| {{ source('src', 'table') }} | src__table |
| {{ config(...) }} | removed |
| {{ var('name') }} | '__var_name__' |
| {{ this }} / {{ this.identifier }} | __this__ |
| {# comments #} | removed |
| {% if %}...{% endif %} | keeps inner content |
| {% for %}...{% endfor %} | keeps inner content |
| {% set %}, {% macro %}...{% endmacro %} | removed |
| {{ adapter.dispatch() }}, {{ log() }}, {{ return() }} | removed |
| Unknown {{ expr }} | __jinja_expr__ + warning |
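
The pattern table above can be illustrated with a small regex-based sketch. This is not the code in jinja_preprocessor.py: the function name `preprocess_sketch` and every regex below are assumptions written for this example, and only some of the rows (comments, config, ref, source, var, this, if/for tags, unknown expressions) are covered.

```python
import re

# Hypothetical patterns for illustration; the real module's regexes may differ.
_COMMENT = re.compile(r"\{#.*?#\}", re.DOTALL)
_CONFIG = re.compile(r"\{\{-?\s*config\(.*?\)\s*-?\}\}", re.DOTALL)
_REF = re.compile(r"\{\{-?\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*-?\}\}")
_SOURCE = re.compile(
    r"\{\{-?\s*source\(\s*['\"]([^'\"]+)['\"]\s*,"
    r"\s*['\"]([^'\"]+)['\"]\s*\)\s*-?\}\}"
)
_VAR = re.compile(r"\{\{-?\s*var\(\s*['\"]([^'\"]+)['\"].*?\)\s*-?\}\}")
_THIS = re.compile(r"\{\{-?\s*this(?:\.\w+)?\s*-?\}\}")
_BLOCK_TAG = re.compile(
    r"\{%-?\s*(?:if|elif|else|endif|for|endfor)\b.*?-?%\}", re.DOTALL
)
_UNKNOWN = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def preprocess_sketch(sql: str) -> tuple[str, list[str]]:
    """Return (plain_sql, warnings) following the table's substitutions."""
    warnings: list[str] = []

    sql = _COMMENT.sub("", sql)                      # {# ... #} -> removed
    sql = _CONFIG.sub("", sql)                       # config(...) -> removed
    sql = _REF.sub(lambda m: m.group(1), sql)        # ref('m') -> m
    sql = _SOURCE.sub(lambda m: f"{m.group(1)}__{m.group(2)}", sql)
    sql = _VAR.sub(lambda m: f"'__var_{m.group(1)}__'", sql)
    sql = _THIS.sub("__this__", sql)
    sql = _BLOCK_TAG.sub("", sql)                    # drop tags, keep inner content

    def _unknown(match: re.Match) -> str:
        # Anything still wrapped in {{ }} is unrecognized: placeholder + warning.
        warnings.append(f"Unknown Jinja expression: {match.group(0)}")
        return "__jinja_expr__"

    sql = _UNKNOWN.sub(_unknown, sql)
    return sql, warnings
```

For instance, `preprocess_sketch("SELECT * FROM {{ ref('stg_orders') }}")` yields plain SQL that a parser can handle, while unrecognized expressions are replaced rather than dropped so the statement keeps its shape.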

Modified: src/altimate_engine/server.py

  • sql.analyze: auto-preprocesses Jinja, adds note to confidence_factors
  • sql.translate: auto-preprocesses Jinja, adds warning about re-applying templates
  • sql.optimize: auto-preprocesses Jinja, lowers confidence to "medium"
  • sql.format: auto-preprocesses Jinja, adds note about plain SQL output
  • New sql.preprocess_jinja RPC method for explicit preprocessing
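
A rough sketch of the detect-then-preprocess flow for one handler, here sql.optimize. Everything in it is an assumption for illustration: `contains_jinja` is named in the test list but its implementation is not shown in this PR description, `_strip_refs` is a stand-in for the real preprocessor, and the `optimize` callable and the `notes` key are hypothetical.

```python
import re
from typing import Callable

_JINJA_MARKERS = re.compile(r"\{\{|\{%|\{#")


def contains_jinja(sql: str) -> bool:
    # Assumed detection rule: any Jinja opening delimiter counts.
    return bool(_JINJA_MARKERS.search(sql))


def _strip_refs(sql: str) -> str:
    # Stand-in for the real preprocessor; handles only {{ ref('x') }}.
    return re.sub(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}", r"\1", sql)


def optimize_with_jinja(sql: str, optimize: Callable[[str], dict]) -> dict:
    """Preprocess only when Jinja is detected, then flag the result."""
    jinja_used = contains_jinja(sql)
    sql_to_optimize = _strip_refs(sql) if jinja_used else sql

    result = dict(optimize(sql_to_optimize))
    if jinja_used:
        # Suggestions were derived from rewritten SQL, so cap the confidence.
        if result.get("confidence") == "high":
            result["confidence"] = "medium"
        result.setdefault("notes", []).append(
            "Jinja preprocessing was applied before optimization"
        )
    return result
```

Plain SQL takes the same path without the preprocessing step, so existing callers see no change in behavior.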

Modified: src/altimate_engine/models.py

  • Added SqlPreprocessJinjaParams and SqlPreprocessJinjaResult models

TypeScript CLI (packages/altimate-code/)

New file: src/tool/sql-preprocess-jinja.ts

  • New sql_preprocess_jinja tool registered in the tool registry
  • Shows refs, sources, and variables found during preprocessing

Modified: src/bridge/protocol.ts

  • Added SqlPreprocessJinjaParams and SqlPreprocessJinjaResult interfaces
  • Added sql.preprocess_jinja to BridgeMethods registry

Modified: src/tool/registry.ts

  • Registered SqlPreprocessJinjaTool

Tests (packages/altimate-engine/tests/)

New file: tests/test_jinja_preprocessor.py — 60 tests covering:

  • contains_jinja() detection (7 tests)
  • ref() macro handling (6 tests)
  • source() macro handling (4 tests)
  • var() macro handling (3 tests)
  • config() removal (3 tests)
  • {{ this }} handling (3 tests)
  • Jinja comments (3 tests)
  • {% if/elif/else/endif %} blocks (4 tests)
  • {% for/endfor %} blocks (1 test)
  • {% set %} handling (2 tests)
  • {% macro/endmacro %} blocks (1 test)
  • Utility macros: adapter.dispatch, return, log (3 tests)
  • No-Jinja passthrough (2 tests)
  • Real-world dbt models (5 tests: incremental, sources, for-loop columns, snapshot, valid structure)
  • Edge cases (7 tests: nested Jinja, consecutive templates, mixed quotes, unknown expressions, to_dict, blank lines)
  • Server dispatch integration (6 tests: preprocess_jinja, analyze, format, translate, optimize with Jinja)

Test plan

  • All 54 preprocessor unit tests pass (pytest tests/test_jinja_preprocessor.py -k "not ServerDispatch")
  • TypeScript compiles without new errors (tsc --noEmit reports no errors in the files touched by this PR)
  • Manual test: altimate run "Analyze: SELECT * FROM {{ ref('stg_orders') }}"
  • Manual test: altimate run "Format: SELECT id FROM {{ source('raw', 'events') }} WHERE date > {{ var('start') }}"

Closes #63

🤖 Generated with Claude Code

SQL analysis tools (analyze, format, optimize, translate) now automatically
detect and preprocess Jinja-templated dbt SQL before analysis. This enables
meaningful analysis of dbt models without requiring users to manually strip
template syntax.

The preprocessor handles: ref(), source(), config(), var(), this, comments,
if/elif/else/endif, for/endfor, set, macro blocks, adapter.dispatch, and
other common dbt Jinja patterns.

Closes #63

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
variables_found: list[str] = Field(default_factory=list)
macros_removed: list[str] = Field(default_factory=list)
warnings: list[str] = Field(default_factory=list)
error: str = Field(default=None) # Uses same pattern as rest of file

This comment was marked as outdated.

Spawns `python -m altimate_engine.server` as a real subprocess and
communicates via stdin/stdout JSON-RPC protocol, exactly as the
TypeScript CLI bridge does in production. 19 E2E tests covering:

- Core preprocessing (ref, source, var, config, complex dbt models)
- Batch requests in a single server session
- Auto-preprocessing in downstream tools (analyze, format, translate, optimize)
- Error handling (invalid JSON, unknown methods, missing params)
- JSON-RPC 2.0 protocol correctness (response IDs, field presence)

Also fixes unit test assertions for downstream tools to handle
environments without altimate_core gracefully.
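
The shape of such an E2E test can be sketched as below. To stay self-contained, the example spawns a tiny stand-in server rather than `python -m altimate_engine.server`; the `ping` method, the `rpc_session` helper, and the newline-delimited framing are assumptions for this sketch, not details confirmed by the PR. The error codes are the standard JSON-RPC 2.0 ones.

```python
import json
import subprocess
import sys

# Stand-in server: reads one JSON-RPC request per line, writes one response per line.
_FAKE_SERVER = r'''
import json, sys
for line in sys.stdin:
    try:
        req = json.loads(line)
    except json.JSONDecodeError:
        resp = {"jsonrpc": "2.0", "id": None,
                "error": {"code": -32700, "message": "Parse error"}}
    else:
        if req.get("method") == "ping":
            resp = {"jsonrpc": "2.0", "id": req.get("id"), "result": "pong"}
        else:
            resp = {"jsonrpc": "2.0", "id": req.get("id"),
                    "error": {"code": -32601, "message": "Method not found"}}
    sys.stdout.write(json.dumps(resp) + "\n")
    sys.stdout.flush()
'''


def rpc_session(raw_requests: list[str]) -> list[dict]:
    """Send several raw request lines through one server process."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _FAKE_SERVER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    responses = []
    try:
        for raw in raw_requests:
            proc.stdin.write(raw + "\n")
            proc.stdin.flush()
            responses.append(json.loads(proc.stdout.readline()))
    finally:
        proc.stdin.close()   # EOF ends the server's read loop
        proc.wait(timeout=10)
        proc.stdout.close()
    return responses
```

Sending a valid request, a malformed line, and an unknown method in one session exercises the batch, error-handling, and response-ID checks listed above.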

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines 502 to +506
  original_sql=params_obj.sql,
- optimized_sql=rw.get("rewritten_sql", params_obj.sql),
+ optimized_sql=rw.get("rewritten_sql", sql_to_optimize),
  suggestions=suggestions,
  anti_patterns=anti_patterns,
  confidence=opt_confidence,

This comment was marked as outdated.

…ine.ts`

Mixing `||` and `??` without explicit parentheses is a JS syntax error.
Both error_message expressions in `ensureEngineImpl` now correctly group
`(e?.stderr?.toString() || e?.message)` before the `??` fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines 584 to 591
  if jinja_fmt_note and not fmt_error:
      fmt_error = jinja_fmt_note
  result = SqlFormatResult(
      success=raw.get("success", True),
      formatted_sql=raw.get("formatted_sql", raw.get("sql")),
      statement_count=raw.get("statement_count", 1),
-     error=raw.get("error"),
+     error=fmt_error,
  )

This comment was marked as outdated.

# 4. {% if ... %}...{% endif %} — keep inner content
_RE_IF_OPEN = re.compile(r"\{%-?\s*if\b[^%]*?-?%\}")
_RE_ELIF = re.compile(r"\{%-?\s*elif\b[^%]*?-?%\}")
_RE_ELSE = re.compile(r"\{%-?\s*else\s*-?%\}")

This comment was marked as outdated.

anandgupta42 and others added 2 commits March 5, 2026 16:09
1. Fix `error` field type in SqlPreprocessJinjaResult: use `str | None`
   instead of `str = Field(default=None)` to match Pydantic v2 typing.

2. Fix sql.optimize fallback: use `params_obj.sql` (original) instead of
   `sql_to_optimize` (preprocessed) so Jinja-only preprocessing isn't
   incorrectly reported as an optimization.

3. Fix sql.format Jinja note: append as SQL comment to formatted output
   instead of abusing the `error` field, which clients ignore on success.

4. Fix regex `[^%]*?` in Jinja preprocessor: use `.*?` so modulo
   operators (%) inside tags like `{% if loop.index % 2 == 0 %}` are
   matched correctly. Added test for this edge case.
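
Item 4 can be checked with a short comparison. The old pattern is the `_RE_IF_OPEN` regex quoted in the review thread; the new one assumes only `[^%]*?` was swapped for `.*?` and the rest of the pattern was left unchanged, which the commit message implies but does not show.

```python
import re

# Pattern quoted in the review: [^%]*? cannot step over a literal '%'.
RE_IF_OLD = re.compile(r"\{%-?\s*if\b[^%]*?-?%\}")
# Assumed post-fix pattern: .*? lazily scans up to the first closing %}.
RE_IF_NEW = re.compile(r"\{%-?\s*if\b.*?-?%\}")

tag = "{% if loop.index % 2 == 0 %}"
sql = tag + "even_col{% endif %}"

old_match = RE_IF_OLD.search(sql)  # fails: the modulo '%' blocks [^%]*?
new_match = RE_IF_NEW.search(sql)  # matches the whole opening tag

print(old_match)
print(new_match.group(0))
```

Because `.*?` is lazy, the match still stops at the first `%}`, so the tag's inner content and the `{% endif %}` are not swallowed.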

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use __jinja_expr__ placeholder for adapter.dispatch/return/log instead
  of empty string to avoid invalid SQL in expression positions
- Add Jinja auto-preprocessing to lineage.check and sql.explain
- Improve sql.optimize fallback: preserve original SQL when only Jinja
  preprocessing occurred (no actual rewrites)
- Simplify sql.format: drop the SQL comment approach, just format the
  preprocessed SQL cleanly
- Update tests to verify __jinja_expr__ placeholder behavior
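
The first bullet's reasoning can be made concrete with a small check. The regex and helper names below are assumptions for this sketch, and sqlite3 is used only as a convenient stand-in parser; the PR does not mention it.

```python
import re
import sqlite3

# Hypothetical pattern for the utility calls named in the commit message.
_UTILITY = re.compile(
    r"\{\{-?\s*(?:adapter\.dispatch|return|log)\(.*?\)\s*-?\}\}", re.DOTALL
)


def replace_utility(sql: str, replacement: str) -> str:
    return _UTILITY.sub(replacement, sql)


def parses(sql: str) -> bool:
    """Return True if SQLite accepts the statement against a dummy table."""
    conn = sqlite3.connect(":memory:")
    try:
        # The dummy column lets the placeholder resolve as an identifier.
        conn.execute("CREATE TABLE t (__jinja_expr__ INTEGER)")
        conn.execute(sql)
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


templated = "SELECT {{ adapter.dispatch('my_macro')() }} AS v FROM t"
emptied = replace_utility(templated, "")                   # old behavior
placeholder = replace_utility(templated, "__jinja_expr__")  # new behavior

print(emptied, parses(emptied))
print(placeholder, parses(placeholder))
```

Removing the expression leaves `SELECT  AS v FROM t`, which has no expression before `AS`, whereas the placeholder keeps a syntactically valid identifier in that position.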

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@anandgupta42 anandgupta42 deleted the worktree-jinja-preprocessing branch March 17, 2026 01:00

Development

Successfully merging this pull request may close these issues.

[Feature] Jinja/dbt template preprocessing for SQL analysis tools

1 participant