[revisit later] feat: improve builder agent prompt for resilience and correctness by anandgupta42 · Pull Request #90 · AltimateAI/altimate-code

anandgupta42 · 2026-03-07T03:01:45Z

Summary

Two sets of improvements based on Spider2-DBT benchmark analysis (68 real-world dbt+DuckDB tasks). All changes are generic improvements that benefit all workflows.

Builder Prompt Improvements

Graceful degradation: when sql_analyze, sql_validate, lineage_check, or schema_inspect is unavailable, the agent skips it instead of getting stuck retrying
Temporal determinism: Avoid current_date/now()/current_timestamp on fixed/historical datasets
Output validation: Query the output database directly after dbt run to verify correctness, not just compilation success
Read before writing: Always read existing models before creating new ones
Reserved word quoting: Dialect-aware quoting for SQL reserved words
Self-review enhancements: JOIN correctness, aggregation completeness, non-deterministic function checks
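
The graceful-degradation item above is a prompt instruction, not code. As a rough illustration of the behavior it asks for, here is a minimal sketch in which an unavailable tool is skipped once and never retried. Everything except the four tool names is hypothetical (`ToolRegistry`, `runOptionalChecks`, the result shape) and does not come from the PR.

```typescript
// Hypothetical sketch: only the four tool names are taken from the PR summary.
type Tool = (sql: string) => string;
type ToolRegistry = { [name: string]: Tool | undefined };
type ToolResult = { tool: string; status: "ok" | "skipped" | "failed"; detail: string };

// Tools the summary lists as optional for the builder agent.
const OPTIONAL_TOOLS = ["sql_analyze", "sql_validate", "lineage_check", "schema_inspect"];

function runOptionalChecks(sql: string, registry: ToolRegistry): ToolResult[] {
  return OPTIONAL_TOOLS.map((name): ToolResult => {
    const tool = registry[name];
    if (tool === undefined) {
      // Unavailable: record the skip once and move on, with no retry loop.
      return { tool: name, status: "skipped", detail: "tool unavailable" };
    }
    try {
      return { tool: name, status: "ok", detail: tool(sql) };
    } catch (err) {
      // A failing tool is reported, not retried, so the run keeps going.
      const detail = err instanceof Error ? err.message : String(err);
      return { tool: name, status: "failed", detail };
    }
  });
}

// Example: only sql_validate is registered, as might happen without a warehouse connection.
const registry: ToolRegistry = { sql_validate: () => "valid" };
const results = runOptionalChecks("select 1", registry);
```

Here `results` holds one entry per optional tool, so the caller can see which checks ran and which were skipped.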

Context Efficiency Improvements

Surface observation masks: Pruned tool outputs now show [Tool output cleared — read(file_path: "...") returned 47 lines, 3.2 KB — "SELECT..."] instead of opaque [Old tool result content cleared]. The mask was already computed by SessionCompaction.prune() but never surfaced in toModelMessages.
Remove dead <directories> block: SystemPrompt.environment() emitted empty <directories>\n \n</directories> (permanently disabled via && false). Removed ~30 wasted tokens per API call.
Compact skill descriptions: Tool schema uses single-line <skill name="...">description</skill> instead of 4-line XML per skill. Drops unused <location> URLs. ~60% size reduction in skill listings.
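
To make the observation-mask format above concrete, here is a minimal sketch that builds a string of the same shape from a pruned tool call. It is not the PR's implementation: `PrunedToolCall`, `sketchObservationMask`, the preview length, and the file path in the example are all assumptions, and the size is approximated by character count rather than bytes.

```typescript
// Hypothetical shape; the real tool-part type in the PR has more fields.
interface PrunedToolCall {
  tool: string;
  args: { [key: string]: string };
  output: string;
}

// Separator character used in the mask text quoted in the summary.
const DASH = "\u2014";

function sketchObservationMask(call: PrunedToolCall, previewChars = 6): string {
  const args = Object.keys(call.args)
    .map((key) => `${key}: "${call.args[key]}"`)
    .join(", ");
  const lines = call.output.split("\n").length;
  // Character count stands in for byte size; this is an approximation.
  const kb = (call.output.length / 1024).toFixed(1);
  const preview = call.output.slice(0, previewChars) + "...";
  return `[Tool output cleared ${DASH} ${call.tool}(${args}) returned ${lines} lines, ${kb} KB ${DASH} "${preview}"]`;
}

// Example with a made-up path and a two-line output.
const mask = sketchObservationMask({
  tool: "read",
  args: { file_path: "models/orders.sql" },
  output: "SELECT id\nFROM orders",
});
```

The point of the design is that the placeholder keeps the tool name, arguments, size, and a short preview, so the model can decide whether re-reading the file is worthwhile after compaction.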

Benchmark Pipeline

68-task benchmark runner with parallel execution (--parallel N)
Official Spider2 evaluation bridge
Interactive single-file HTML report with leaderboard chart

Benchmark Results

Metric	Before	After	Delta
Pass rate	39.71% (27/68)	42.65% (29/68)	+2.94 pp
Regressions	—	0	—
New passes	—	divvy001, hive001	+2 tasks
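
The figures in the table can be reproduced from the raw task counts. The sketch below recomputes them; note that the delta is a difference in percentage points, while the relative gain over the baseline is larger. The function and variable names are illustrative only.

```typescript
// Pass rate as a percentage of tasks passed.
function passRate(passed: number, total: number): number {
  return (passed / total) * 100;
}

const TOTAL_TASKS = 68;
const before = passRate(27, TOTAL_TASKS);
const after = passRate(29, TOTAL_TASKS);

const summary = {
  before: before.toFixed(2),                 // baseline pass rate, in percent
  after: after.toFixed(2),                   // pass rate with the PR, in percent
  deltaPoints: (after - before).toFixed(2),  // absolute change, in percentage points
  relativeGain: (((29 - 27) / 27) * 100).toFixed(1), // change relative to the baseline, in percent
};
```

With these counts, `summary` evaluates to before 39.71, after 42.65, a delta of 2.94 percentage points, and a relative gain of about 7.4% over the baseline.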

Test plan

TypeScript typecheck passes (tsgo --noEmit in opencode package: 0 errors)
8 new e2e tests for context efficiency (no mocking — real session state, prune, skill loading)
Existing skill test updated for compact format
Full test suite: 1505 pass
Full 68-task benchmark run validated with 0 regressions
Manual smoke test: run builder agent on a dbt project without warehouse connection

🤖 Generated with Claude Code

…lysis

Key improvements to builder agent system prompt:

- Add graceful degradation: skip `sql_analyze`, `sql_validate`, `lineage_check`, `schema_inspect` when unavailable instead of getting stuck retrying
- Add temporal determinism: avoid `current_date`/`now()`/`current_timestamp` on fixed/historical datasets
- Add output validation: query output database directly after dbt run to verify correctness, not just compilation success
- Add read-before-write: always read existing models before creating new ones
- Add reserved word quoting guidance for SQL column names
- Add self-review checks for JOIN correctness, aggregation completeness, non-deterministic functions
- Update agent-modes docs to reflect graceful degradation and output validation

Validated against Spider2-DBT benchmark: 42.65% pass rate (29/68 tasks), up from 39.71% baseline (27/68), 0 regressions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Scope "Explore first" to relevant models in same layer/domain instead of reading ALL models (performance concern for large projects)
- Make reserved word quoting dialect-aware: double quotes for ANSI SQL, backticks for BigQuery/MySQL, brackets for SQL Server
- Add incremental model exception for temporal function guidance
- Add warehouse-agnostic fallback for output validation (not just DuckDB)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
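
The dialect-aware quoting rule in the commit message above (double quotes for ANSI SQL, backticks for BigQuery/MySQL, brackets for SQL Server) can be sketched as a small helper. This is an illustration of the rule, not code from the PR: `Dialect`, `RESERVED`, and `quoteIdentifier` are hypothetical, the reserved-word list is a tiny sample, and embedded quote characters are not escaped.

```typescript
type Dialect = "ansi" | "bigquery" | "mysql" | "sqlserver";

// Tiny illustrative subset; real reserved-word lists are longer and dialect-specific.
const RESERVED = ["select", "from", "where", "order", "group", "user", "table"];

function quoteIdentifier(name: string, dialect: Dialect): string {
  // Leave ordinary identifiers untouched.
  if (RESERVED.indexOf(name.toLowerCase()) === -1) return name;
  switch (dialect) {
    case "bigquery":
    case "mysql":
      return "`" + name + "`";
    case "sqlserver":
      return "[" + name + "]";
    default:
      // ANSI SQL: double quotes.
      return '"' + name + '"';
  }
}

// The same reserved column name quoted for three dialects.
const quotedOrder = {
  ansi: quoteIdentifier("order", "ansi"),
  bigquery: quoteIdentifier("order", "bigquery"),
  sqlserver: quoteIdentifier("order", "sqlserver"),
};
```

A non-reserved name such as `amount` passes through unchanged in every dialect.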

- 68-task benchmark for evaluating agent on dbt+DuckDB workflows
- Resumable runner with parallel execution (`--parallel N`)
- Official Spider2 evaluation bridge (`eval_utils`)
- Interactive single-file HTML report with leaderboard chart
- One-time setup script for Spider2 repo + DuckDB databases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Three generic context efficiency improvements:

1. Surface `observation_mask` for pruned tool outputs in `toModelMessages` instead of opaque "[Old tool result content cleared]". The mask was already computed by `SessionCompaction.prune()` but never used; surfacing it gives the model post-compaction awareness of what it previously read.
2. Remove dead `<directories>` block from `SystemPrompt.environment()`. The tree was permanently disabled via `&& false`, leaving an empty XML tag wasting ~30 tokens per API call.
3. Compact skill descriptions in tool schema from 4-line XML per skill to single-line `<skill name="...">description</skill>`. Drops unused `<location>` URLs. Cuts skill listing size by ~60%.

Includes 8 e2e tests validating all three changes without mocking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
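
As an illustration of the single-line skill format described in item 3, the sketch below renders a list of skills as one `<skill name="...">description</skill>` line each and simply omits the location. The `Skill` interface, `renderCompactSkills`, and the example skills are hypothetical, and no XML escaping is applied.

```typescript
// Hypothetical skill record; `location` is kept on the type but never rendered.
interface Skill {
  name: string;
  description: string;
  location?: string;
}

function renderCompactSkills(skills: Skill[]): string {
  return skills
    .map((skill) => `<skill name="${skill.name}">${skill.description}</skill>`)
    .join("\n");
}

// Made-up skills for demonstration only.
const listing = renderCompactSkills([
  { name: "dbt-docs", description: "Generate dbt documentation", location: "file:///skills/dbt-docs" },
  { name: "sql-review", description: "Review SQL for common mistakes" },
]);
```

Each skill costs one line regardless of whether it has a location, which is where the size reduction in the listing comes from.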

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add missing `id`, `sessionID`, `messageID` properties to `createObservationMask` test fixtures to satisfy `MessageV2.ToolPart` type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-14T01:04:48Z

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

github-actions · 2026-03-14T01:04:49Z

Hey! Your PR title [revisit later] feat: improve builder agent prompt for resilience and correctness doesn't follow conventional commit format.

Please update it to start with one of:

feat: or feat(scope): new feature
fix: or fix(scope): bug fix
docs: or docs(scope): documentation changes
chore: or chore(scope): maintenance tasks
refactor: or refactor(scope): code refactoring
test: or test(scope): adding or updating tests

Where scope is the package name (e.g., app, desktop, opencode).

See CONTRIBUTING.md for details.

github-actions Bot added the contributor label Mar 7, 2026

anandgupta42 force-pushed the feat/builder-prompt-improvements branch from 3fcde96 to 3e321f8 on March 7, 2026 05:58

anandgupta42 and others added 6 commits March 6, 2026 22:02

fix: add missing fields to ToolPart test fixtures

b1c7e5d

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: resolve typecheck errors in context-efficiency tests

a2fd017

Add missing `id`, `sessionID`, `messageID` properties to `createObservationMask` test fixtures to satisfy `MessageV2.ToolPart` type. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

anandgupta42 force-pushed the feat/builder-prompt-improvements branch from d05da0a to a2fd017 on March 7, 2026 06:03

anandgupta42 and others added 3 commits March 6, 2026 22:24

chore: re-trigger CI

500053d

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chore: re-trigger CI

c76a7fc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge branch 'main' into feat/builder-prompt-improvements

7cfe4bc

anandgupta42 changed the title ~~feat: improve builder agent prompt for resilience and correctness~~ [revisit later] feat: improve builder agent prompt for resilience and correctness Mar 14, 2026

anandgupta42 closed this Mar 14, 2026

github-actions Bot added the needs:compliance label Mar 14, 2026

github-actions Bot added the needs:title label Mar 14, 2026

anandgupta42 deleted the feat/builder-prompt-improvements branch March 17, 2026 00:59

anandgupta42 wants to merge 9 commits into main from feat/builder-prompt-improvements

anandgupta42 commented Mar 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Mar 14, 2026

Uh oh!

github-actions Bot commented Mar 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

anandgupta42 commented Mar 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Builder Prompt Improvements

Context Efficiency Improvements

Benchmark Pipeline

Benchmark Results

Test plan

Uh oh!

github-actions Bot commented Mar 14, 2026

Uh oh!

github-actions Bot commented Mar 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

anandgupta42 commented Mar 7, 2026 •

edited

Loading