Bump the version badge to v0.3.0 and refresh the feature cards to the
shipping design (phase-based setup, focused six-tool set, reliability
guards, single-verdict validation). Paper results and the architecture
diagram are unchanged.
<p>SAG operates entirely within Docker containers — not generating Dockerfiles, but interactively exploring and configuring inside them. Zero host pollution, fully reproducible.</p>
<p>Examines concrete evidence on disk — .class files after compilation, surefire XML test reports, build artifacts. Prevents hallucinations where the agent believes it succeeded without actual completion.</p>
<p>Examines concrete evidence on disk — .class files, surefire XML test reports, build artifacts — under a single verdict shared by the CLI, the report, and the exit code. Prevents hallucinations where the agent believes it succeeded without actual completion.</p>
</div>
<div class="feat-item">
- <small class="feat-title">Trunk & Branch Context</small>
- <p>Trunk Context maintains global state and task lists. Branch contexts capture subtask details. When a subtask ends, results merge back to Trunk — enabling complex chains without context loss.</p>
<p>A project setup runs as an engine-driven sequence — provision, analyze, build, test, report. The agent works inside one phase at a time with a clean context window, and the engine advances only on real build evidence.</p>
<p>Layered tools from low-level (Bash) through build specialists (Maven, Gradle) to high-level orchestrators (Project Analyzer). Graceful fallback when specialized tools encounter unexpected scenarios.</p>
+ <small class="feat-title">Focused Tool Set</small>
+ <p>Six intent-driven tools — build, project, search, bash, files, report — behind verb-based actions. Long build output becomes a retrievable reference instead of flooding the agent's context window.</p>
</div>
<div class="feat-item">
- <small class="feat-title">Agent State Evaluator</small>
- <p>Analyzes Agent Memory to detect problematic patterns — repeated identical actions indicating the agent is stuck in a loop. Provides corrective feedback to break cycles autonomously.</p>
<p>Completion gates reject work without real artifacts, long builds run detached instead of timing out, and a global iteration and wall-clock cap bound every run. Repeated-action detection breaks stuck loops.</p>