Commit 7fdc886

[codex] remove legacy prompt optimizer surface (#20)
1 parent cfb082d commit 7fdc886

23 files changed

Lines changed: 83 additions & 525 deletions

CHANGELOG.md

Lines changed: 18 additions & 0 deletions
@@ -1,5 +1,23 @@
 # Changelog

+## 0.19.0 — legacy optimizer removal
+
+### Removed
+
+- Removed the legacy pairwise prompt optimizer surface:
+  `PromptOptimizer`, `OptimizationLoop`, and their associated root-exported
+  types are gone. The blessed optimization path is now
+  `runMultiShotOptimization` for task trajectories and the steering-specific
+  optimizers for explicit steering tables.
+- Removed the old `PromptVariant` root export. Public callers should use
+  `MultiShotVariant` for multi-shot trajectory optimization or
+  `EvolvableVariant` for the lower-level prompt/code evolution core.
+
+### Changed
+
+- Documentation now points optimization users at `runMultiShotOptimization`
+  instead of the removed pairwise prompt optimizer.
+
 ## 0.18.0 — multi-shot optimization

 ### Added
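
Reviewer note: downstream code that still refers to `PromptVariant` could bridge the rename with a local alias while call sites migrate. This is only a minimal sketch under stated assumptions. The `EvolvableVariant` interface below is a local stand-in whose fields (`id`, `payload`, `generation`, `label`, `rationale`) are inferred from usages visible in this diff, not copied from the package. `LegacyPromptVariant` and `describeVariant` are hypothetical names introduced for illustration.

```typescript
// Local stand-in for the package's EvolvableVariant. The real type is
// exported by @tangle-network/agent-eval and may differ in detail.
interface EvolvableVariant<P> {
  id: string
  payload: P
  generation: number
  label: string
  rationale?: string
}

// Hypothetical compatibility alias: old call sites keep compiling while
// they are migrated to the new name one by one.
type LegacyPromptVariant<P> = EvolvableVariant<P>

// Example call site still written against the legacy name.
function describeVariant<P>(variant: LegacyPromptVariant<P>): string {
  return `${variant.label} (gen ${variant.generation}, id ${variant.id})`
}

const seed: EvolvableVariant<{ prompt: string }> = {
  id: 'v0',
  payload: { prompt: 'You are a helpful agent.' },
  generation: 0,
  label: 'baseline',
}

console.log(describeVariant(seed))
```

Because the alias is purely a type-level rename, it adds no runtime cost and can be deleted once no call site mentions the legacy name.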

README.md

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ The recipe for a code-generator eval is in [`SKILL.md` §Minimal working path](.
 | `runAgentControlLoop` | Policy-based runtime for agentic tasks: observe typed state, validate, decide, act, repeat with budgets, tracing, and stuck-loop guards. | [control-runtime.md](./docs/control-runtime.md) |
 | `FeedbackTrajectory`, `InMemoryFeedbackTrajectoryStore`, `FileSystemFeedbackTrajectoryStore` | Human/environment feedback loops: capture approvals, rejections, choices, revisions, metrics, and policy blocks as train/dev/test/holdout examples. | [feedback-trajectories.md](./docs/feedback-trajectories.md) |
 | `evaluateActionPolicy` | Generic action preflight for approval, budget, expected-outcome, and kill-criteria checks. | [feature-guide.md](./docs/feature-guide.md) |
-| `ExperimentTracker`, `PromptOptimizer`, `bisector` | A/B prompts, optimize steering, bisect regressions. | SKILL.md |
+| `ExperimentTracker`, steering optimizers, `bisector` | A/B prompts, optimize steering, bisect regressions. | SKILL.md |
 | `runMultiShotOptimization`, `trialTraceFromMultiShotTrial` | GEPA-style optimization for variable-length agent trajectories with ASI, paired seeds, and optional held-out promotion gating. | [multi-shot-optimization.md](./docs/multi-shot-optimization.md) |
 | `runPromptEvolution`, `createCompositeMutator`, `createSandboxPool`, `createSandboxCodeMutator`, `MutationTelemetry`, `LineageRecorder`, `CostLedger`, `JsonlTrialCache` | Prompt + code evolution loops with bounded sandbox pools, durable JSONL telemetry, plateau-detecting composite mutators, crash-resumable trial cache. | §Evolution loop |
 | `reflective-mutation` (`buildReflectionPrompt`, `parseReflectionResponse`, `DEFAULT_MUTATION_PRIMITIVES`) | Trace-conditioned LLM mutator that reasons over top/bottom trials instead of blind rewrites. | inline JSDoc |

clients/python/README.md

Lines changed: 1 addition & 1 deletion
@@ -140,7 +140,7 @@ All errors carry `.code` and `.details` (the structured payload from the server)

 ## Versioning

-This package is **version-locked** to the npm package. `tangle-agent-eval==0.18.0``@tangle-network/agent-eval@0.18.0`. The two ship from the same git tag in the same CI workflow; if either fails to publish, neither does. Mismatched versions are a build-time error.
+This package is **version-locked** to the npm package. `tangle-agent-eval==0.19.0``@tangle-network/agent-eval@0.19.0`. The two ship from the same git tag in the same CI workflow; if either fails to publish, neither does. Mismatched versions are a build-time error.

 `wire_version` is separate. It bumps only on breaking schema changes. Package versions can differ across releases as long as `wire_version` is the same.
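
Reviewer note: the versioning paragraph above describes two independent checks, a package version lock and a separate wire-version match. The diff does not show how the lock is enforced, so the following is only a sketch of what such a check might look like. `ReleaseVersions`, `assertVersionLocked`, and `sameWireVersion` are hypothetical names, and the assumption is that a release script has already read the version strings from `package.json` and `pyproject.toml`.

```typescript
// Hypothetical shape: version strings a release script might have read
// from package.json (npm) and pyproject.toml (python) before publishing.
interface ReleaseVersions {
  npm: string
  python: string
}

// Fails the build when the two package versions drift apart.
function assertVersionLocked(versions: ReleaseVersions): void {
  if (versions.npm !== versions.python) {
    throw new Error(
      `version mismatch: npm ${versions.npm} vs python ${versions.python}`,
    )
  }
}

// wire_version is tracked separately: peers on different package versions
// can still interoperate as long as their wire versions are the same.
function sameWireVersion(a: string, b: string): boolean {
  return a === b
}

// Matching versions pass silently; a mismatch would throw.
assertVersionLocked({ npm: '0.19.0', python: '0.19.0' })
```

Keeping the two checks as separate functions mirrors the text: the package lock is a release-time gate, while the wire-version comparison is what matters between a client and a server at runtime.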

clients/python/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "tangle-agent-eval"
-version = "0.18.0"
+version = "0.19.0"
 description = "Python client for @tangle-network/agent-eval — judge content against rubrics over HTTP or stdio RPC."
 readme = "README.md"
 requires-python = ">=3.10"

clients/python/src/tangle_agent_eval/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@
     VersionResponse,
 )

-__version__ = "0.18.0"
+__version__ = "0.19.0"

 __all__ = [
     "Client",

docs/feature-guide.md

Lines changed: 2 additions & 2 deletions
@@ -33,7 +33,7 @@ trying, and whether a change made them better or worse.
 | “Human feedback should become reusable eval data.” | `FeedbackTrajectory` | Captures approvals, rejections, edits, choices, metrics, and policy blocks. |
 | “Can this action run, or does it need approval?” | `evaluateActionPolicy` | Generic preflight for side effects, budgets, and required evidence. |
 | “I need train/dev/test/holdout examples.” | `Dataset` plus feedback trajectory conversion | Stable splits and contamination control. |
-| “Which prompt or signature wins?” | `PromptOptimizer`, `OptimizationLoop`, steering optimizers | Runs variants on scenarios and compares scores. |
+| “Which prompt or signature wins?” | `runMultiShotOptimization`, steering optimizers | Runs variants on scenarios and compares scores. |
 | “Improve a multi-turn agent over real task traces.” | `runMultiShotOptimization` | GEPA-style trajectory optimization with ASI and held-out promotion. |
 | “Improve prompts, then code if prompts plateau.” | `runPromptEvolution`, composite mutator, code mutator | Bounded evolution with telemetry and lineage. |
 | “Find why a regression happened.” | bisector, traces, run records | Narrows changes and preserves evidence. |
@@ -156,7 +156,7 @@ Store as `FeedbackTrajectory`, then derive:
 | Feedback data | `FeedbackTrajectory`, stores, converters | Human/environment labels | Domain adapters live in downstream repos. |
 | Action policy | `evaluateActionPolicy` | Approval/budget preflight | Blocks or labels actions before `act()`. |
 | Datasets | `Dataset`, holdout tools, canaries | Train/dev/test/holdout corpora | Keeps optimization honest. |
-| Optimization | `PromptOptimizer`, `OptimizationLoop`, steering optimizers | Prompt/signature comparison | Use held-out gates before promotion. |
+| Optimization | `runMultiShotOptimization`, steering optimizers | Prompt/signature comparison | Use held-out gates before promotion. |
 | Evolution | prompt/code mutators, sandbox pool, telemetry | Autoresearch and mutation loops | Use budgets and lineage; do not run unbounded. |
 | Telemetry | `TraceStore`, OTLP, file sinks | Audit and replay | Treat traces as evidence, not just logs. |
 | Reporting | summaries, pareto, cost tracker | Decision support | Useful for PRs, launch gates, research notes. |

docs/wire-protocol.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ GET /v1/version
 ```json
 {
   "package": "@tangle-network/agent-eval",
-  "version": "0.18.0",
+  "version": "0.19.0",
   "wireVersion": "1.0.0",
   "apiSurface": ["judge", "listRubrics", "version"]
 }

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@tangle-network/agent-eval",
-  "version": "0.18.0",
+  "version": "0.19.0",
   "description": "Trace-first evaluation framework for Tangle agents. Core (spans, pipelines, sandbox harness, OTLP export), trust (dataset, red-team, calibration, behavior DSL), builder-of-builders (three-layer eval, resumable sessions, meta-runtime correlation), and frontier (meta-eval correlation study, Process Reward Modeling, bisector).",
   "homepage": "https://github.com/tangle-network/agent-eval#readme",
   "repository": {

src/code-mutator.ts

Lines changed: 10 additions & 10 deletions
@@ -26,7 +26,7 @@

 import type {
   MutateAdapter,
-  PromptVariant,
+  EvolvableVariant,
   TrialResult,
   VariantAggregate,
 } from './prompt-evolution'
@@ -49,7 +49,7 @@ export interface CodeMutationOutcome {
   childId?: string
   /** Free-form one-liner: "tightened tool descriptions in forge-tools.ts". */
   description?: string
-  /** What the runner was trying to fix (carried into PromptVariant.rationale). */
+  /** What the runner was trying to fix (carried into EvolvableVariant.rationale). */
   rationale?: string
   /** Caller-defined diff payload. Mapped into the variant's payload by
    * `toVariantPayload`; agent-eval treats it as opaque. */
@@ -67,7 +67,7 @@ export interface CodeMutationOutcome {

 export type CodeMutationRunner<T, P> = (args: {
   slot: PoolSlot<T>
-  parent: PromptVariant<P>
+  parent: EvolvableVariant<P>
   parentAggregate: VariantAggregate
   topTrials: TrialResult[]
   bottomTrials: TrialResult[]
@@ -83,25 +83,25 @@ export interface CreateSandboxCodeMutatorOpts<T, P> {
    * encode the diff however they want (file map, patch string, branch
    * ref, snapshot id) without agent-eval taking a stance.
    */
-  toVariantPayload(outcome: CodeMutationOutcome, parent: PromptVariant<P>): P
+  toVariantPayload(outcome: CodeMutationOutcome, parent: EvolvableVariant<P>): P
   /** Optional telemetry sinks. */
   mutationTelemetry?: MutationTelemetry
   costLedger?: CostLedger
   lineage?: LineageRecorder<P>
   /** Override id generation. Default: `${parent.id}.g${generation}.code.${i}`. */
-  childIdFor?(parent: PromptVariant<P>, generation: number, index: number): string
+  childIdFor?(parent: EvolvableVariant<P>, generation: number, index: number): string
   /** Default label for the variant (visible in reports). */
-  labelFor?(outcome: CodeMutationOutcome, parent: PromptVariant<P>, generation: number, index: number): string
+  labelFor?(outcome: CodeMutationOutcome, parent: EvolvableVariant<P>, generation: number, index: number): string
 }

 export function createSandboxCodeMutator<T, P>(
   opts: CreateSandboxCodeMutatorOpts<T, P>,
 ): MutateAdapter<P> {
   const childIdFor = opts.childIdFor
-    ?? ((parent: PromptVariant<P>, generation: number, index: number) =>
+    ?? ((parent: EvolvableVariant<P>, generation: number, index: number) =>
       `${parent.id}.g${generation}.code.${index}`)
   const labelFor = opts.labelFor
-    ?? ((outcome: CodeMutationOutcome, parent: PromptVariant<P>, _generation: number, index: number) =>
+    ?? ((outcome: CodeMutationOutcome, parent: EvolvableVariant<P>, _generation: number, index: number) =>
       outcome.description?.slice(0, 80) ?? `${parent.label} → code.${index}`)

   return {
@@ -136,7 +136,7 @@ export function createSandboxCodeMutator<T, P>(
         }
       })

-      const variants: PromptVariant<P>[] = []
+      const variants: EvolvableVariant<P>[] = []
       let index = 0
       for (const outcome of outcomes) {
         const childId = outcome.childId ?? childIdFor(parent, generation, index)
@@ -164,7 +164,7 @@ export function createSandboxCodeMutator<T, P>(
         }

         if (outcome.ok) {
-          const variant: PromptVariant<P> = {
+          const variant: EvolvableVariant<P> = {
             id: childId,
             payload: opts.toVariantPayload(outcome, parent),
             generation,
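
Reviewer note: the default `childIdFor` and `labelFor` fallbacks touched in this file can be exercised in isolation to see what ids and labels they yield. The snippet below copies the two template expressions from the diff but runs them against local stand-in types (`VariantRef`, `OutcomeRef`), which are hypothetical and carry only the fields those expressions read. It is a sketch for illustration, not the module's actual code.

```typescript
// Hypothetical minimal shapes: only the fields the default fallbacks read.
interface VariantRef {
  id: string
  label: string
}

interface OutcomeRef {
  description?: string
}

// Same template as the default childIdFor in the diff.
const defaultChildId = (parent: VariantRef, generation: number, index: number): string =>
  `${parent.id}.g${generation}.code.${index}`

// Same expression as the default labelFor in the diff: the description is
// truncated to 80 characters, and the fallback label is used only when the
// description is undefined (an empty string is kept as-is by `??`).
const defaultLabel = (outcome: OutcomeRef, parent: VariantRef, index: number): string =>
  outcome.description?.slice(0, 80) ?? `${parent.label} → code.${index}`

const parent: VariantRef = { id: 'seed', label: 'Seed' }

console.log(defaultChildId(parent, 2, 0))
console.log(defaultLabel({}, parent, 1))
console.log(defaultLabel({ description: 'tightened tool descriptions' }, parent, 0))
```

One consequence worth noting for callers: because the fallback uses `??` rather than `||`, a runner that returns an empty `description` gets an empty label, so runners may prefer to omit the field when they have nothing to say.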

src/composite-mutator.ts

Lines changed: 3 additions & 3 deletions
@@ -18,7 +18,7 @@

 import type {
   MutateAdapter,
-  PromptVariant,
+  EvolvableVariant,
   TrialResult,
   VariantAggregate,
 } from './prompt-evolution'
@@ -42,7 +42,7 @@ export interface CreateCompositeMutatorOpts<P> {
 }

 interface MutateArgs<P> {
-  parent: PromptVariant<P>
+  parent: EvolvableVariant<P>
   parentAggregate: VariantAggregate
   topTrials: TrialResult[]
   bottomTrials: TrialResult[]
@@ -91,7 +91,7 @@ export function createCompositeMutator<P>(opts: CreateCompositeMutatorOpts<P>):
   }

   return {
-    async mutate(args: MutateArgs<P>): Promise<PromptVariant<P>[]> {
+    async mutate(args: MutateArgs<P>): Promise<EvolvableVariant<P>[]> {
       const { mode, reason } = pickMode(args)
       opts.onPolicyDecision?.({ generation: args.generation, chose: mode, reason })
