Commit 1b551eb

v2 with SkillOpt Support
1 parent 50f72a7 commit 1b551eb

15 files changed

Lines changed: 2175 additions & 0 deletions
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
---
name: Bug report
about: Report a reproducible bug in CodexOpt
title: "[Bug] "
labels: bug
assignees: ""
---

## Summary

Describe the bug clearly.

## Steps To Reproduce

1.
2.
3.

## Expected Behavior

What should have happened?

## Actual Behavior

What happened instead?

## Environment

- OS:
- Python:
- CodexOpt version:

## Logs / Output

Add relevant terminal output, stack traces, or run artifacts.

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
name: Feature request
about: Suggest an improvement for CodexOpt
title: "[Feature] "
labels: enhancement
assignees: ""
---

## Problem

What problem are you trying to solve?

## Proposed Solution

Describe the feature and expected behavior.

## Alternatives Considered

What alternatives did you evaluate?

## Additional Context

Include examples, references, or related issues.

.github/pull_request_template.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
## Summary

Describe what changed and why.

## Checklist

- [ ] I ran lint checks.
- [ ] I ran tests.
- [ ] I updated docs/changelog if needed.
- [ ] I verified this does not introduce unrelated changes.

## Validation

Paste relevant command output (or summarize key results):

```bash
uv run --no-sync ruff check src tests
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run --no-sync pytest -q
```

CHANGELOG.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
# Changelog

All notable changes to this project will be documented in this file.

## [Unreleased]

## [0.2.0] - 2026-05-26

### Added
- Added `codexopt improve` as the one-command Codex workflow for discovery, task mining, reflective optimization, preview, and apply.
- Added `codexopt improve --live` to opt into Codex-backed optimizer and judge runs.
- Added the `reflective` engine for SkillOpt- and GEPA-inspired optimization of `SKILL.md` and `AGENTS.md`.
- Added tiered rewards with verifier, judge, and static fallback modes.
- Added Codex rollout parsing for `codex exec --json` trajectories.
- Added `codexopt tasks init` to mine starter optimization tasks from git history, skills, and issues.
- Added `skillopt` as a SKILL.md optimization engine with train/validation evidence splits.
- Added validation-gated candidate acceptance with configurable edit budget and validation delta.
- Added optional executable rollout tasks from JSON `evidence.task_files`.
- Added temporary-repo rollout execution for candidate skills, including pass/fail artifact metadata.
- Added SkillOpt metadata to `optimize.json`, CLI summaries, and markdown reports.
- Added support for `.agents/skills/**/SKILL.md` discovery.
- Documented progressive Codex user workflows, reflective optimization, rollout configuration, task format, artifacts, and current boundaries.

### Changed
- Made offline preview the default for `codexopt improve` so Codex and API budget are only used when explicitly requested.
- Deprecated the legacy `--engine gepa` path in favor of the maintained `reflective` engine.
- Updated package description to emphasize Codex and SkillOpt-style validation.

## [0.1.0] - 2026-03-09

### Added
- Initial open-source release of CodexOpt.
- CLI workflow for `init`, `scan`, `benchmark`, `optimize`, `apply`, and `report`.
- Heuristic optimization engine and optional GEPA integration path.
- Run artifacts and markdown reporting.
- `uv`-first CI pipeline for lint, test, and build.

CONTRIBUTING.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
# Contributing

Thanks for contributing to CodexOpt.

## Development Setup

```bash
uv lock
uv sync --extra dev
```

## Run Checks

```bash
uv run --no-sync ruff check src tests
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run --no-sync pytest -q
uv build
```

## Pull Requests

- Keep changes scoped and include tests when behavior changes.
- Update `README.md` and/or `CHANGELOG.md` when relevant.
- Ensure CI passes before requesting review.

SECURITY.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
# Security Policy

## Reporting a Vulnerability

Please report security issues privately to:

- `shashi@super-agentic.ai`

Include:

- A clear description of the issue.
- Reproduction steps or proof of concept.
- Affected versions and environment details.

## Response

- We will acknowledge reports as quickly as possible.
- We will work on validation, mitigation, and a fix timeline.
- Please avoid public disclosure until a fix is available.

docs/codex-users.md

Lines changed: 227 additions & 0 deletions
@@ -0,0 +1,227 @@
# Using CodexOpt with Codex

Use this guide when your repo already has Codex instruction files and you want
CodexOpt to improve them safely.

CodexOpt works with the same files Codex loads:

- `AGENTS.md`
- `.codex/skills/**/SKILL.md`
- `.agents/skills/**/SKILL.md`
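
To make the discovery patterns concrete, here is a minimal sketch of how the three patterns above could be resolved with `pathlib`. The helper name `find_instruction_files` is hypothetical and this is not CodexOpt's actual scanner; it only illustrates which paths the globs would match.

```python
from pathlib import Path

# Glob patterns copied from the list above.
INSTRUCTION_PATTERNS = (
    "AGENTS.md",
    ".codex/skills/**/SKILL.md",
    ".agents/skills/**/SKILL.md",
)


def find_instruction_files(repo_root):
    """Return instruction files under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    found = set()
    for pattern in INSTRUCTION_PATTERNS:
        # "**" matches zero or more directories, so nested skills are included.
        found.update(p for p in root.glob(pattern) if p.is_file())
    return sorted(found)
```

Calling `find_instruction_files(".")` from the repo root returns the matching paths, which is a quick way to check what a scan is likely to pick up.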

## Start With A Preview

Run this from the repo where you use Codex:

```bash
uv run codexopt improve
```

This command:

1. finds `AGENTS.md` and `SKILL.md` files
2. mines starter tasks from git history and skill descriptions
3. runs the reflective optimizer in preview mode
4. shows what would change
5. writes review artifacts under `.codexopt/`

The default preview stays offline. It does not spend Codex or API budget unless
you ask it to.

## Run The Live Codex Loop

Use live mode when you want CodexOpt to evaluate actual Codex behavior:

```bash
uv run codexopt improve --live
```

Live mode uses `codex exec` as the optimizer and judge. CodexOpt evaluates the
candidate instruction file, captures feedback from the run, proposes a focused
rewrite, and keeps the rewrite only when it improves held-out tasks.
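
The evaluate, reflect, rewrite, and gate cycle described above can be sketched roughly as follows. This is an illustrative toy, not CodexOpt's implementation: `reflective_loop`, `propose`, and `score` are hypothetical names, and in live mode the proposer and scorer would be backed by `codex exec` rather than plain Python callables.

```python
def reflective_loop(current, propose, score, train, held_out, rounds=3):
    """Keep a rewrite only when its held-out score beats the current text.

    propose(text, feedback) -> candidate text   (hypothetical callable)
    score(text, tasks) -> float, higher is better (hypothetical callable)
    """
    best_score = score(current, held_out)
    for _ in range(rounds):
        # Feedback is gathered on training tasks only.
        feedback = [f"{task}: {score(current, [task]):.2f}" for task in train]
        candidate = propose(current, feedback)
        candidate_score = score(candidate, held_out)
        # Validation gate: accept only strict improvement on held-out tasks.
        if candidate_score > best_score:
            current, best_score = candidate, candidate_score
    return current, best_score
```

Because acceptance depends only on the held-out score, a rewrite that merely fits the training feedback is discarded.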

## Apply The Result

After reviewing the preview, apply validated changes:

```bash
uv run codexopt improve --live --apply
```

CodexOpt writes backups before changing files.

## Review The Report

Write a markdown report after any run:

```bash
uv run codexopt report --output codexopt-report.md
```

The report shows:

- files found
- files improved
- validation score movement
- accepted reflective edits
- sampled feedback that led to the edit
- fallback notes when CodexOpt had to use a weaker signal

## Step By Step Workflow

Use this flow when you want more control than `improve`:

```bash
uv run codexopt init
uv run codexopt scan
uv run codexopt benchmark
uv run codexopt optimize skills --engine reflective
uv run codexopt apply --kind skills --dry-run
uv run codexopt report --output codexopt-report.md
```

Review the dry-run diff, then apply:

```bash
uv run codexopt apply --kind skills
```

For `AGENTS.md`:

```bash
uv run codexopt optimize agents --engine reflective --file AGENTS.md
uv run codexopt apply --kind agents --dry-run
```

## Add Simple Task Evidence

Task evidence tells CodexOpt what “better” means for your repo.

Create `tasks.md`:

```md
- Update changelog entries for patch releases.
- Add regression tests before changing parser behavior.
- Summarize risky changes in the final response.
```

Reference it in `codexopt.yaml`:

```yaml
evidence:
  task_files:
    - tasks.md
```

Then run:

```bash
uv run codexopt improve
```

CodexOpt uses these tasks for train and validation splits. A candidate must
improve held-out validation score before it can win.
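
One way to picture the train and validation split is the sketch below. The function name, the default fraction, and the hash-based ordering are all assumptions chosen for illustration; CodexOpt may split tasks differently.

```python
import hashlib


def split_tasks(tasks, validation_fraction=0.3):
    """Split task strings into (train, validation) deterministically.

    Ordering by a content hash keeps the split stable across runs and
    independent of the order in which tasks appear in the file.
    """
    ordered = sorted(
        tasks, key=lambda t: hashlib.sha256(t.encode("utf-8")).hexdigest()
    )
    if len(ordered) < 2:
        # Too few tasks to hold anything out.
        return ordered, []
    n_val = max(1, round(len(ordered) * validation_fraction))
    return ordered[n_val:], ordered[:n_val]
```

With the three tasks from `tasks.md`, this holds out one task for validation and trains on the other two.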

## Mine Starter Tasks

If you do not have task evidence yet, generate a starter file:

```bash
uv run codexopt tasks init
```

Review the generated `codexopt-tasks.json`, trim anything noisy, then add it to
`evidence.task_files`.

## Add Command Rollouts

Use command rollouts when a deterministic verifier can decide whether a skill
supports a workflow.

Create `skill-rollouts.json`:

```json
[
  {
    "name": "release-skill-smoke",
    "description": "Verify the release skill mentions changelog and tests.",
    "command": "python scripts/verify_release_skill.py",
    "timeout_seconds": 30,
    "expected_stdout_contains": "ok"
  }
]
```
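
The rollout above calls `scripts/verify_release_skill.py`, which is not part of this commit. A minimal sketch of what such a verifier might look like follows; the skill path and the required terms are assumptions inferred from the task description.

```python
import sys
from pathlib import Path

# Terms the release skill is expected to mention (assumed from the description).
REQUIRED_TERMS = ("changelog", "tests")
# Hypothetical location of the skill under test.
DEFAULT_SKILL_PATH = Path(".codex/skills/release/SKILL.md")


def missing_terms(text, required=REQUIRED_TERMS):
    """Return the required terms that do not appear in text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in required if term not in lowered]


def main(argv):
    path = Path(argv[1]) if len(argv) > 1 else DEFAULT_SKILL_PATH
    if not path.is_file():
        print(f"missing file: {path}", file=sys.stderr)
        return 1
    missing = missing_terms(path.read_text(encoding="utf-8"))
    if missing:
        print("missing: " + ", ".join(missing), file=sys.stderr)
        return 1
    # Only the success path writes to stdout, so "ok" is unambiguous.
    print("ok")
    return 0

# In the script itself, finish with: raise SystemExit(main(sys.argv))
```

Failure details go to stderr so that stdout contains `ok` only when every required term is present.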
154+
155+
Reference it:
156+
157+
```yaml
158+
evidence:
159+
task_files:
160+
- skill-rollouts.json
161+
```
162+
163+
Run:
164+
165+
```bash
166+
uv run codexopt improve
167+
```
168+
169+
CodexOpt copies the repo to a temporary directory, writes the candidate
170+
`SKILL.md`, runs the verifier, and uses pass rate as a strong reward signal.
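
The copy, write, run, and score steps can be sketched as below. This is an illustration under stated assumptions, not CodexOpt's code: the function names are hypothetical, and treating a nonzero exit code or a timeout as a failure is a guess about the pass criteria.

```python
import shlex
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_command_rollout(repo, skill_relpath, candidate_text, task):
    """Run one rollout task against a throwaway copy of the repo."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "repo"
        shutil.copytree(repo, work)  # the original repo is never modified
        target = work / skill_relpath
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(candidate_text, encoding="utf-8")
        try:
            proc = subprocess.run(
                shlex.split(task["command"]),
                cwd=work,
                capture_output=True,
                text=True,
                timeout=task.get("timeout_seconds", 30),
            )
        except subprocess.TimeoutExpired:
            return False
        expected = task.get("expected_stdout_contains", "")
        return proc.returncode == 0 and expected in proc.stdout


def pass_rate(results):
    """Fraction of rollouts that passed; 0.0 when there are none."""
    return sum(results) / len(results) if results else 0.0
```

Running the verifier in a temporary copy means a rejected candidate leaves no trace in the working tree.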

## Add Codex Rollouts

Use Codex rollouts when you want to test how Codex behaves with a candidate
skill.

Create `codex-rollouts.json`:

```json
[
  {
    "name": "codex-release-notes",
    "backend": "codex",
    "description": "Ask Codex to use the candidate release skill on a release-note task.",
    "codex_prompt": "Use the local release skill to update CHANGELOG.md for a patch release.",
    "timeout_seconds": 120,
    "expected_final_response_contains": "CHANGELOG.md",
    "expected_command_contains": "git status",
    "expected_file_change": "CHANGELOG.md",
    "expected_file_contains": {
      "path": "CHANGELOG.md",
      "contains": "Patch"
    }
  }
]
```

Run live mode:

```bash
uv run codexopt improve --live
```

CodexOpt runs `codex exec --json` in a temporary repo copy and records the
trajectory:

- final response
- command executions
- file changes
- token usage
- errors
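
To show how the `expected_*` fields in `codex-rollouts.json` might be compared with a recorded trajectory, here is a hedged sketch. The trajectory dictionary shape (`final_response`, `commands`, `changed_files`) is an assumption made for this example and is not the `codex exec --json` event schema; `check_codex_rollout` is a hypothetical helper.

```python
from pathlib import Path


def check_codex_rollout(task, trajectory, workdir):
    """Return one boolean per expectation present in the task."""
    checks = {}
    if "expected_final_response_contains" in task:
        needle = task["expected_final_response_contains"]
        checks["final_response"] = needle in trajectory.get("final_response", "")
    if "expected_command_contains" in task:
        needle = task["expected_command_contains"]
        checks["command"] = any(
            needle in cmd for cmd in trajectory.get("commands", [])
        )
    if "expected_file_change" in task:
        changed = trajectory.get("changed_files", [])
        checks["file_change"] = task["expected_file_change"] in changed
    if "expected_file_contains" in task:
        spec = task["expected_file_contains"]
        path = Path(workdir) / spec["path"]
        checks["file_contains"] = (
            path.is_file() and spec["contains"] in path.read_text(encoding="utf-8")
        )
    return checks
```

A rollout could then count as passed when `all(checks.values())` is true, although how partial matches are weighted is a design choice this sketch leaves open.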

## What SkillOpt Means In CodexOpt

CodexOpt now includes SkillOpt-style discipline in the Codex workflow:

- train and validation task splits
- bounded edits
- validation-gated acceptance
- rollout-based reward when available
- textual feedback that drives reflective mutation
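
As an illustration of "bounded edits", the sketch below measures how many lines a candidate changes and rejects it above a budget. The budget unit (changed lines), the default of 10, and both function names are assumptions; CodexOpt's configurable edit budget may be defined differently.

```python
import difflib


def changed_line_count(original, candidate):
    """Count lines replaced, inserted, or deleted between two texts."""
    a, b = original.splitlines(), candidate.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced block counts as the larger of its two sides.
            changed += max(i2 - i1, j2 - j1)
    return changed


def within_edit_budget(original, candidate, max_changed_lines=10):
    """True when the candidate stays inside the (assumed) line budget."""
    return changed_line_count(original, candidate) <= max_changed_lines
```

A budget like this keeps each accepted rewrite small enough to review by eye.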

For most users, the entry point is still simple:

```bash
uv run codexopt improve --live
```
