add Optimize-experiment skill for Claude #375
…dule import to first use. This fixes proxy issues seen when Claude runs tests in its sandbox
0435f58 to 32429a9
@claude code-review
Claude finished @yangm2's task in 2m 36s
Code Review
Overall this is a well-structured, well-documented, and thoroughly tested PR. The lazy-evaluator refactor is a clean fix for the import-time network side effect, and the STOPGAP retrieval-check tooling is carefully reasoned (the trace-check vs. live-requery short-circuit logic is genuinely thoughtful). Below are mostly minor observations, nothing blocking.
Lazy evaluator construction (
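For readers unfamiliar with the pattern, here is a minimal sketch of deferring construction from module import to first use. It is not the code from this PR: `Evaluator`, `get_evaluator`, and `evaluate` are hypothetical names, and the constructor only counts instances where a real one might open a network connection.

```python
import functools


class Evaluator:
    """Hypothetical stand-in for an object whose constructor does network I/O."""

    instances = 0  # counts constructions so the laziness is observable

    def __init__(self) -> None:
        # A real evaluator might contact a remote service here (assumption).
        Evaluator.instances += 1

    def score(self, text: str) -> int:
        # Placeholder scoring logic: the length of the input.
        return len(text)


@functools.lru_cache(maxsize=None)
def get_evaluator() -> Evaluator:
    """Build the evaluator on the first call and reuse it afterwards."""
    return Evaluator()


def evaluate(text: str) -> int:
    # Nothing is constructed at import time; construction happens here on first use.
    return get_evaluator().score(text)
```

Importing a module written this way has no side effects, so a test run in a sandbox without network access fails only if it actually calls `evaluate`, and tests can patch `get_evaluator` before that happens.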
What type of PR is this? (check all applicable)
Description
This adds an /optimize-experiment <experiment-name> skill with the associated tools so that Claude can do a static analysis of traces from an experiment. It can also compare the tool calls/results from an experiment on a new datastore to determine whether hacks/workarounds in the system prompt can be removed.
Related Tickets & Documents
QA Instructions, Screenshots, Recordings
Please replace this line with instructions on how to test your changes, a note on the devices and browsers this has been tested on, as well as any relevant images for UI changes.
Added/updated tests?
Documentation
Architecture.md has been updated
[optional] Are there any post deployment tasks we need to perform?