# Provider Autonomy Probes
#
# Phase 2.1 of docs/roadmap/non-claude-provider-autonomy.md.
#
# Runs `scripts/nightly_provider_e2e.py` on demand (manual dispatch only)
# against the direct API providers Auto Code supports. Probes use real
# credentials from repository secrets so the persisted provider-smoke
# history accumulates evidence the AutonomyPolicy gate consumes when
# deciding whether a direct provider can be promoted to `full_autonomous`.
#
# Each provider is probed `runs_per_provider` times (default 3) in one run
# so the persisted history accumulates the trailing pass streak the
# AutonomyPolicy gate counts toward `min_stable_runs` — one run can
# therefore make a provider promotion-eligible (Option C, Phase 2.1).
# `provider-smoke-history.json` is git-tracked (see .gitignore) so the
# committed evidence reaches the gate's consumer via a normal pull.
#
# Outputs:
#   - `.auto-claude/runtime/nightly-provider-summary.json` (run summary)
#   - artifact upload (kept for 30 days)
#   - shields.io readiness badges under `.github/badges/`
#   - bot PR that updates the tracked `provider-smoke-history.json` and the
#     badges when `open_history_pr` is set (you merge it to develop)
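#
# Manual dispatch sketch. This assumes the GitHub CLI (`gh`) is installed and
# authenticated for this repository; the provider subset below is only an
# illustration, and any input left out falls back to its default:
#
#   gh workflow run nightly-provider-autonomy.yml \
#     -f providers='openai google' \
#     -f runs_per_provider=3 \
#     -f open_history_pr=true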
name: Provider Autonomy Probes

on:
  workflow_dispatch:
    inputs:
      providers:
        description: 'Space-separated providers to probe (default: all known)'
        required: false
        type: string
        default: 'openai google openrouter litellm zhipuai ollama'
      timeout_seconds:
        description: 'Per-provider timeout in seconds'
        required: false
        type: string
        default: '600'
      runs_per_provider:
        description: 'Probe each provider N times in one run so the persisted history accumulates the promotion-gate pass streak (min_stable_runs, default 3)'
        required: false
        type: string
        default: '3'
      open_history_pr:
        description: 'Open a follow-up PR if provider-smoke-history.json changes'
        required: false
        type: boolean
        default: false

permissions:
  # contents: write is required for the history-update step to push the
  # bot/nightly-provider-history branch (create-pull-request). It only
  # pushes that bot branch + opens a PR; develop is never auto-pushed.
  contents: write
  pull-requests: write

concurrency:
  group: nightly-provider-autonomy
  cancel-in-progress: false

jobs:
  probe:
    name: Probe direct API providers
    runs-on: ubuntu-latest
    timeout-minutes: 90
    env:
      # Provider credentials are pulled from repository secrets so they
      # never appear in the workflow file or process arguments.
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
      OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
      LITELLM_API_KEY: ${{ secrets.LITELLM_API_KEY }}
      LITELLM_API_BASE: ${{ secrets.LITELLM_API_BASE }}
      ZHIPUAI_API_KEY: ${{ secrets.ZHIPUAI_API_KEY }}
      # AutonomyPolicy is locked to standard for probe runs so the gate
      # values are reproducible across nights; overrides come from
      # repository variables when an enterprise tunes them.
      AUTO_CODE_AUTONOMY: claude
      AUTO_CODE_AUTONOMY_PRESET: standard
      # Run the live fault-injection probes (unsupported_tools,
      # gateway_model_limitations). The promotion gate requires their coverage,
      # so this workflow, whose purpose is to produce promotion evidence,
      # opts in. Without it the gate sits at "live_fault not_configured" even
      # with a stable pass history.
      AUTO_CODE_PROVIDER_E2E_LIVE_FAULT_PROBES: 'true'
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python backend
        uses: ./.github/actions/setup-python-backend

      - name: Run nightly provider probes
        id: probe
        env:
          # Defaults apply when a dispatch input is omitted.
          PROVIDERS: ${{ inputs.providers || 'openai google openrouter litellm zhipuai ollama' }}
          TIMEOUT_SECONDS: ${{ inputs.timeout_seconds || '600' }}
          # Probe each provider this many times so one nightly run leaves a
          # promotion-eligible pass streak in the persisted history. Defaults
          # to the standard preset's min_stable_runs (3).
          RUNS_PER_PROVIDER: ${{ inputs.runs_per_provider || '3' }}
        run: |
          set -euo pipefail
          mkdir -p .auto-claude/runtime
          python scripts/nightly_provider_e2e.py \
            --providers $PROVIDERS \
            --backend-dir apps/backend \
            --timeout "$TIMEOUT_SECONDS" \
            --runs-per-provider "$RUNS_PER_PROVIDER" \
            --allow-provider-failures \
            --output .auto-claude/runtime/nightly-provider-summary.json

      - name: Upload nightly summary artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: nightly-provider-summary
          path: |
            .auto-claude/runtime/nightly-provider-summary.json
            .auto-claude/runtime/provider-smoke-history.json
          retention-days: 30
          if-no-files-found: warn

      - name: Render autonomy readiness badges
        if: steps.probe.outcome == 'success'
        env:
          PROVIDERS: ${{ inputs.providers || 'openai google openrouter litellm zhipuai ollama' }}
        run: |
          set -uo pipefail
          # Drive with AUTO_CODE_AUTONOMY=safe so the gate evaluates the
          # recorded evidence rather than short-circuiting on the off knob.
          # Use the backend venv interpreter (setup-python-backend does not
          # put it on PATH). One provider failing must not fail the job.
          for provider in $PROVIDERS; do
            AUTO_CODE_AUTONOMY=safe apps/backend/.venv/bin/python apps/backend/run.py \
              --autonomy-readiness "$provider" --json 2>/dev/null \
              | apps/backend/.venv/bin/python scripts/render_autonomy_badge.py \
                  --provider "$provider" \
                  --output ".github/badges/autonomy-$provider.json" || true
          done

      - name: Open history-update PR
        # Fires on manual dispatch when open_history_pr is set, so the
        # accumulated provider-smoke history + readiness badges reach develop
        # (you merge when ready; nothing is auto-pushed to a protected branch).
        if: >
          steps.probe.outcome == 'success'
          && inputs.open_history_pr == true
        uses: peter-evans/create-pull-request@c5a7806660adbe173f04e3e038b0ccdcd758773c # v6.1.0
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          branch: bot/nightly-provider-history
          delete-branch: true
          add-paths: |
            .auto-claude/runtime/provider-smoke-history.json
            .github/badges
          commit-message: |
            chore(autonomy): nightly provider-smoke history update

            Automated update from nightly-provider-autonomy.yml.
            Summary attached to the workflow run as the
            nightly-provider-summary artifact.
          title: 'chore(autonomy): nightly provider-smoke history update'
          body: |
            Automated history update from the nightly provider autonomy
            probe. The corresponding summary is attached to the workflow
            run as the `nightly-provider-summary` artifact.

            This PR is generated only when:
            1. The nightly probe job succeeded (per-provider failures
               still allow the PR; transport-level failures do not).
            2. `provider-smoke-history.json` actually changed.

            See `docs/roadmap/non-claude-provider-autonomy.md` Phase 2.
          labels: |
            autonomy
            bot