feat: add GitHub Actions CI example for security regression testing #45
mertsatilmaz left a comment
Thanks for working on this. This is the right direction and the docs are useful, but I cannot merge it as-is because the workflow currently does not actually guarantee failure when a regression is detected.
The main issue is this statement in the docs:
“The harness CLI exits with a non-zero code when an assertion fails.”
That is not true for the current CLI behavior. agent-harness run can emit result JSON with "result": "fail" while still exiting successfully. So the GitHub Actions workflow needs an explicit result-checking step after writing the JSON files.
Please update the workflow to parse the result JSON files and fail the job if any result has "result": "fail".
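To illustrate why the exit code alone is not enough, here is a minimal sketch of a wrapper that derives the final status from the result file instead. `run_and_check` is a hypothetical helper, not part of the harness; the command is supplied by the caller rather than assumed, and the only detail taken from the harness is the top-level "result" field that may be "fail".

```python
import json
import pathlib
import subprocess


def run_and_check(command, out_path):
    """Run a command, then derive the exit status from the result JSON it wrote.

    Hypothetical helper: `command` is whatever invocation writes `out_path`.
    """
    completed = subprocess.run(command, check=False)
    if completed.returncode != 0:
        # The process itself failed (crash, bad arguments, and so on).
        return completed.returncode

    result = json.loads(pathlib.Path(out_path).read_text(encoding="utf-8"))
    # A zero exit code is not sufficient: the recorded result decides.
    return 1 if result.get("result") == "fail" else 0
```

A caller would pass the returned value to `sys.exit(...)` so that a recorded "fail" turns into a failing step even when the process exited cleanly.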
For example, after the agent-harness run ... --out results/...json steps, add a step like this:
    - name: Fail if any regression result failed
      run: |
        python - <<'PY'
        import json
        import pathlib
        import sys

        # Collect every result file whose top-level result is "fail".
        failed = []
        for path in pathlib.Path("results").glob("*.json"):
            result = json.loads(path.read_text(encoding="utf-8"))
            if result.get("result") == "fail":
                failed.append(f"{path}: {result.get('scenario_id')} failed")

        # Exit non-zero so the job fails when any regression was recorded.
        if failed:
            print("Security regression failures detected:")
            for item in failed:
                print(f"- {item}")
            sys.exit(1)

        print("No failing security regression results detected.")
        PY

Please also update docs/ci-github-actions.md to explain the actual behavior:
- the harness writes machine-readable result JSON
- the workflow fails by checking for "result": "fail" in the result files
- not_run may still appear when an assertion is recognized but not implemented yet
- this first example treats only "fail" as a CI failure
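The policy in this list can be sketched as a small tally over the result files. This is an assumption-laden illustration: `summarize_results` and `should_fail_ci` are hypothetical names, the "unknown" bucket is invented for files without a "result" field, and the sketch only looks at the top-level "result" value, since the exact place where not_run is reported is not spelled out here.

```python
import collections
import json
import pathlib


def summarize_results(results_dir):
    """Count top-level "result" values across all result JSON files."""
    counts = collections.Counter()
    for path in sorted(pathlib.Path(results_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # "unknown" is a hypothetical label for files without a "result" field.
        counts[data.get("result", "unknown")] += 1
    return counts


def should_fail_ci(counts):
    """Only "fail" is treated as a CI failure; "not_run" is counted but tolerated."""
    return counts.get("fail", 0) > 0
```

Printing the counts in the job log makes not_run results visible without turning them into failures, which matches the behavior the docs should describe.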
A few smaller requests:
- I do not want this PR to add another active workflow to this repository yet. Please move the example workflow out of .github/workflows/ and place it under something like: docs/examples/github-actions/security-regression.yml
  Then update docs/ci-github-actions.md to explain that users can copy that file into .github/workflows/security-regression.yml in their own project. That keeps this PR as a CI usage example without adding extra CI jobs to this repository on every PR.
- The basic goal-hijack scenario currently includes no_secret_disclosure, which still reports not_run. That is okay if documented, but the docs should not imply every assertion is fully enforced in that example.
- Please keep the artifact upload step with if: always() so result JSON files are still uploaded even when the result-checking step fails.
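For the first request, the copy step that users would perform can be sketched as follows. This assumes the example ends up at the suggested path docs/examples/github-actions/security-regression.yml (the final location may differ), and `install_example_workflow` is a hypothetical helper written only for illustration.

```python
import pathlib
import shutil

# Paths as suggested in this review; the final example location is an assumption.
EXAMPLE = pathlib.Path("docs/examples/github-actions/security-regression.yml")
TARGET = pathlib.Path(".github/workflows/security-regression.yml")


def install_example_workflow(harness_repo, project_root):
    """Copy the example workflow from a harness checkout into another project."""
    source = pathlib.Path(harness_repo) / EXAMPLE
    destination = pathlib.Path(project_root) / TARGET
    # Create .github/workflows/ in the target project if it does not exist yet.
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    return destination
```

Because the file only becomes an active workflow once it sits under .github/workflows/ in the user's project, keeping it under docs/ here adds no CI jobs to this repository.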
After these fixes, this should be a solid copy-pasteable GitHub Actions example that actually fails CI when a regression result is detected.
4720792 to 2ff5342
Hey @mertsatilmaz, thanks for the detailed feedback.
@RajGajjar-01 thanks for your contribution, welcome to the team. I made some changes on the error handling/documentation. LGTM now.
Clarified the description of the workflow in the document.
Thank you.
Closes #18
What this adds
- .github/workflows/security-regression.yml: a GitHub Actions workflow that runs security regression scenarios on every push and pull request
- docs/ci-github-actions.md: explains how the workflow works, how pass and fail are triggered, and how to adapt it for another project
Approach
Uses --trace-file mode so the workflow requires no live agent and no API keys, and produces deterministic results on every run. The example uses the scenarios and trace files already present in the repository.
Testing
Ran all commands locally before opening this PR.