[CI/CD Assessment] CI/CD Assessment: Pipeline Status and Quality Gaps #2073

2026-04-18T12:48:31Z

github-actions[bot]
Bot Apr 18, 2026

📊 Current CI/CD Pipeline Status

The repository has a well-structured, multi-layer CI/CD pipeline with 19 standard workflows and 29 agentic workflows. Most standard workflows (12 of 19) run on PRs targeting main. Overall pipeline health is good: 38 of the 49 recent runs with a recorded outcome (77.6%) succeeded, and the majority of checks pass consistently.

Recent run outcomes (last 50 runs; the table accounts for 49 of them):

| Status | Count |
| --- | --- |
| ✅ Success | 38 |
| ❌ Failure | 10 |
| ⏭️ Skipped | 1 |

Notable recurring failures: Smoke tests (Claude, Copilot, Codex, BYOK, OpenCode, Services), Performance Monitor, Dependency Vulnerability Audit, Build Test Suite, Security Guard.

✅ Existing Quality Gates

| Gate | Workflow | Runs on |
| --- | --- | --- |
| ESLint (TypeScript) | `lint.yml` | PR + push |
| Markdown lint | `lint.yml` | PR + push |
| TypeScript type check | `test-integration.yml` | PR + push |
| Build verification (Node 20 & 22) | `build.yml` | PR + push |
| Unit tests with coverage | `test-coverage.yml` | PR + push |
| Integration tests (end-to-end, Docker) | `test-integration.yml` | PR + push |
| Chroot language & sandbox tests | `test-chroot.yml` | PR + push |
| Examples test | `test-examples.yml` | PR + push |
| PR title semantic check | `pr-title.yml` | PR |
| CodeQL (JS/TS + Actions) | `codeql.yml` | PR + push + weekly |
| Dependency vulnerability audit | `dependency-audit.yml` | PR + push + weekly |
| Link checking | `link-check.yml` | Push (no explicit PR trigger) |
| Performance benchmarking | `performance-monitor.yml` | Scheduled daily |
| Smoke tests (Claude, Copilot, Codex, etc.) | 6 smoke workflows | PR (manual, via reaction) + scheduled |
| Security guard (agentic) | `security-guard.md` | PR |
| Build test suite (multi-language) | `build-test.md` | PR |

🔍 Identified Gaps

🔴 High Priority

1. Low unit test coverage with weak thresholds

Current coverage: ~38% statements, ~32% branches
The two most critical files have little or no coverage: cli.ts (0%) and docker-manager.ts (18%)
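Thresholds are set at 38%/30%/35%/38%, at or just below the current baseline (statements: 38% vs. 38.39% measured; branches: 30% vs. 31.78% measured), so they only guard the status quo and create no pressure to improve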
Any regression in these core orchestration files goes undetected

2. Smoke tests are consistently failing and not blocking PRs

All 6 smoke workflow types (Claude, Copilot, Codex, BYOK, OpenCode, Services) show recent failures
Smoke tests run only when triggered manually (via a reaction) or on a schedule; they are not required status checks and do not block PR merges
A PR could break an agent integration and still pass all required checks

3. Integration tests not run for all changed paths

test-integration.yml runs on all PRs, but only the chroot tests (test-chroot.yml) use `paths:` filtering to scope runs to relevant changes
The domain/network, protocol/security, and container/ops test categories (~195 tests) have no dedicated CI workflow; they depend on the generic integration test run, whose scope is unclear from the config

4. dependency-audit.yml consistently failing

Recent runs show repeated failures on both PR and push triggers
Failing security audits that don't block merges create a false sense of security
The workflow should distinguish audit failures (vulnerabilities found) from infrastructure failures of the check itself

🟡 Medium Priority

5. No coverage diff enforcement on PRs

test-coverage.yml runs a baseline comparison but only posts a comment; there is no hard gate preventing coverage regression
A PR can lower coverage without failing any check as long as it stays above the thresholds (e.g., branch coverage falling from 31.78% to 30%)

6. Performance benchmark not integrated into PR flow

performance-monitor.yml runs only on a daily schedule, so PR authors get no feedback on whether their change causes startup or runtime regressions
The benchmark infrastructure already exists in scripts/ci/benchmark-performance.ts

7. No container image security scanning (Trivy/Grype)

Three Docker images (squid, agent, api-proxy) are built and published, but there is no automated CVE scan of the container images themselves
CodeQL covers source code and npm audit covers Node dependencies, but base image vulnerabilities (OS packages in ubuntu:22.04 and ubuntu/squid) are not scanned

8. Security Guard is an agentic check, not a deterministic gate

security-guard.md is an LLM-based security review on PRs, and it has shown recent failures (likely infrastructure or model issues)
There is no deterministic static analysis complement (e.g., eslint-plugin-security, semgrep rules) that would reliably catch common vulnerability patterns

9. No enforcement of action pinning / workflow security in CI

Some workflows use unpinned actions/checkout@v4 (e.g., performance-monitor.yml) while others are pinned to SHAs
The poutine and zizmor security scanners are available in the agenticworkflows-compile tool but are not wired into any standard PR check

🟢 Low Priority

10. No artifact/bundle size tracking

dist/ output size is not monitored; a PR that accidentally pulls in a large transitive dependency would go undetected
build-bundle.mjs already exists, so a size check could be added alongside the bundling step

11. Link checker not scoped/reported clearly

link-check.yml does not appear to be triggered explicitly on PRs, so broken documentation links introduced in a PR may not be caught before merge

12. No Node.js 18 compatibility test

Build matrix covers Node 20 and 22 but not 18. Node 18 reached end-of-life in April 2025, so this matters only if the package still declares Node 18 support; users on that version could otherwise hit incompatibilities

13. No automated changelog/release notes validation on PRs

update-release-notes.md runs post-release; there is no check that significant PRs include changelog entries or that version bumps are consistent

📋 Actionable Recommendations

1. Raise coverage thresholds incrementally (High · Low complexity · High impact)

Update jest.config.js thresholds to ratchet upward (e.g., statements: 50, branches: 40) and add cli.ts and docker-manager.ts to a per-file threshold config. This forces coverage improvement with each PR cycle.
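A minimal sketch of what those settings might look like, written as a TypeScript object that mirrors Jest's `coverageThreshold` option (path-keyed entries are checked independently of `global`). The `src/` paths, the per-file numbers, and the `ratchet` helper are illustrative assumptions, not values taken from the repository:

```typescript
// Sketch of Jest coverage thresholds (hypothetical values).
// Assumes cli.ts and docker-manager.ts live under src/; adjust to the real layout.
type Metric = "statements" | "branches" | "functions" | "lines";
type Thresholds = Partial<Record<Metric, number>>;

export const coverageThreshold: Record<string, Thresholds> = {
  // Global floor, raised from the current 38% statements / 30% branches.
  global: { statements: 50, branches: 40 },
  // Path-keyed entries: Jest removes matching files from the global totals and
  // checks them against these floors instead. Starting values are illustrative.
  "./src/cli.ts": { statements: 20, branches: 10 },
  "./src/docker-manager.ts": { statements: 35, branches: 25 },
};

/**
 * Hypothetical ratchet helper: moves each threshold up to just below the
 * measured coverage and never lowers it. `margin` is in percentage points.
 */
export function ratchet(current: Thresholds, actual: Thresholds, margin = 1): Thresholds {
  const next: Thresholds = { ...current };
  for (const key of Object.keys(actual) as Metric[]) {
    const measured = actual[key];
    if (measured === undefined) continue;
    const candidate = Math.floor(measured - margin);
    next[key] = Math.max(current[key] ?? 0, candidate);
  }
  return next;
}
```

In jest.config.js the same object would sit under `coverageThreshold`. Running `ratchet` after each release with the measured totals is one way to keep the increase genuinely incremental, since it only ever raises a floor.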

2. Make smoke tests required status checks (High · Low complexity · High impact)

Configure branch protection to require at least one smoke test workflow (e.g., smoke-copilot) as a required status check. For the others, fix the recurring infrastructure failures so they are reliable enough to gate merges.
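Required checks can be set in the repository settings or through the GitHub REST API endpoint for a branch's required status checks. The sketch below only builds that request; the owner/repo names and the `smoke-copilot` context are hypothetical, and the context must match the check name GitHub actually reports for the job. One caveat: a required check that never reports (for example, a reaction-triggered smoke test nobody triggered) leaves the PR waiting indefinitely, so the chosen smoke test would also need to run automatically on PRs.

```typescript
// Hypothetical helper: builds the GitHub REST request that sets required
// status checks on a protected branch. It does not send anything.
interface RequiredChecksRequest {
  method: "PATCH";
  url: string;
  body: { strict: boolean; checks: { context: string }[] };
}

export function buildRequiredChecksRequest(
  owner: string,
  repo: string,
  branch: string,
  contexts: string[],
  strict = true,
): RequiredChecksRequest {
  if (contexts.length === 0) {
    throw new Error("at least one required check context is needed");
  }
  const base = `https://api.github.com/repos/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
  return {
    method: "PATCH",
    url: `${base}/branches/${encodeURIComponent(branch)}/protection/required_status_checks`,
    // strict: the PR branch must be up to date with the base branch before merging.
    // Duplicate context names are collapsed.
    body: { strict, checks: [...new Set(contexts)].map((context) => ({ context })) },
  };
}
```

The request would be sent with a token that is allowed to administer the repository's branch protection.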

3. Add dedicated CI workflow for domain/network and security integration tests (High · Low complexity · High impact)

The ~195 integration tests for domain filtering, protocol security, and container ops are spread across files but have no dedicated workflow job. Add explicit Jest --testPathPattern runs for these groups in test-integration.yml or a new test-security.yml.
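One way to give each group its own job is a matrix over test-path patterns. The group names below come from this report, but the regexes are placeholders that would need to match the real integration test file names. Jest 30 renamed `--testPathPattern` to `--testPathPatterns`, so the flag depends on the installed Jest major version:

```typescript
// Hypothetical test groups for the integration suite. Group names follow this
// report; the path regexes are placeholders to be matched to real file names.
const TEST_GROUPS: Record<string, string> = {
  "domain-network": "integration/.*(domain|network)",
  "protocol-security": "integration/.*(protocol|security)",
  "container-ops": "integration/.*container",
};

/** Jest CLI arguments for one group; Jest 30 renamed the path-pattern flag. */
export function jestArgsFor(group: string, jestMajor = 29): string[] {
  const pattern = TEST_GROUPS[group];
  if (pattern === undefined) {
    throw new Error(`unknown test group: ${group}`);
  }
  const flag = jestMajor >= 30 ? "--testPathPatterns" : "--testPathPattern";
  // --ci makes Jest fail on missing snapshots instead of writing them.
  return ["jest", flag, pattern, "--ci"];
}

export const groupNames = (): string[] => Object.keys(TEST_GROUPS);
```

Each matrix entry would run `npx` with the returned arguments, so a failure is reported against a named group rather than the generic integration run.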

4. Fix or quarantine the dependency audit failures (High · Low complexity · High impact)

Investigate and resolve the recurring dependency-audit.yml failures. If vulnerabilities exist with no fix available, use npm audit --omit=dev --audit-level=high (--production on older npm versions) to gate on an appropriate severity rather than failing on every advisory.
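`npm audit --audit-level=high` already exits non-zero only for findings at or above that level. To also separate "vulnerabilities found" from "the audit itself broke" (the distinction gap 4 asks for), a small wrapper can parse `npm audit --json`. This sketch assumes the npm 7+ report format, where `metadata.vulnerabilities` holds per-severity counts:

```typescript
// Sketch of a severity gate over `npm audit --json` output (npm 7+ format).
type Severity = "info" | "low" | "moderate" | "high" | "critical";
const ORDER: Severity[] = ["info", "low", "moderate", "high", "critical"];

interface AuditReport {
  metadata: { vulnerabilities: Partial<Record<Severity, number>> };
}

/** Number of findings at or above `minLevel`. */
export function blockingCount(report: AuditReport, minLevel: Severity = "high"): number {
  const start = ORDER.indexOf(minLevel);
  return ORDER.slice(start).reduce(
    (sum, sev) => sum + (report.metadata.vulnerabilities[sev] ?? 0),
    0,
  );
}

/** Parses the JSON report; output without counts is treated as a failed audit run. */
export function parseAudit(json: string): AuditReport {
  const parsed = JSON.parse(json) as Partial<AuditReport>;
  if (!parsed.metadata?.vulnerabilities) {
    // Report this as an infrastructure failure, not as a vulnerability finding.
    throw new Error("unexpected npm audit output: missing metadata.vulnerabilities");
  }
  return parsed as AuditReport;
}
```

A workflow step could then exit with one code when `parseAudit` throws and another when `blockingCount` is non-zero, making the two failure modes distinguishable in the run summary.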

5. Add performance regression gate on PRs (Medium · Medium complexity · High impact)

Add a PR-triggered job to performance-monitor.yml (or a new perf-check.yml) that runs npm run benchmark with a limited iteration count and fails if key metrics (e.g., startup time) regress beyond a threshold (e.g., +20%). The benchmarking infrastructure already exists.
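The regression rule itself is simple: compare each PR metric against a baseline from main and fail when it grows by more than the allowed fraction. The metric names and the assumption that lower is better (for example, milliseconds) are hypothetical, since the benchmark's output format is not shown in this report:

```typescript
// Hypothetical metric map: name -> measured value, where lower is better (e.g. ms).
type Metrics = Record<string, number>;

export interface Regression {
  metric: string;
  baseline: number;
  current: number;
  change: number; // relative change, e.g. 0.25 = +25%
}

/** Metrics whose relative increase exceeds `maxIncrease` (0.2 = +20%). */
export function findRegressions(baseline: Metrics, current: Metrics, maxIncrease = 0.2): Regression[] {
  const out: Regression[] = [];
  for (const [metric, base] of Object.entries(baseline)) {
    const now = current[metric];
    // Skip metrics missing from the PR run and non-positive baselines.
    if (now === undefined || base <= 0) continue;
    const change = (now - base) / base;
    if (change > maxIncrease) {
      out.push({ metric, baseline: base, current: now, change });
    }
  }
  return out;
}
```

With only a few iterations, single measurements are noisy; comparing medians of several runs, or widening the threshold, reduces false alarms.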

6. Add container image scanning (Medium · Low complexity · Medium impact)

Add a step to build.yml (or a dedicated container-security.yml) that runs trivy image or grype against the locally built squid, agent, and api-proxy images. Upload results as SARIF to the Security tab.
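A sketch of the scan invocation, expressed as argument lists for the Trivy CLI (`trivy image` with SARIF output). The `local/<name>:ci` tags are assumptions about how the build step names the images. `--exit-code 1` makes Trivy fail the step on HIGH or CRITICAL findings, and `--ignore-unfixed` skips CVEs that have no patched package yet:

```typescript
// Image names from this report; the tag scheme is a hypothetical assumption.
const IMAGES = ["squid", "agent", "api-proxy"];

export function trivyArgs(image: string, sarifPath: string, failOn: string[] = ["HIGH", "CRITICAL"]): string[] {
  return [
    "image",
    "--format", "sarif",
    "--output", sarifPath,
    "--severity", failOn.join(","),
    "--exit-code", "1", // non-zero exit when findings at these severities exist
    "--ignore-unfixed", // skip CVEs with no fixed package version yet
    image,
  ];
}

export function scanPlan(tagPrefix: string): { image: string; args: string[] }[] {
  return IMAGES.map((name) => {
    const image = `${tagPrefix}${name}:ci`;
    return { image, args: trivyArgs(image, `trivy-${name}.sarif`) };
  });
}
```

Each argument list would be passed to `trivy` (for example with Node's `child_process.spawnSync`), and the resulting `.sarif` files uploaded with the `github/codeql-action/upload-sarif` action.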

7. Add coverage regression gate (Medium · Low complexity · Medium impact)

In test-coverage.yml, fail the workflow (not just comment) if any metric drops by more than 1 percentage point compared to the base branch. The baseline comparison logic already exists; only a hard failure step is missing.
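Assuming the workflow can produce Jest's `json-summary` report (`coverage/coverage-summary.json`) for both the base branch and the PR, the hard gate reduces to comparing the `total` percentages. A minimal sketch, treating the limit as percentage points:

```typescript
type Metric = "lines" | "statements" | "functions" | "branches";
const METRICS: Metric[] = ["lines", "statements", "functions", "branches"];

// Subset of Jest's coverage-summary.json shape that this check reads.
interface Summary {
  total: Record<Metric, { pct: number }>;
}

/** Human-readable failures for metrics that dropped by more than `maxDropPoints`. */
export function coverageDrops(base: Summary, head: Summary, maxDropPoints = 1): string[] {
  const failures: string[] = [];
  for (const m of METRICS) {
    const before = base.total[m].pct;
    const after = head.total[m].pct;
    const drop = before - after;
    if (drop > maxDropPoints) {
      failures.push(`${m}: ${before}% -> ${after}% (-${drop.toFixed(2)} pts)`);
    }
  }
  return failures;
}
```

The step would exit non-zero whenever the returned list is non-empty, and could include the list in the existing PR comment.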

8. Add deterministic security linting (Medium · Medium complexity · Medium impact)

Add eslint-plugin-security to the ESLint config and/or add a semgrep step to lint.yml. This provides a reliable, non-LLM complement to the agentic Security Guard.

9. Pin all action references to commit SHAs (Low · Low complexity · Medium impact)

performance-monitor.yml and a few others use tag references (@v4). Standardize all workflows to use SHA pinning (already done in most workflows). Consider adding poutine or zizmor scanning via agenticworkflows-compile --poutine as a CI gate.
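Until poutine or zizmor is wired in, a crude line-based check can flag `uses:` references that are not pinned to a full 40-character commit SHA. This is a heuristic sketch, not a YAML parser, and it deliberately skips local (`./`) and `docker://` references:

```typescript
// Matches `uses: owner/repo@ref` (optionally as a list item, optionally quoted).
const USES = /^\s*-?\s*uses:\s*["']?([^"'\s#]+)["']?/;
const FULL_SHA = /^[0-9a-f]{40}$/;

export interface Finding {
  line: number; // 1-based line number in the workflow file
  ref: string;
}

export function unpinnedActions(workflowYaml: string): Finding[] {
  const findings: Finding[] = [];
  workflowYaml.split("\n").forEach((text, i) => {
    const m = USES.exec(text);
    if (!m || m[1] === undefined) return;
    const ref = m[1];
    // Local actions and container references have no git ref to pin.
    if (ref.startsWith("./") || ref.startsWith("docker://")) return;
    const at = ref.lastIndexOf("@");
    const version = at === -1 ? "" : ref.slice(at + 1);
    if (!FULL_SHA.test(version)) findings.push({ line: i + 1, ref });
  });
  return findings;
}
```

Run over every file in `.github/workflows/`, a non-empty result would fail the check and point at the exact line to fix.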

10. Add bundle size check (Low · Low complexity · Low impact)

Add a step to build.yml that checks dist/ total size and fails if it exceeds a threshold (e.g., 2MB). This prevents accidental dependency bloat.
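A minimal sketch of the size check, summing regular files under a directory recursively. The 2MB budget comes from the recommendation; whether that means 2,000,000 or 2 * 1024 * 1024 bytes is a choice the project would make:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/** Total size in bytes of regular files under `dir` (symlinks are not followed). */
export function directorySize(dir: string): number {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += directorySize(full);
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
  }
  return total;
}

export function checkBundleSize(dir: string, limitBytes: number): { ok: boolean; bytes: number } {
  const bytes = directorySize(dir);
  return { ok: bytes <= limitBytes, bytes };
}
```

A build step could call `checkBundleSize("dist", 2 * 1024 * 1024)` and exit non-zero when `ok` is false, printing `bytes` so the growth is visible in the log.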

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Standard workflow files | 19 |
| Agentic workflow files | 29 (compiled to `.lock.yml`) |
| Workflows running on PRs | 12 (standard) + 7 (agentic) |
| Unit test files | 19 in `src/` (integration tests counted separately) |
| Integration test files | ~26 files, ~265 tests |
| Current statement coverage | 38.39% (threshold: 38%) |
| Current branch coverage | 31.78% (threshold: 30%) |
| `cli.ts` coverage | 0% ⚠️ |
| `docker-manager.ts` coverage | 18% ⚠️ |
| Recent workflow success rate | 38/49 runs (77.6%) |
| Smoke test recent success rate | 1/7 (14%) ⚠️ |

The pipeline has solid foundations: semantic PR titles, a multi-Node build matrix, CodeQL, dependency auditing, and a rich integration test suite. The primary gaps are weak coverage enforcement on critical files, unreliable smoke tests that are not required checks, and missing container image scanning and bundle size tracking.

Generated by CI/CD Pipelines and Integration Tests Gap Assessment · ● 337.1K

expires on Apr 25, 2026, 12:48 PM UTC

2026-04-25T12:55:39Z

github-actions[bot]
Bot Apr 25, 2026
Author

This discussion was automatically closed because it expired on 2026-04-25T12:48:31.594Z.

Closed by Workflow
