[reliability] Daily Reliability Review - 2026-05-20 #33648

@github-actions

Executive Summary

Over the last 24h Sentry recorded 5,926 spans (5,867 gen_ai, 59 default) from github/gh-aw. Most workflows completed: 2,901 spans carry gh-aw.run.status:success versus 32 failure spans across 6 workflows. No error or log events were ingested in the same window, which is an explicit observability gap, not a sign of health. The latency tail is heavy: 7 individual gen_ai spans exceeded 15 min (max ~32 min), all on long-running scheduled jobs and all marked success.

Core reliability fields are missing or null on the bulk of spans: span.status is null on 100% of spans, gen_ai.response.finish_reasons and gh-aw.run.conclusion are absent from the queryable index, and release / service.version are not populated. This means runtime outcome cannot be inferred from traces alone — most "failure" signal is currently locked behind a single attribute (gh-aw.run.status) emitted only on the conclusion span.

A representative cross-check also surfaced a within-run inconsistency: trace 2fa055d5fa45b83d898cc0908a369e65 (PR Sous Chef, run 26166375489) carries both success and failure values of gh-aw.run.status on different spans, yet gh run view reports the run's overall conclusion as success. Worth investigating before relying on that attribute as the canonical failure signal.

Top Reliability Findings

| Priority | Workflow | Problem | Evidence | Next Action |
|---|---|---|---|---|
| P1 | All gh-aw workflows | gen_ai.response.finish_reasons and gh-aw.run.conclusion not queryable; span.status null on 100% of spans (5,923) | spans dataset, 24h, has:gen_ai.response.finish_reasons → 0 results; has:gh-aw.run.conclusion → 0 results; span.status aggregate → null for both ops | Verify conclusion-span emission path in actions/setup/js/send_otlp_span.cjs (lines 1745, 1798–1799); confirm conclusion spans are being exported and that Sentry is indexing these attribute keys |
| P1 | Errors / Logs datasets | No events ingested in 24h window | dataset:errors and dataset:logs both return zero rows | Confirm Sentry SDK / OTLP exporter is configured to forward error + log signals (current setup appears to emit spans only) |
| P2 | Contribution Check | 8 failure spans in 24h (avg 94s, max 5.3 min); confirmed real failure on run 26179902753 (conclusion=failure) | spans dataset, gh-aw.run.status:failure grouped by workflow | Open targeted investigation of Contribution Check job; current cadence implies repeated failure pattern (8 failure spans across at least 2 runs in 24h) |
| P2 | PR Sous Chef | 8 failure-marked spans, but cross-check of run 26166375489 shows GH conclusion=success; both success and failure appear in the same trace | trace 2fa055d5fa45b83d898cc0908a369e65 shows spans with both run.status values for the same run.id | Audit gh-aw.run.status emission; confirm it's not being written from an intermediate step (e.g. retried agent attempt) that doesn't reflect the run's final conclusion |
| P2 | Safe Output Health Monitor | 4 failure spans, but max span duration 12.7 min, suggesting a slow failure path | gh-aw.run.status:failure grouped by workflow, max(span.duration)=766180ms | Inspect the long failure span; if this is a timeout, finish_reasons would distinguish it from a logic error, but that attribute is currently not present in the index |
| P3 | Daily Security Observability Report, Copilot Session Insights, Typist - Go Type Analysis, GitHub API Consumption Report, Copilot Agent Prompt Clustering, Copilot PR Conversation NLP Analysis, Daily AW Cross-Repo Compile Check | 7 single-span outliers > 15 min (max 32 min) | spans dataset, span.duration:>900000, all gh-aw.run.status:success | Confirm whether these are expected wall-clock budgets for these workflows. If unintended, add per-workflow latency SLO before treating as regressions |
| P3 | All gh-aw spans | release (Sentry) and service.version (OTLP resource) not present in index | has:release → all 5,926 rows have release=null; has:service.version → 0 rows | Map OTLP resource attribute service.version to Sentry release in the project's ingest config; required for regression-by-version analysis |
| P3 | Test Quality Sentinel, Matt Pocock Skills Reviewer, PR Code Quality Reviewer | Highest input-token consumers (135M, 108M, 86M input tokens / 24h) | spans dataset, sum(gen_ai.usage.input_tokens) grouped by workflow | Inconclusive whether this is truncation-driven without gen_ai.response.finish_reasons; once that attribute is queryable, recheck for length finishes |
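
The latency finding above suggests a per-workflow latency SLO before treating long spans as regressions. A minimal sketch of such a check follows. It assumes spans have already been fetched into plain objects shaped like { workflow, spanId, durationMs }, which is a hypothetical shape and not a Sentry API response. The function name findLatencyOutliers and the budgets map are illustrative, and the 15 min default only mirrors the span.duration:>900000 filter used in this review; real budgets would need to be agreed per workflow.

```javascript
// Default budget: mirrors the span.duration:>900000 filter used in this review.
const DEFAULT_BUDGET_MS = 15 * 60 * 1000;

/**
 * Return the spans that exceed the latency budget for their workflow.
 * `spans` is an array of { workflow, spanId, durationMs } (assumed shape).
 * `budgetsMs` maps a workflow name to its budget in milliseconds;
 * workflows without an entry fall back to `defaultBudgetMs`.
 */
function findLatencyOutliers(spans, budgetsMs = {}, defaultBudgetMs = DEFAULT_BUDGET_MS) {
  const outliers = [];
  for (const span of spans) {
    const budgetMs = budgetsMs[span.workflow] ?? defaultBudgetMs;
    if (span.durationMs > budgetMs) {
      outliers.push({ ...span, budgetMs, overByMs: span.durationMs - budgetMs });
    }
  }
  return outliers;
}

// Example: the 766180 ms Safe Output Health Monitor span stays under the
// 15 min default, so nothing is flagged. "example-span" is a placeholder id.
const flagged = findLatencyOutliers([
  { workflow: "Safe Output Health Monitor", spanId: "example-span", durationMs: 766180 },
]);
console.log(flagged.length); // 0
```

Keeping the budgets in a caller-supplied map means a long-running scheduled job can be given a wider budget without loosening the default for everything else.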

Representative Traces

  • Longest gen_ai span (32 min, success) — Daily Security Observability Report, span c1b79721b772812e, trace 00b0c204449765a74224a36a160e77c2. Single very-long gen_ai span at 2026-05-20T16:48:44Z; no failure markers; example of latency outlier with no observable cause attribute.
  • Run.status divergence — PR Sous Chef run 26166375489, trace 2fa055d5fa45b83d898cc0908a369e65. Spans 669eff6fe7906642 (6m 31s) and 2ec351c8bf044ecd (62s) both carry gh-aw.run.status:failure, while interleaved spans carry success. GitHub Actions reports the overall run as success.
  • Confirmed real failure — Contribution Check run 26179902753, trace 291367aad0386117e8f212775a33bf37. 2 failure-marked spans (5m 6s and 55s); gh run view confirms conclusion=failure.
  • Slow failure path — Safe Output Health Monitor failure span, max duration 12.7 min. Investigate why a monitor workflow takes >12 min on the failure path.
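
The run.status divergence seen in the PR Sous Chef trace could be detected mechanically rather than by spot check. The sketch below groups spans by run and reports any run carrying more than one distinct gh-aw.run.status value. It assumes spans are available as plain objects with an attributes map, and that the run identifier lives under a key such as gh-aw.run.id; the report only refers to it as run.id, so that key name is an assumption and is left configurable. The function name findDivergentRuns is hypothetical.

```javascript
/**
 * Report runs whose spans carry more than one distinct gh-aw.run.status value.
 * `spans` is an array of { spanId, attributes: { [key]: value } } (assumed shape).
 * `runIdKey` is the attribute holding the run id; "gh-aw.run.id" is an assumption.
 */
function findDivergentRuns(spans, runIdKey = "gh-aw.run.id") {
  const statusesByRun = new Map();
  for (const span of spans) {
    const runId = span.attributes?.[runIdKey];
    const status = span.attributes?.["gh-aw.run.status"];
    // Spans without a run id or a status cannot contribute to a divergence.
    if (runId === undefined || status === undefined) continue;
    if (!statusesByRun.has(runId)) statusesByRun.set(runId, new Set());
    statusesByRun.get(runId).add(status);
  }

  const divergent = [];
  for (const [runId, statuses] of statusesByRun) {
    if (statuses.size > 1) {
      divergent.push({ runId, statuses: [...statuses].sort() });
    }
  }
  return divergent;
}

// Example built from the two failure spans named above plus one placeholder
// success span (its id is illustrative, not taken from the trace).
const divergentRuns = findDivergentRuns([
  { spanId: "669eff6fe7906642", attributes: { "gh-aw.run.id": "26166375489", "gh-aw.run.status": "failure" } },
  { spanId: "2ec351c8bf044ecd", attributes: { "gh-aw.run.id": "26166375489", "gh-aw.run.status": "failure" } },
  { spanId: "placeholder-success-span", attributes: { "gh-aw.run.id": "26166375489", "gh-aw.run.status": "success" } },
]);
console.log(JSON.stringify(divergentRuns));
```

A run that appears in the result is a candidate for the per-step versus per-run question; it does not by itself say which value reflects the final conclusion.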

Recommendations

  1. Restore conclusion attributes in the span index. send_otlp_span.cjs:1798-1799 claims gen_ai.response.finish_reasons is emitted on the conclusion span, but it returns 0 results in the spans dataset over 24h. Either the conclusion span is not reaching Sentry, the attribute is being dropped/renamed during ingest, or Sentry's span index is not capturing it. Pick one: validate locally with /tmp/gh-aw/otel.jsonl, or open the conclusion span in Sentry UI to confirm whether the attribute is present at the event level but excluded from the queryable index.
  2. Map service.version to Sentry release. Resource attribute is emitted at send_otlp_span.cjs:322 but does not appear in the index — likely an ingest-side mapping. Without it, regression-by-version triage is impossible.
  3. Audit gh-aw.run.status semantics. A single run with both success and failure spans is a signal that this attribute is set per-step/attempt rather than per-run. Either (a) restrict emission to the final conclusion span only, or (b) rename mid-run status to gh-aw.step.status and reserve gh-aw.run.status for the terminal value.
  4. Forward errors and logs to Sentry. Both datasets returned zero rows in 24h, which is almost certainly under-instrumentation rather than zero errors. Confirm exporter scope before the next reliability review so the report can include error-class evidence.
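
For the local validation mentioned in recommendation 1, a rough sketch of a presence check over the exported spans is shown below. It assumes that each non-empty line of /tmp/gh-aw/otel.jsonl is one OTLP/JSON trace export payload (resourceSpans → scopeSpans → spans, with attributes as a list of { key, value } pairs); the actual layout of that file is not confirmed here. The function name countAttributePresence is hypothetical.

```javascript
/**
 * Count how many exported spans carry each of the given attribute keys.
 * `jsonlText` is the file content; every non-empty line is assumed to be
 * one OTLP/JSON payload of the form { resourceSpans: [...] }.
 */
function countAttributePresence(jsonlText, keys) {
  const counts = Object.fromEntries(keys.map((key) => [key, 0]));
  let totalSpans = 0;

  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue;
    const payload = JSON.parse(line);
    for (const resourceSpan of payload.resourceSpans ?? []) {
      for (const scopeSpan of resourceSpan.scopeSpans ?? []) {
        for (const span of scopeSpan.spans ?? []) {
          totalSpans += 1;
          // OTLP/JSON encodes span attributes as [{ key, value }, ...].
          const present = new Set((span.attributes ?? []).map((attr) => attr.key));
          for (const key of keys) {
            if (present.has(key)) counts[key] += 1;
          }
        }
      }
    }
  }
  return { totalSpans, counts };
}
```

To run it against the local export, the text can be read with Node's fs.readFileSync("/tmp/gh-aw/otel.jsonl", "utf8") and passed in together with keys such as gen_ai.response.finish_reasons and gh-aw.run.conclusion. A non-zero count locally combined with zero results in Sentry would point at the ingest or indexing side rather than the emit side.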

Notes

  • Inconclusive runtime outcome for the latency outliers. All 7 spans > 15 min are marked gh-aw.run.status:success, but without gen_ai.response.finish_reasons or OTLP status.code, we cannot distinguish a long successful run from a runaway one. Treat the latency table as observation-only until conclusion attributes are queryable.
  • span.status is null on 100% of spans. This field is Sentry's mapping of OTLP status.code; the emit side sets it at send_otlp_span.cjs:295, but it does not appear in the index. This suggests the same ingest gap as release / service.version.
  • gh-aw.run.conclusion is not queryable. Considered for cross-checking with gh-aw.run.status, but it returns 0 rows on has: queries.
  • gen_ai.response.finish_reasons:length returned 0 rows. This is consistent with the attribute being absent entirely, not with a real absence of truncated responses. Token-heavy workflows (Test Quality Sentinel: 134M input tokens / 24h) cannot be evaluated for truncation until this is fixed.
  • errors and logs datasets are empty for the project in the 24h window. This is reported as an instrumentation/forwarding gap, not as "no failures occurred."
  • The Sentry MCP build available here exposes list_events only; search_events and get_trace_details were not available. Trace continuity was verified by list_events filtered on trace:<id>. All trace links above are direct UI links.

Generated by 🚨 Daily Reliability Review · ● 10.4M

  • expires on May 22, 2026, 11:22 PM UTC
