Commit 6e278ed

Merge pull request #141 from LAA-Software-Engineering/feat/111-run-attribution
feat(state,security): tenant/thread/actor attribution for runs and traces
2 parents 83c4217 + 9b94309 commit 6e278ed

30 files changed

Lines changed: 1625 additions & 56 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

 ### Added

+- **Run attribution** (issue #111): `tenant_id`, `thread_id`, `actor_id`, `parent_run_id`, `request_id`, `idempotency_key`, and `source` on `runs`; trace events carry matching tenant/thread/actor for filterable logs and inspector queries. `agentctl run` accepts `--tenant-id`, `--thread-id`, `--actor-id` (local defaults `tenant-1` / `thread-1` / `user-1`); `agentctl logs` and `GET /api/runs` filter by the same dimensions. `--resume` reuses persisted `run_id` and `thread_id`. OTel spans emit `gen_ai.tenant.id`, `gen_ai.thread.id`, `gen_ai.actor.id`, and `gen_ai.request.id`. See [`docs/ATTRIBUTION.md`](docs/ATTRIBUTION.md).
 - **Trace payload redaction** (issue #110): trace events are sanitized, key-redacted, and size-capped before SQLite storage. Defaults mask common secret key names; override via `Project.spec.traces.redactKeys`, `maxPayloadBytes`, and `spec.traces.redaction` (`maxDepth`, `maxBytes` for binary previews, `maxStringChars`). HITL edit `argsDiff` is redacted before persistence. Local runs use [trace.NewRecorderForGraph] from project spec.
 - **Optional OpenTelemetry trace export** (issue #108): `Project.spec.telemetry` (`enabled`, `serviceName`, `endpoint` with `env:` tokens, `consoleExport`) emits WayFind-aligned `gen_ai.*` spans (`agent.run`, `model.chat`, `tool.exec`, `approval`) alongside SQLite traces. Disabled by default; init failures log a warning and never fail runs. See [`docs/OTEL.md`](docs/OTEL.md) for a Jaeger quick start.
 - **`agentctl inspect --web`**: read-only local inspector (default `http://127.0.0.1:8787`) over SQLite state: runs, trace timeline, run steps, applied deployment resources, and checkpoints ([#109](https://github.com/LAA-Software-Engineering/agentic-control-plane/issues/109)).
@@ -20,6 +21,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

 ### Changed

+- **`agentctl logs` table output** (issue #111): the default (non-JSON) run list adds `TENANT`, `THREAD`, and `ACTOR` columns. Scripts that parse fixed column positions should switch to `-o json` or match by header names.
 - **Breaking: tool calls without explicit policy are no longer unrestricted.** Previously, `CheckToolCall` with a nil [spec.PolicySpec] allowed all tools. Now fail-closed safety always applies from the project graph (even when the workflow omits `spec.policy` or the Policy resource is missing).
 - Tools with **no** `spec.safety` block behave as **untrusted with side effects** after normalization → require `--approve` unless an explicit `approvals.requiredFor` rule matches.

docs/ATTRIBUTION.md

Lines changed: 79 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,79 @@
# Run attribution (tenant, thread, actor)

Issue [#111](https://github.com/LAA-Software-Engineering/agentic-control-plane/issues/111) adds lightweight tenancy and attribution to `runs` and `trace_events`.

## Fields

| Field | Purpose |
| --- | --- |
| `tenant_id` | Outermost multi-tenant scope |
| `thread_id` | Session continuity across runs and `--resume` |
| `actor_id` | Who triggered the run (caller-asserted for now) |
| `parent_run_id` | Lineage for sub-runs (not set on resume of the same run) |
| `request_id` | Per-invocation correlation id (distinct from `run_id`) |
| `idempotency_key` | Client reference key (stored only; dedupe is not enforced yet) |
| `source` | Origin label (`cli`, `actions`, `api`, …) |

Trace events duplicate `tenant_id`, `thread_id`, and `actor_id` from the parent run so `logs` and the inspector can filter without joins.
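
To illustrate why duplicating these columns helps, here is a minimal Go sketch that filters trace events on attribution alone, with no lookup of the parent run. The `TraceEvent` and `Filter` types and the `Type` field are hypothetical stand-ins for illustration; they are not the project's actual types or schema.

```go
package main

import "fmt"

// TraceEvent is a hypothetical, simplified trace row. The real schema is not
// shown in this document.
type TraceEvent struct {
	RunID    string
	TenantID string
	ThreadID string
	ActorID  string
	Type     string
}

// Filter holds optional attribution filters; an empty field matches any value.
type Filter struct {
	TenantID, ThreadID, ActorID string
}

// Matches reports whether e satisfies every non-empty field of f.
func (f Filter) Matches(e TraceEvent) bool {
	return (f.TenantID == "" || f.TenantID == e.TenantID) &&
		(f.ThreadID == "" || f.ThreadID == e.ThreadID) &&
		(f.ActorID == "" || f.ActorID == e.ActorID)
}

// FilterEvents returns the events matching f, preserving input order.
func FilterEvents(events []TraceEvent, f Filter) []TraceEvent {
	var out []TraceEvent
	for _, e := range events {
		if f.Matches(e) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	events := []TraceEvent{
		{RunID: "r1", TenantID: "acme", ThreadID: "prod-review-42", ActorID: "github-actions@acme", Type: "model.chat"},
		{RunID: "r2", TenantID: "tenant-1", ThreadID: "thread-1", ActorID: "user-1", Type: "tool.exec"},
	}
	for _, e := range FilterEvents(events, Filter{TenantID: "acme"}) {
		fmt.Println(e.RunID, e.Type)
	}
}
```

Because each event carries its own tenant, thread, and actor, the predicate needs only the event itself.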

## CLI defaults (local only)

When flags are omitted, `agentctl run` stores:

- `tenant_id`: `tenant-1`
- `thread_id`: `thread-1`
- `actor_id`: `user-1`
- `source`: `cli`

**Do not rely on these defaults in CI or production.** A stderr warning is emitted when defaults apply. For CI/prod, pass real actor ids, set env vars, or enable the guardrail:

```bash
export AGENTCTL_REQUIRE_ATTRIBUTION=1
# or: agentctl run ... --require-attribution --tenant-id ... --thread-id ... --actor-id ...
```

Env overrides when flags are omitted: `AGENTCTL_TENANT_ID`, `AGENTCTL_THREAD_ID`, `AGENTCTL_ACTOR_ID`.

```bash
agentctl run workflow/demo \
  --tenant-id acme \
  --thread-id prod-review-42 \
  --actor-id github-actions@acme
```

Filter history:

```bash
agentctl logs --tenant-id acme --thread-id prod-review-42
```

## Resume

`agentctl run --resume <run-id>` reuses the original `run_id` and `thread_id` from the persisted run row. Attribution flags on resume are ignored so thread timelines stay coherent. `--parent-run-id` is for genuine sub-runs, not resumes.
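
As a rough sketch of the resume rule, the following Go snippet always returns the persisted identity and merely reports which requested flags had no effect. The `Attribution` and `PersistedRun` types and the `ResumeAttribution` function are hypothetical; whether the real CLI reports ignored flags is not stated in this document.

```go
package main

import "fmt"

// Attribution is a hypothetical subset of the run attribution fields.
type Attribution struct {
	TenantID, ThreadID, ActorID string
}

// PersistedRun is a hypothetical view of a stored run row.
type PersistedRun struct {
	RunID string
	Attribution
}

// ResumeAttribution returns the identity a resumed run carries, which is
// always the persisted one, plus the names of any requested flags that were
// ignored as a result.
func ResumeAttribution(stored PersistedRun, requested Attribution) (PersistedRun, []string) {
	var ignored []string
	if requested.TenantID != "" {
		ignored = append(ignored, "--tenant-id")
	}
	if requested.ThreadID != "" {
		ignored = append(ignored, "--thread-id")
	}
	if requested.ActorID != "" {
		ignored = append(ignored, "--actor-id")
	}
	return stored, ignored
}

func main() {
	stored := PersistedRun{
		RunID:       "run-123",
		Attribution: Attribution{TenantID: "acme", ThreadID: "prod-review-42", ActorID: "github-actions@acme"},
	}
	resumed, ignored := ResumeAttribution(stored, Attribution{ThreadID: "some-other-thread"})
	fmt.Println(resumed.RunID, resumed.ThreadID, ignored)
}
```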

## Inspector API

`GET /api/runs` accepts optional query parameters:

- `tenant_id`
- `thread_id`
- `actor_id`
- `workflow`
- `limit`
64+
## OpenTelemetry
65+
66+
When telemetry is enabled, spans emit `gen_ai.tenant.id`, `gen_ai.thread.id`, `gen_ai.actor.id`, `gen_ai.run.id`, and `gen_ai.request.id` alongside existing gen_ai attributes. See [OTEL.md](./OTEL.md).
67+
68+
## request_id
69+
70+
When omitted, [state.RuntimeStore.StartRun] assigns a new UUID via `util.NewRequestID()`. Legacy rows migrated from pre-005 databases may have `request_id == run_id`.
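
The "assign only when omitted" behaviour can be approximated as below. The body of `util.NewRequestID()` is not shown in this document, so this sketch assumes a random version 4 UUID built with `crypto/rand`; `newRequestID` and `requestIDOrNew` are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newRequestID returns a random RFC 4122 version 4 UUID string.
// Assumption: the project's own generator may use a different UUID version.
func newRequestID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// requestIDOrNew keeps a caller-supplied id and generates one only when the
// caller left it empty.
func requestIDOrNew(supplied string) (string, error) {
	if supplied != "" {
		return supplied, nil
	}
	return newRequestID()
}

func main() {
	id, err := requestIDOrNew("")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("generated:", id)

	kept, _ := requestIDOrNew("req-from-caller")
	fmt.Println("kept:", kept)
}
```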

## Idempotency key

`idempotency_key` is persisted and exposed in JSON for future dedupe. There is no unique index or at-most-once enforcement in this release; do not assume idempotent execution from the key alone.

## Production guidance

- SQLite attribution is advisory; DB-level tenant isolation belongs to a future remote/Postgres store.
- `actor_id` is supplied by the caller and is not authenticated in this release. Do not use attribution for access control.

internal/cli/attribution.go

Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
package cli

import (
	"fmt"
	"io"
	"os"
	"strings"

	"github.com/LAA-Software-Engineering/agentic-control-plane/internal/runtime"
	"github.com/LAA-Software-Engineering/agentic-control-plane/internal/state"
)

// EnvRequireAttribution, when set to a truthy value, requires explicit tenant/thread/actor ids on run.
const EnvRequireAttribution = "AGENTCTL_REQUIRE_ATTRIBUTION"

// EnvTenantID overrides --tenant-id when the flag is omitted.
const EnvTenantID = "AGENTCTL_TENANT_ID"

// EnvThreadID overrides --thread-id when the flag is omitted.
const EnvThreadID = "AGENTCTL_THREAD_ID"

// EnvActorID overrides --actor-id when the flag is omitted.
const EnvActorID = "AGENTCTL_ACTOR_ID"

func resolveRunAttributionFlags(tenantID, threadID, actorID, parentRunID, requestID, idempotencyKey, source string, requireFlag bool) runtime.WorkflowRunOptions {
	return runtime.WorkflowRunOptions{
		TenantID:           firstNonEmpty(strings.TrimSpace(tenantID), os.Getenv(EnvTenantID)),
		ThreadID:           firstNonEmpty(strings.TrimSpace(threadID), os.Getenv(EnvThreadID)),
		ActorID:            firstNonEmpty(strings.TrimSpace(actorID), os.Getenv(EnvActorID)),
		ParentRunID:        strings.TrimSpace(parentRunID),
		RequestID:          strings.TrimSpace(requestID),
		IdempotencyKey:     strings.TrimSpace(idempotencyKey),
		Source:             strings.TrimSpace(source),
		RequireAttribution: requireFlag || envTruthy(EnvRequireAttribution),
	}
}

func applyRunAttributionOpts(opts *runtime.WorkflowRunOptions, tenantID, threadID, actorID, parentRunID, requestID, idempotencyKey, source string, requireFlag bool) {
	if opts == nil {
		return
	}
	resolved := resolveRunAttributionFlags(tenantID, threadID, actorID, parentRunID, requestID, idempotencyKey, source, requireFlag)
	opts.TenantID = resolved.TenantID
	opts.ThreadID = resolved.ThreadID
	opts.ActorID = resolved.ActorID
	opts.ParentRunID = resolved.ParentRunID
	opts.RequestID = resolved.RequestID
	opts.IdempotencyKey = resolved.IdempotencyKey
	opts.Source = resolved.Source
	opts.RequireAttribution = resolved.RequireAttribution
}

// warnAttributionDefaults writes a one-line stderr warning when local attribution defaults apply.
func warnAttributionDefaults(w io.Writer, attr state.RunAttribution) {
	if w == nil || !state.UsesAttributionDefaults(attr) {
		return
	}
	_, _ = fmt.Fprintf(w, "warning: run attribution using local defaults (tenant-1/thread-1/user-1); set --tenant-id, --thread-id, and --actor-id or AGENTCTL_REQUIRE_ATTRIBUTION=1 in CI/prod\n")
}

func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if strings.TrimSpace(v) != "" {
			return strings.TrimSpace(v)
		}
	}
	return ""
}

func envTruthy(name string) bool {
	switch strings.ToLower(strings.TrimSpace(os.Getenv(name))) {
	case "1", "true", "yes", "on":
		return true
	default:
		return false
	}
}

internal/cli/attribution_test.go

Lines changed: 122 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,122 @@
package cli

import (
	"bytes"
	"io"
	"strings"
	"testing"

	"github.com/LAA-Software-Engineering/agentic-control-plane/internal/state"
)

func TestResolveRunAttributionFlags_envOverrides(t *testing.T) {
	t.Setenv(EnvTenantID, "env-tenant")
	t.Setenv(EnvThreadID, "env-thread")
	t.Setenv(EnvActorID, "env-actor")

	got := resolveRunAttributionFlags("", "", "", "", "", "", "", false)
	if got.TenantID != "env-tenant" || got.ThreadID != "env-thread" || got.ActorID != "env-actor" {
		t.Fatalf("env overrides: %+v", got)
	}

	got = resolveRunAttributionFlags("flag-tenant", "", "", "", "", "", "", false)
	if got.TenantID != "flag-tenant" {
		t.Fatalf("flag wins: %+v", got)
	}
}

func TestWarnAttributionDefaults(t *testing.T) {
	var buf bytes.Buffer
	warnAttributionDefaults(&buf, state.RunAttribution{})
	if !strings.Contains(buf.String(), "warning:") || !strings.Contains(buf.String(), "tenant-1") {
		t.Fatalf("warn: %q", buf.String())
	}
	buf.Reset()
	warnAttributionDefaults(&buf, state.RunAttribution{TenantID: "t", ThreadID: "th", ActorID: "a"})
	if buf.Len() != 0 {
		t.Fatalf("no warn expected: %q", buf.String())
	}
}

func TestRun_requireAttributionRejectsDefaults(t *testing.T) {
	root := runProjRoot(t)
	db := t.TempDir() + "/req.db"

	ResetGlobalsForTest()
	cmd := NewRootCmd()
	cmd.SetOut(io.Discard)
	var errBuf bytes.Buffer
	cmd.SetErr(&errBuf)
	cmd.SetArgs([]string{
		"run", "workflow/demo",
		"--project", root,
		"--state", db,
		"--input", "topic=x",
		"--require-attribution",
	})
	err := cmd.Execute()
	if err == nil {
		t.Fatal("expected validation error")
	}
	if !strings.Contains(err.Error(), "attribution required") {
		t.Fatalf("err = %v", err)
	}
}

func TestRun_requireAttributionViaEnv(t *testing.T) {
	t.Setenv(EnvRequireAttribution, "1")
	root := runProjRoot(t)
	db := t.TempDir() + "/req-env.db"

	ResetGlobalsForTest()
	cmd := NewRootCmd()
	cmd.SetOut(io.Discard)
	cmd.SetErr(io.Discard)
	cmd.SetArgs([]string{
		"run", "workflow/demo",
		"--project", root,
		"--state", db,
		"--input", "topic=x",
	})
	err := cmd.Execute()
	if err == nil {
		t.Fatal("expected validation error")
	}
	if !strings.Contains(err.Error(), "attribution required") {
		t.Fatalf("err = %v", err)
	}
}

func TestRun_warnsOnDefaultAttribution(t *testing.T) {
	root := runProjRoot(t)
	db := t.TempDir() + "/warn.db"

	ResetGlobalsForTest()
	cmd := NewRootCmd()
	cmd.SetOut(io.Discard)
	var errBuf bytes.Buffer
	cmd.SetErr(&errBuf)
	cmd.SetArgs([]string{
		"run", "workflow/demo",
		"--project", root,
		"--state", db,
		"--input", "topic=warn-test",
	})
	if err := cmd.Execute(); err != nil {
		t.Fatal(err)
	}
	if !strings.Contains(errBuf.String(), "warning:") {
		t.Fatalf("stderr = %q", errBuf.String())
	}
}

func TestEnvTruthy(t *testing.T) {
	t.Setenv("AGENTCTL_TEST_TRUTHY", "yes")
	if !envTruthy("AGENTCTL_TEST_TRUTHY") {
		t.Fatal("expected truthy")
	}
	t.Setenv("AGENTCTL_TEST_TRUTHY", "0")
	if envTruthy("AGENTCTL_TEST_TRUTHY") {
		t.Fatal("expected false")
	}
}
