This project is a declarative control plane for agent systems.
The goal is not to be another Python agent framework.
The goal is to let teams define agents, tools, workflows, policies, and environments as versioned YAML, then:
- validate
- diff
- plan
- apply
- run
- trace
- govern
them like real systems.
The closest mental model is:
- Terraform for desired state and plan/apply
- Kubernetes for declarative resources and reconciliation
- GitOps for versioned, reviewable changes
- OpenAPI for explicit contracts
This document defines:
- Go project structure
- YAML spec v0
- CLI UX and commands
- internal engine architecture
- MVP vs end goal
Today, most agent systems are built as:
- Python/JS code
- prompts embedded in source
- tool bindings hidden in runtime code
- weak contracts
- unclear permissions
- poor change review
- little or no drift detection
- little or no governance
This creates:
- prompt spaghetti
- hidden behavior changes
- weak reproducibility
- hard reviews
- weak deployment discipline
- poor portability across runtimes/providers
We want a system where teams can say:
“This is the desired shape of my agent system.”
And the platform can answer:
- Is it valid?
- What changed?
- What risk changed?
- What will be applied?
- What is deployed?
- What drifted?
- What happened during execution?
This project is not:
- a foundation model runtime
- a model serving system
- a training platform
- an attempt to standardize chain-of-thought
- a replacement for every orchestration framework
- a magic auto-agent builder
This project does not try to define:
- exact internal reasoning behavior
- latent planner internals
- hidden model state
- model training/inference kernels
It defines the control plane around agent systems.
Users define desired state in YAML.
Inputs, outputs, tools, permissions, and policies should be explicit.
Deployment state and execution traces are different things.
Specs should not be hard-coupled to one runtime.
Behavior, cost, permissions, and policy changes should be diffable.
Tool permissions, approvals, budgets, and policy limits must be first-class.
MVP should be local and simple. End state can add remote runtimes and reconciliation.
The system manages these resource types:
- Project
- Agent
- Tool
- Workflow
- Policy
- Environment
- ModelProvider
- MemoryStore (later)
- RuntimeTarget (later)
- Module (later)
Each resource has:
- apiVersion
- kind
- metadata
- spec
This is intentionally Kubernetes-like because agent systems are graph-shaped and nested.
agentctl/
  cmd/
    agentctl/
      main.go
  internal/
    app/
      app.go
      wiring.go
    cli/
      root.go
      validate.go
      plan.go
      apply.go
      diff.go
      run.go
      logs.go
      inspect.go
      test.go
    spec/
      types.go
      kinds.go
      loader.go
      parser.go
      normalize.go
      defaults.go
      validator.go
      refs.go
      errors.go
    schema/
      jsonschema.go
      registry.go
      validate.go
    project/
      loader.go
      resolver.go
      graph.go
    plan/
      planner.go
      diff.go
      risk.go
      cost.go
      output.go
    state/
      store.go
      models.go
      sqlite/
        store.go
        migrations.go
      memory/
        store.go
    apply/
      applier.go
      executor.go
      checkpoint.go
    runtime/
      runtime.go
      local/
        runtime.go
        runner.go
      interfaces/
        tool_runtime.go
        agent_runtime.go
        workflow_runtime.go
    engine/
      workflow.go
      steps.go
      interpolation.go
      execution.go
      approvals.go
      retries.go
      timeout.go
    tools/
      registry.go
      mcp/
        client.go
        transport_stdio.go
        transport_http.go
      http/
        client.go
      native/
        registry.go
    models/
      registry.go
      openai/
        client.go
      anthropic/
        client.go
      local/
        client.go
    policy/
      engine.go
      evaluator.go
      approvals.go
      budget.go
      permissions.go
    trace/
      recorder.go
      events.go
      reader.go
    logs/
      printer.go
      formatter.go
    testkit/
      runner.go
      fixtures.go
      assertions.go
    module/
      resolver.go
      lockfile.go
    render/
      yaml.go
      json.go
      table.go
    util/
      fs.go
      ids.go
      clock.go
      errors.go
      slices.go
  api/
    proto/
      controlplane.proto
      execution.proto
  pkg/
    sdk/
      types.go
  examples/
    minimal/
    pr-review/
    incident-triage/
  docs/
    spec-v0.md
    architecture.md
  scripts/
    generate.sh
    lint.sh
    test.sh
  migrations/
    sqlite/
    postgres/
  go.mod
  go.sum
  Makefile
- cmd/agentctl: binary entrypoint.
- internal/cli: command definitions, flag parsing, output formatting.
- internal/spec: parsing YAML resources, type definitions, defaults, normalization, reference resolution.
- internal/schema: JSON Schema loading and validation for structured inputs/outputs.
- internal/project: loads a project directory, merges resources, resolves imports.
- internal/plan: computes the desired vs current state diff, plus risk/cost delta.
- internal/state: stores deployment state and runtime metadata. MVP: SQLite. Later: Postgres backend.
- internal/apply: takes a plan and mutates runtime/control-plane state.
- internal/runtime: runtime abstraction. MVP: local runtime only. Later: remote runtimes.
- internal/engine: workflow execution engine, step orchestration, retries, interpolation.
- internal/tools: tool abstraction and integrations. MVP: native mock tools + MCP stdio. Later: HTTP, gRPC, plugins.
- internal/models: model abstraction and providers.
- internal/policy: permission checks, budget checks, approval rules, safety gates.
- internal/trace: structured execution events and trace persistence.
- internal/testkit: fixture-driven workflow tests.
- internal/module: later feature for reusable modules and lockfiles.
- api/: optional internal control-plane API for remote mode later.
Every YAML file uses:

apiVersion: agentic.dev/v0
kind: <Kind>
metadata:
  name: <resource-name>
  labels: {}
  annotations: {}
spec: {}

Rules:
- apiVersion: required
- kind: required
- metadata.name: required, DNS-like identifier
- labels: optional
- annotations: optional
- spec: required
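The "DNS-like identifier" rule for metadata.name can be checked with a tiny validator. A sketch, assuming DNS-1123-label-style grammar and a 63-character cap (the exact grammar is up to the validator):

```go
package main

import (
	"fmt"
	"regexp"
)

// dnsNameRE accepts lowercase alphanumerics and hyphens, starting and
// ending with an alphanumeric (DNS-1123-label style).
var dnsNameRE = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`)

// ValidName reports whether a metadata.name is a DNS-like identifier.
func ValidName(name string) bool {
	return len(name) > 0 && len(name) <= 63 && dnsNameRE.MatchString(name)
}

func main() {
	fmt.Println(ValidName("platform-assistant")) // true
	fmt.Println(ValidName("Bad_Name"))           // false
}
```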
MVP:
- Project
- Agent
- Tool
- Workflow
- Policy
- Environment
End goal later:
- Module
- MemoryStore
- RuntimeTarget
- ApprovalPolicy as a separate kind
- SecretRef
- Schedule
Defines root project settings and imports.
apiVersion: agentic.dev/v0
kind: Project
metadata:
  name: platform-assistant
spec:
  imports:
    - ./agents
    - ./tools
    - ./workflows
    - ./policies
    - ./env
  defaults:
    runtime: local
    model: openai/gpt-4.1
    policy: default
  providers:
    models:
      openai:
        type: openai
        apiKeyFrom: env:OPENAI_API_KEY
      anthropic:
        type: anthropic
        apiKeyFrom: env:ANTHROPIC_API_KEY
    tools:
      mcp:
        enabled: true
  state:
    backend: sqlite
    dsn: .agentic/state.db
  traces:
    backend: sqlite
    retentionDays: 14

Notes:
- imports are relative file or directory paths
- defaults apply when resources omit explicit values
- providers configures integrations
- state and traces are local in MVP
Defines an agent contract and runtime binding.
apiVersion: agentic.dev/v0
kind: Agent
metadata:
  name: reviewer
spec:
  description: Reviews pull requests for correctness, security, and maintainability.
  model: openai/gpt-4.1
  instructions: |
    You are a senior code reviewer.
    Prioritize correctness, security, and maintainability.
    Cite concrete evidence from tool outputs when possible.
  tools:
    - github
    - docs
  policy: default
  memory:
    type: session
    maxMessages: 20
  constraints:
    maxIterations: 8
    timeoutSeconds: 90
    temperature: 0.2
    requireStructuredOutput: true
  input:
    schema: ./schemas/review-input.json
  output:
    schema: ./schemas/review-output.json

MVP fields:
- description
- model
- instructions
- tools
- policy
- memory.type
- constraints
- input.schema
- output.schema

Later:
- few-shot examples
- tool-choice policy
- retrieval sources
- external memory refs
- model fallback chains
- cost ceilings per agent
- redaction rules
- audit annotations
Defines an external capability.
apiVersion: agentic.dev/v0
kind: Tool
metadata:
  name: github
spec:
  type: mcp
  mcp:
    transport: stdio
    command: npx
    args:
      - -y
      - "@modelcontextprotocol/server-github"
  permissions:
    allow:
      - pull_requests.read
      - issues.read
      - contents.read
    deny: []
  retry:
    maxAttempts: 3
    backoff: exponential

Remote MCP servers may expose a single JSON-RPC endpoint over HTTP(S). Set transport: http, url to that endpoint, and optional headers (including env: tokens). command / args must not be set together with url (see validator).
spec:
  type: mcp
  mcp:
    transport: http
    url: https://mcp.example.com/v1/mcp
    headers:
      Authorization: env:MCP_TOKEN

apiVersion: agentic.dev/v0
kind: Tool
metadata:
  name: webhook
spec:
  type: http
  http:
    baseUrl: https://api.example.com
    headers:
      Authorization: env:API_TOKEN
  permissions:
    allow:
      - request.send
  retry:
    maxAttempts: 2
    backoff: fixed

Tool types in MVP:
- mcp
- http
- native (mock/local)

Later:
- gRPC
- queue
- SQL
- filesystem
- plugin SDK
Defines graph execution.
apiVersion: agentic.dev/v0
kind: Workflow
metadata:
  name: pr-review
spec:
  description: Review a pull request and post a summary.
  trigger:
    type: manual
  input:
    schema: ./schemas/pr-review-input.json
  policy: default
  steps:
    - id: fetch_pr
      uses: tool.github.pull_request.get
      with:
        repo: ${input.repo}
        number: ${input.number}
    - id: review
      agent: reviewer
      with:
        pr: ${steps.fetch_pr.output}
    - id: post_comment
      uses: tool.github.pull_request.comment
      with:
        repo: ${input.repo}
        number: ${input.number}
        body: ${steps.review.output.summary}
  output:
    value:
      summary: ${steps.review.output.summary}
      findings: ${steps.review.output.findings}

Rules:
- steps execute sequentially
- each step has either agent or uses
- with maps inputs
- ${...} interpolation supported
- output can map from prior step outputs
- only manual trigger in MVP

Later:
- conditional steps
- loops
- fan-out/fan-in
- scheduled triggers
- event triggers
- human approval steps
- parallel branches
- subworkflows
Defines execution and governance limits.
apiVersion: agentic.dev/v0
kind: Policy
metadata:
  name: default
spec:
  execution:
    maxWallClockSeconds: 180
    maxTotalCostUsd: 3.00
    requireStructuredOutput: true
  tools:
    forbidUnknownTools: true
  approvals:
    requiredFor:
      - tool.github.pull_request.merge
      - tool.slack.message.send
  security:
    networkAccess: restricted
    secretAccess: deny-by-default

MVP policy features:
- cost ceiling
- wall clock limit
- require structured output
- forbid unknown tools
- approval-required actions

Later:
- per-step policy overrides
- redaction policy
- PII handling rules
- tenant isolation
- prompt injection controls
- environment-specific policy inheritance
Overrides resources for a target environment.
apiVersion: agentic.dev/v0
kind: Environment
metadata:
  name: prod
spec:
  overrides:
    agents:
      reviewer:
        model: anthropic/claude-sonnet-4
        constraints:
          timeoutSeconds: 60
    policies:
      default:
        execution:
          maxTotalCostUsd: 10.00

MVP:
- agent and policy overrides only

Later:
- tool endpoint overrides
- secret binding overrides
- runtime target selection
- scheduling overrides
- provider selection overrides
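The override semantics above reduce to "apply only the fields the environment sets." A minimal sketch in Go, using hypothetical trimmed-down types (pointer fields so nil means "keep the base value"):

```go
package main

import "fmt"

// Hypothetical trimmed-down agent spec; field names mirror the YAML above.
type AgentSpec struct {
	Model          string
	TimeoutSeconds int
}

// AgentOverride uses pointers so an unset field (nil) keeps the base value.
type AgentOverride struct {
	Model          *string
	TimeoutSeconds *int
}

// ApplyOverride returns the base spec with every non-nil override applied.
func ApplyOverride(base AgentSpec, ov AgentOverride) AgentSpec {
	if ov.Model != nil {
		base.Model = *ov.Model
	}
	if ov.TimeoutSeconds != nil {
		base.TimeoutSeconds = *ov.TimeoutSeconds
	}
	return base
}

func main() {
	base := AgentSpec{Model: "openai/gpt-4.1", TimeoutSeconds: 90}
	model := "anthropic/claude-sonnet-4"
	timeout := 60
	merged := ApplyOverride(base, AgentOverride{Model: &model, TimeoutSeconds: &timeout})
	fmt.Println(merged.Model, merged.TimeoutSeconds)
}
```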
my-agent-system/
  project.yaml
  agents/
    reviewer.yaml
    incident.yaml
  tools/
    github.yaml
    slack.yaml
  workflows/
    pr-review.yaml
    incident-triage.yaml
  policies/
    default.yaml
    strict.yaml
  env/
    dev.yaml
    prod.yaml
  schemas/
    pr-review-input.json
    review-output.json
- all resources must have a unique kind/name
- all references must resolve
- all imported paths must exist
- all schemas must be readable
- environment overrides must target existing resources
- referenced tools must exist
- referenced policy must exist
- input/output schema files must exist
- constraints must be sane
- model string must match configured provider namespace or allowed local alias
- exactly one transport block for the selected type
- permission actions must be valid strings
- retry values must be non-negative
- step ids must be unique
- each step must specify exactly one of agent or uses
- interpolation refs must resolve
- forward refs allowed only where dependency order is valid
- no cycles in MVP since sequential only
- budgets non-negative
- action identifiers syntactically valid
- approval actions unique
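Two of the workflow rules above (unique step ids; exactly one of agent or uses) fit in a few lines. A sketch with a hypothetical trimmed-down Step type:

```go
package main

import "fmt"

// Step mirrors the workflow step shape: exactly one of Agent or Uses is set.
type Step struct {
	ID    string
	Agent string
	Uses  string
}

// ValidateSteps enforces two of the MVP rules: step ids must be unique,
// and each step must set exactly one of agent or uses.
func ValidateSteps(steps []Step) error {
	seen := map[string]bool{}
	for _, s := range steps {
		if seen[s.ID] {
			return fmt.Errorf("duplicate step id %q", s.ID)
		}
		seen[s.ID] = true
		// Both set or neither set is invalid.
		if (s.Agent == "") == (s.Uses == "") {
			return fmt.Errorf("step %q must set exactly one of agent or uses", s.ID)
		}
	}
	return nil
}

func main() {
	steps := []Step{
		{ID: "fetch_pr", Uses: "tool.github.pull_request.get"},
		{ID: "review", Agent: "reviewer"},
	}
	fmt.Println(ValidateSteps(steps)) // <nil>
}
```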
The CLI should feel like:
- Terraform in clarity
- kubectl in resource mental model
- git in inspectability
Commands should be boring, stable, and scriptable.
Create starter project.
agentctl init my-agent-system

Creates:
- project.yaml
- sample dirs
- example workflow
MVP: yes
Validate project.
agentctl validate
agentctl validate -e prod

Checks:
- YAML syntax
- schema correctness
- references
- imports
- interpolation refs
- policy and permission issues
Project: platform-assistant
Environment: prod
✓ Loaded 7 resources
✓ References resolved
✓ Schemas valid
✓ Workflow pr-review valid
Validation successful
MVP: yes
Show desired vs current diff.
agentctl plan
agentctl plan -e prod

Plan: 2 to add, 1 to change, 0 to delete
+ create Agent/reviewer
+ create Workflow/pr-review
~ update Policy/default
maxTotalCostUsd: 3.00 -> 10.00
Risk delta:
- cost ceiling increased
- approval scope unchanged
- no new write permissions
MVP: partial
MVP plan supports:
- create/update detection
- field diff
- basic risk summary
MVP does not support:
- remote drift detection
- advanced behavioral estimates
Apply desired state.
agentctl apply
agentctl apply -e prod
agentctl apply --auto-approve

Behavior:
- runs validate
- computes plan
- prompts for approval unless --auto-approve
- writes deployment state
MVP: yes, but local only
Show detailed resource diff.
agentctl diff
agentctl diff Agent/reviewer

MVP: optional but strongly recommended
Execute workflow ad hoc.
agentctl run workflow/pr-review --input repo=acme/api --input number=42
agentctl run workflow/pr-review --input-file input.json

Behavior:
- loads deployed or local desired config depending on mode
- validates input against workflow schema
- executes steps
- stores trace
MVP: yes
Show execution traces.
agentctl logs
agentctl logs --run <run-id>
agentctl logs --workflow pr-review

MVP: yes, basic trace/event view
Print normalized resource.
agentctl inspect Workflow/pr-review
agentctl inspect Agent/reviewer -o yaml

Useful for debugging defaults and env overrides.
MVP: optional but useful
Run fixture-based tests.
agentctl test
agentctl test workflow/pr-review

Example fixture:

workflow: pr-review
cases:
  - name: happy-path
    input:
      repo: acme/api
      number: 42
    expect:
      outputContains:
        - summary
  - name: invalid-number
    input:
      repo: acme/api
      number: -1
    expectError: true

MVP: stretch, or early post-MVP
Normalize YAML formatting.
agentctl fmt

MVP: nice-to-have
Inspect stored state.
agentctl state list
agentctl state show Agent/reviewer

MVP: optional
-e, --env <name> environment override
-o, --output <fmt> table|json|yaml
--project <path> project root
--state <path> explicit state DB path
--no-color disable color output
Exit codes:
- 0: success
- 1: generic failure
- 2: validation error
- 3: plan/apply conflict
- 4: execution error
- 5: policy denial
YAML Project
↓
Loader / Parser
↓
Normalization / Defaults
↓
Reference Resolution
↓
Validation
↓
Desired State Graph
↓
Planner
↓
Apply
↓
Stored Deployment State
Run Workflow
↓
Execution Engine
↓
Policy Engine
↓
Model + Tool Adapters
↓
Trace Recorder
↓
Runtime State
Responsibilities:
- load YAML files
- decode into typed structs
- normalize defaults
- resolve imports
- resolve references
- return canonical in-memory project graph
Key types:
type ResourceID struct {
    Kind string
    Name string
}

type Project struct {
    Meta         Metadata
    Spec         ProjectSpec
    Agents       map[string]*Agent
    Tools        map[string]*Tool
    Workflows    map[string]*Workflow
    Policies     map[string]*Policy
    Environments map[string]*Environment
}

Responsibilities:
- compare desired project state against stored deployment state
- compute create/update/delete operations
- compute human-readable diffs
- compute risk summary
Key output:
type Plan struct {
    Operations []Operation
    Risk       RiskSummary
}

type Operation struct {
    Action string // create, update, delete
    Target ResourceID
    Diff   []FieldChange
}

Risk signals include:
- new write permissions
- removed approvals
- model changes
- cost cap changes
- behavioral contract widening
- policy relaxations
- prompt changes with semantic classification
- runtime target change impact
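A risk summary can start as simple pattern-matching over field diffs. A sketch, with a hypothetical FieldChange shape (the real one would live in internal/plan) and only a few of the signals above:

```go
package main

import (
	"fmt"
	"strings"
)

// FieldChange is a hypothetical diff entry: a dot path plus old/new values.
type FieldChange struct {
	Path string
	Old  string
	New  string
}

// SummarizeRisk flags a few of the MVP risk signals from raw field diffs.
func SummarizeRisk(changes []FieldChange) []string {
	var notes []string
	for _, c := range changes {
		switch {
		case strings.HasPrefix(c.Path, "permissions.allow"):
			notes = append(notes, "permission scope changed: "+c.Path)
		case strings.HasSuffix(c.Path, "maxTotalCostUsd"):
			notes = append(notes, fmt.Sprintf("cost ceiling changed: %s -> %s", c.Old, c.New))
		case c.Path == "model":
			notes = append(notes, fmt.Sprintf("model changed: %s -> %s", c.Old, c.New))
		}
	}
	return notes
}

func main() {
	fmt.Println(SummarizeRisk([]FieldChange{
		{Path: "execution.maxTotalCostUsd", Old: "3.00", New: "10.00"},
	}))
}
```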
Two different state domains:
Deployment state tracks what has been applied.
Example records:
- resource kind/name
- normalized spec hash
- applied timestamp
- env target
- version
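The "normalized spec hash" record can lean on the fact that Go's encoding/json marshals map keys in sorted order, so equal normalized specs always produce the same bytes. A sketch:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// SpecHash hashes a normalized spec. encoding/json sorts map keys when
// marshaling, so semantically equal maps hash identically.
func SpecHash(spec map[string]any) (string, error) {
	b, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	h, _ := SpecHash(map[string]any{"model": "openai/gpt-4.1", "policy": "default"})
	fmt.Println(h)
}
```

The same hash feeds plan: if the stored spec_hash matches the hash of the freshly normalized spec, the resource needs no operation.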
Runtime state tracks workflow runs.
Example records:
- run id
- workflow name
- start/end
- status
- step events
- tool calls
- token/cost summary
- errors
SQLite for both.
Postgres for team/shared mode.
Responsibilities:
- take a plan
- confirm/persist operations
- update deployment state
- optionally prepare runtime-specific artifacts later
MVP apply does not deploy to an external cluster. It records local deployed desired state.
That is enough to establish plan/apply discipline.
Responsibilities:
- execute workflows
- resolve step inputs
- call tools
- invoke agents
- enforce retries/timeouts
- collect outputs
- produce final workflow output
- sequential steps only
- local execution only
- no background daemons
- no reconciliation loop
1. load workflow
2. validate runtime input
3. initialize run context
4. for each step:
   - resolve interpolations
   - enforce policy
   - execute tool or agent
   - validate output if configured
   - record trace
5. compute workflow output
6. persist run result
Responsibilities:
- assemble prompt payload
- attach tools
- invoke provider
- return structured output
Abstraction:
type ModelClient interface {
    Generate(ctx context.Context, req GenerateRequest) (GenerateResponse, error)
}

MVP:
- OpenAI-compatible
- Anthropic optional
- mock provider for tests
Responsibilities:
- resolve tool name to executable transport
- enforce permissions
- execute operation
- normalize result
Abstraction:
type ToolExecutor interface {
    Call(ctx context.Context, req ToolCallRequest) (ToolCallResponse, error)
}

MVP:
- MCP stdio
- HTTP
- mock/native
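The "enforce permissions" responsibility can start as deny-over-allow, exact-match checks against the tool's allow/deny lists. A sketch (no wildcards; the real action grammar belongs to the validator and policy engine):

```go
package main

import "fmt"

// Allowed applies deny-over-allow, exact-match action semantics.
// Anything not explicitly allowed is denied.
func Allowed(action string, allow, deny []string) bool {
	for _, d := range deny {
		if d == action {
			return false // explicit deny wins
		}
	}
	for _, a := range allow {
		if a == action {
			return true
		}
	}
	return false // not listed => denied
}

func main() {
	allow := []string{"pull_requests.read", "issues.read", "contents.read"}
	fmt.Println(Allowed("pull_requests.read", allow, nil))  // true
	fmt.Println(Allowed("pull_requests.merge", allow, nil)) // false
}
```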
Responsibilities:
- decide whether a workflow/step/tool call is allowed
- enforce budgets/timeouts
- gate approval-required actions
Abstraction:
type PolicyEvaluator interface {
    CheckRun(ctx context.Context, run RunContext) error
    CheckStep(ctx context.Context, step StepContext) error
    CheckToolCall(ctx context.Context, call ToolCallContext) error
}

MVP checks:
- workflow wall-clock budget
- total cost ceiling
- no unknown tools
- approval-required actions denied unless explicitly approved

Later:
- environment-sensitive rules
- tenant rules
- sensitive output handling
- prompt injection mitigation hooks
- egress restrictions
Responsibilities:
- append structured events
- persist for logs and debugging
- support replay-ish inspection
Event examples:
type TraceEvent struct {
    RunID     string
    Timestamp time.Time
    Type      string
    StepID    string
    Message   string
    Data      map[string]any
}

Event types:
- run.started
- run.finished
- step.started
- step.finished
- step.failed
- tool.called
- tool.completed
- model.called
- model.completed
- policy.denied
Supported syntax:
- ${input.foo}
- ${steps.fetch_pr.output}
- ${steps.review.output.summary}
MVP:
- dot path lookup only
- no expressions
- no functions
- no loops in interpolation
This should stay simple. Do not invent a scripting language.
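A dot-path-only resolver fits in one function. A sketch over a nested map scope (the real engine would resolve against typed run context, but the lookup logic is the same):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var refRE = regexp.MustCompile(`\$\{([a-zA-Z0-9_.]+)\}`)

// Interpolate resolves ${dot.path} references against a nested map scope.
// Dot-path lookup only: no expressions, no functions, no loops.
func Interpolate(s string, scope map[string]any) (string, error) {
	var firstErr error
	out := refRE.ReplaceAllStringFunc(s, func(m string) string {
		var cur any = scope
		// Strip "${" and "}", then walk the dot path.
		for _, key := range strings.Split(m[2:len(m)-1], ".") {
			mp, ok := cur.(map[string]any)
			if !ok {
				if firstErr == nil {
					firstErr = fmt.Errorf("unresolved ref %s", m)
				}
				return m
			}
			if cur, ok = mp[key]; !ok {
				if firstErr == nil {
					firstErr = fmt.Errorf("unresolved ref %s", m)
				}
				return m
			}
		}
		return fmt.Sprint(cur)
	})
	return out, firstErr
}

func main() {
	scope := map[string]any{
		"input": map[string]any{"repo": "acme/api", "number": 42},
	}
	out, _ := Interpolate("repo=${input.repo} n=${input.number}", scope)
	fmt.Println(out) // repo=acme/api n=42
}
```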
Agent step example:

- id: review
  agent: reviewer
  with:
    pr: ${steps.fetch_pr.output}

Tool step example:

- id: fetch_pr
  uses: tool.github.pull_request.get
  with:
    repo: ${input.repo}
    number: ${input.number}

MVP step result shape:
{
  "output": {},
  "meta": {
    "durationMs": 1200,
    "costUsd": 0.02
  }
}

If a step or agent has an output schema, validate the returned output. Failure should fail the step unless policy later allows soft-fail.
MVP: hard fail
Tool retries in MVP:
- configured per tool
- only on retryable transport/provider errors
Agent retries in MVP:
- off by default
- optional single retry on transient provider failure
Do not retry semantic failure blindly.
Suggested tables:
- applied resources: kind, name, env, spec_hash, normalized_spec_json, applied_at
- deployments: project_name, env, version, applied_at
- runs: run_id, workflow_name, env, status, started_at, finished_at, input_json, output_json, error_text, total_cost_usd
- run steps: run_id, step_id, status, started_at, finished_at, input_json, output_json, error_text, cost_usd
- trace events: run_id, seq, timestamp, type, step_id, data_json
MVP: no modules.

Later, modules should allow reuse.
Example:
module:
  source: github.com/acme/agent-modules/pr-reviewer
  version: 0.2.1
  inputs:
    model: openai/gpt-4.1
    githubTool: github

Needs:
- lockfile
- version resolution
- input schema
- module output exposure
- integrity checks
Registry later:
- public or private modules
- reusable workflows
- policy packs
- tool packs
No controller. No daemon. No remote cluster.
apply only writes local deployed state.
This is enough to prove:
- spec
- validation
- plan
- run
- trace
Add remote runtime support.
Possible modes:
- local
- server mode
- Kubernetes-backed
- Temporal-backed
- worker pool mode
At that stage, reconciliation becomes meaningful:
- desired resources stored centrally
- controller compares desired vs actual
- controller converges state
- drift detectable remotely
This is post-MVP.
- spec parser
- reference resolution
- interpolation
- planner diff
- policy checks
- state store
- run sample workflow locally
- mock model/tool providers
- SQLite state/traces
CLI output for:
- validate
- plan
- diff
Workflow-level tests via YAML fixtures.
- Project
- Agent
- Tool
- Workflow
- Policy
- Environment
- local only
- MCP stdio
- HTTP
- mock/native
- at least one provider
- mock provider for tests
- init
- validate
- plan
- apply
- run
- logs
- sequential workflows
- interpolation
- schema validation
- basic policy enforcement
- trace recording
- SQLite
- deployment state
- runtime traces
- table + json
- modules
- registry
- reconciliation controller
- remote shared state
- parallel execution
- scheduled/event triggers
- subworkflows
- loops/conditionals
- rich approval workflows
- distributed execution
- plugin SDK
- advanced drift detection
- semantic change classification
- multi-tenant authn/authz
The end-state system should support:
- declarative multi-agent systems
- environment-aware config
- plan/apply/diff/drift
- reusable modules
- policy packs
- remote control plane
- centralized state
- controller reconciliation
- multiple runtimes
- team collaboration
- approval workflows
- event and schedule triggers
- observability and auditability
- registry ecosystem
In other words:
a real control plane for agent systems
not just a local runner.
Foundations.
- resource structs
- YAML loader
- validation
- project graph
- SQLite state
- local runtime
- sequential workflow execution
- model/tool interfaces
- trace recorder
- core CLI
Deployment discipline.
- better plan output
- apply confirmation
- environment overrides
- richer policy engine
- diff command
- inspect command
- test command
Reuse and governance.
- modules
- lockfile
- policy packs
- richer risk summaries
- workflow approvals
Control plane.
- server mode
- remote state
- controller loop
- remote runners
- drift detection
- team auth
Practical picks:
- CLI: cobra
- YAML: gopkg.in/yaml.v3
- config/schema helpers: invopop/jsonschema or JSON Schema validator libs
- SQLite: modernc.org/sqlite or mattn/go-sqlite3
- table output: charmbracelet/lipgloss + a simple table lib
- gRPC later: google.golang.org/grpc
- protobuf later: google.golang.org/protobuf
Keep dependencies conservative.
User writes:
- project.yaml
- agents/reviewer.yaml
- tools/github.yaml
- workflows/pr-review.yaml
- policies/default.yaml

agentctl validate
agentctl plan

Output:
Plan: 4 to add, 0 to change, 0 to delete
+ Agent/reviewer
+ Tool/github
+ Workflow/pr-review
+ Policy/default
agentctl apply
agentctl run workflow/pr-review --input repo=acme/api --input number=42
agentctl logs --workflow pr-review

That is enough to prove the product.
Build this as:
- Go CLI
- YAML declarative spec
- SQLite local state
- local-first engine
- clear separation between deployment state and execution state
Do not start with:
- server mode
- Kubernetes operator
- module registry
- remote control plane
That is how the project dies early.
The correct MVP is:
local declarative agent systems with validate, plan, apply, run, and logs.
That is small enough to build and sharp enough to matter.