Approaching-AI · xu16601526267 · May 13, 2026
diff --git a/.claude/skills/aima-operate/SKILL.md b/.claude/skills/aima-operate/SKILL.md
@@ -0,0 +1,167 @@
+---
+name: aima-operate
+description: Operate AIMA through its MCP tools and knowledge domain. Use when Codex needs to deploy, debug, inspect, tune, benchmark, troubleshoot, or review AIMA; when working with AIMA MCP tools such as hardware.detect, knowledge.resolve, deploy.apply, deploy.status, deploy.logs, benchmark.run, knowledge.save, knowledge.evaluate, knowledge.promote, or central.sync; and when turning real AIMA usage into evidence-backed reusable knowledge.
+---
+
+# AIMA Operate
+
+Use this skill as the agent-facing operating layer for AIMA. Keep AIMA itself as the runtime and knowledge source; use this skill to choose the path, call the right tools, interpret results, and decide what evidence should be saved.
+
+## Example Prompts
+
+- "Check what this machine can run with AIMA."
+- "Deploy this model through AIMA and verify it is ready."
+- "AIMA will not start; inspect status and logs."
+- "Run a benchmark for this deployment and tell me whether the result is reusable."
+- "Save this fix so AIMA can avoid the same problem next time."
+
+## Non-Negotiables
+
+- Treat AIMA MCP tools as the primary interface and source of truth.
+- Use CLI commands only when MCP is unavailable, incomplete, or the user is explicitly working in a shell-only context.
+- Do not create a separate long-term knowledge store inside this skill.
+- Save reusable findings through AIMA's knowledge domain, not through ad hoc notes.
+- Promote knowledge only when deployment, benchmark, hardware, model, engine, and config evidence are present.
+- Keep central sync optional. A local workflow must still succeed when central is offline.
+- Prefer small, reversible steps: inspect, resolve, dry run when useful, apply, verify, record.
+
+## Bootstrap
+
+Before taking action, establish the current operating surface:
+
+1. Confirm whether the current workspace is an AIMA checkout or a consumer project that has AIMA configured.
+2. Check whether AIMA MCP tools are available in the active agent environment.
+3. If MCP tools are available, use the actual exposed tool list as the API contract.
+4. If MCP tools are not available, check whether the `aima` CLI exists and inspect local help before guessing commands.
+5. Capture the AIMA version, target runtime, and deployment context when available.
+6. If neither MCP nor CLI is available, explain what is missing and stop before inventing commands.
+
+## Risk Gates
+
+Pause for explicit user confirmation before destructive or broad-impact actions:
+
+- Delete operations: `deploy.delete`, `model.remove`, `engine.remove`.
+- Reset or identity operations: `device.reset`, credential/token rotation, enrollment reset.
+- Overrides or shared state changes: `catalog.override`, central push/sync, global config writes.
+- Any action that may stop a running model service or replace a known-good deployment.
+
+When fixing failures:
+
+- Prefer `deploy.dry_run` before apply when deployment changes are non-trivial.
+- Change one variable at a time, then verify.
+- Do not remove models, engines, images, or historical knowledge as an automatic cleanup step.
+- Treat central sync as a final optional step, not part of the critical path.
+
+## First Move
+
+1. Identify the user's intent: detect, deploy, inspect, troubleshoot, benchmark, tune, fleet, or knowledge capture.
+2. Run the Bootstrap checks above.
+3. If MCP is available, use the actual exposed tools and the intent mapping from `references/tool-map.md`.
+4. If MCP is not available, use the nearest AIMA CLI fallback from `references/tool-map.md`.
+5. Apply Risk Gates before destructive, shared-state, or broad-impact actions.
+6. Read `references/knowledge-contract.md` before saving, evaluating, promoting, or syncing knowledge.
+
+## Intent Router
+
+### Detect Capability
+
+Use when the user asks what the machine can run, whether hardware fits a model, or why a target does not fit.
+
+Default path:
+
+1. Collect hardware and runtime facts with `hardware.detect` and `hardware.metrics`.
+2. Scan known models and engines with `model.scan` / `model.list` and `engine.scan` / `engine.list`.
+3. Resolve candidate config with `knowledge.resolve`.
+4. Summarize fit, blockers, and the lowest-risk next action.
+
+### Deploy Or Start AIMA Workload
+
+Use when the user asks to deploy, run, start, or make a model service available through AIMA.
+
+Default path:
+
+1. Resolve the configuration with `knowledge.resolve`.
+2. Inspect planned runtime changes with `deploy.dry_run` when a dry run would reduce risk.
+3. Apply with `deploy.apply` or run with `deploy.run` depending on the project convention.
+4. Verify with `deploy.status`, `deploy.list`, and `system.status`.
+5. If not ready, immediately switch to the troubleshoot path.
+6. If ready, propose or run benchmark capture when the result may become reusable knowledge.
+
+### Inspect Status Or Logs
+
+Use when the user asks whether AIMA is healthy, what is running, or why a deployment behaves oddly.
+
+Default path:
+
+1. Query `system.status` and `deploy.list`.
+2. Query `deploy.status` for exact runtime config, labels, restarts, exit code, and detailed state.
+3. Query `deploy.logs` for failure context.
+4. Avoid assuming `deploy.list` contains raw config or labels; use `deploy.status` for detail.
+
+### Troubleshoot Failure
+
+Use when deployment fails, readiness stalls, throughput is bad, logs show errors, or AIMA cannot find a model or engine.
+
+Default path:
+
+1. Capture status: `system.status`, `system.diagnostics`, `deploy.status`, `deploy.logs`.
+2. Capture resource state: `hardware.metrics`.
+3. Re-resolve config: `knowledge.resolve` with the current model, engine, slot, and overrides.
+4. Classify the failure: hardware fit, model asset, engine asset, container/runtime, config, port/API, benchmark profile, or knowledge mismatch.
+5. Make one minimal fix at a time.
+6. Re-run verification after each fix.
+7. If a fix works, create an evidence-backed knowledge note candidate.
+
+### Benchmark Or Tune
+
+Use when the user asks to measure performance, compare configs, tune parameters, or decide which deployment is better.
+
+Default path:
+
+1. Reuse a ready deployment if possible.
+2. Run `benchmark.run` or record externally measured results with `benchmark.record`.
+3. Keep benchmark profile fields: concurrency, requests, warmup, rounds, input/output token shape, and duration.
+4. Capture resource observations during the benchmark window.
+5. Evaluate with `knowledge.evaluate`.
+6. Save with `knowledge.save` only when the evidence contract is satisfied.
+
+### Knowledge Capture
+
+Use when the user asks to remember a fix, make a rule, update the knowledge base, sync knowledge, or avoid repeating a problem.
+
+Default path:
+
+1. Read `references/knowledge-contract.md`.
+2. Decide whether the finding is a draft note, candidate, validated config, or golden rule.
+3. Save locally with `knowledge.save` when structured evidence exists.
+4. Evaluate with `knowledge.evaluate`.
+5. Promote with `knowledge.promote` only after validation.
+6. Sync with `central.sync` only as an optional final step.
+
+## Output Style
+
+When reporting back, keep the result operational:
+
+- What was checked.
+- What AIMA reported.
+- What action was taken or should be taken next.
+- Whether evidence is strong enough to save or promote.
+- Any residual risk or manual decision needed.
+
+Use this compact template when useful:
+
+```text
+Conclusion:
+Checked:
+Found:
+Action:
+Knowledge:
+Next:
+```
+
+Avoid long explanations of AIMA internals unless the user asks. The useful answer is usually the next safe action plus the evidence behind it.
+
+## References
+
+- Use `references/tool-map.md` for AIMA MCP tool and CLI fallback selection.
+- Use `references/knowledge-contract.md` before saving or promoting knowledge.
diff --git a/.claude/skills/aima-operate/agents/openai.yaml b/.claude/skills/aima-operate/agents/openai.yaml
@@ -0,0 +1,7 @@
+interface:
+  display_name: "AIMA Operate"
+  short_description: "Operate and troubleshoot AIMA via MCP"
+  default_prompt: "Use $aima-operate to inspect, deploy, troubleshoot, benchmark, and save evidence-backed AIMA knowledge."
+
+policy:
+  allow_implicit_invocation: true
diff --git a/.claude/skills/aima-operate/references/knowledge-contract.md b/.claude/skills/aima-operate/references/knowledge-contract.md
@@ -0,0 +1,185 @@
+# AIMA Knowledge Contract
+
+Do not treat every useful observation as reusable knowledge. AIMA knowledge should be local-first, evidence-backed, and promoted only after validation.
+
+## Principle
+
+The skill does not own long-term knowledge. Save reusable findings through AIMA's knowledge domain:
+
+- `knowledge.save` for structured notes and findings.
+- `knowledge.evaluate` for validation, engine switch cost, or open questions.
+- `knowledge.promote` for validated configurations or rules.
+- `central.sync` only after local evidence is stable and sync is useful.
+
+## Knowledge States
+
+Use this lifecycle when describing or saving operational findings:
+
+```text
+draft -> candidate -> validated -> golden -> deprecated
+```
+
+- `draft`: raw observation, incomplete evidence, or one-off debugging note.
+- `candidate`: likely useful, but not yet proven across a benchmark or repeated run.
+- `validated`: backed by deploy artifacts, benchmark results, and hardware/model/engine context.
+- `golden`: repeatedly useful enough to become a default recommendation.
+- `deprecated`: obsolete because of newer engine, model, hardware, benchmark, or AIMA behavior.
+
+## Knowledge Types
+
+Use different evidence thresholds for different kinds of knowledge:
+
+### Incident Note
+
+Use for debugging history, failure symptoms, local workarounds, and lessons learned.
+
+Minimum evidence:
+
+- symptom or error signal
+- deployment or command context
+- observed environment
+- action taken
+- result after the action
+
+Incident notes may remain `draft` or `candidate`. They do not require benchmark evidence.
+
+### Config Candidate
+
+Use for a configuration that appears reusable for a model, engine, hardware profile, or deployment shape.
+
+Minimum evidence:
+
+- hardware profile
+- model
+- engine and version when known
+- deploy config
+- readiness result
+- relevant logs or status output
+- at least one resource observation
+
+Config candidates can become `validated` after successful deployment evidence and at least one meaningful measurement. Benchmark evidence is preferred, but a lightweight measured run may be enough for early validation if the limitation is recorded.
+
+### Golden Rule
+
+Use only for stable defaults, recommended parameters, or high-confidence compatibility rules.
+
+Minimum evidence:
+
+- complete deployment evidence
+- benchmark profile and results
+- resource observations
+- repeated success or strong reason to trust one result
+- clear applicability scope and limitations
+
+Golden rules must not be created from a single log interpretation or unmeasured workaround.
+
+## Minimum Evidence For Golden Promotion
+
+Promote only when these fields are known or explicitly unavailable with a reason:
+
+Identity:
+
+- hardware profile or GPU architecture
+- model
+- engine asset ID
+- engine version
+- engine image when containerized
+- benchmark ID
+- config ID
+
+Deployment:
+
+- real deploy config, not just a benchmark profile
+- important engine parameters, such as tensor parallelism, memory fractions, offload settings, max running requests, or GPU layers
+- deployment phase/status and readiness outcome
+
+Benchmark profile:
+
+- concurrency
+- number of requests
+- warmup count
+- rounds
+- input token shape
+- output token shape
+- duration
+
+Performance:
+
+- TTFT p50/p95/p99 when available
+- TPOT p50/p95 when available
+- throughput
+- QPS
+- error rate
+- sample count
+- stability
+
+Resource observation:
+
+- peak VRAM during benchmark window
+- peak RAM during benchmark window
+- average GPU utilization during benchmark window
+- average CPU utilization during benchmark window
+- average power draw when available
+
+## Save Pattern
+
+Use this shape when constructing a knowledge note or summarizing what should be saved:
+
+```yaml
+kind: knowledge_note
+title: "<hardware + model + engine + outcome>"
+status: candidate
+context:
+  hardware_profile: "<profile or observed hardware>"
+  model: "<model>"
+  engine: "<engine>"
+  engine_version: "<version>"
+  engine_image: "<image, if known>"
+evidence:
+  benchmark_id: "<benchmark id>"
+  config_id: "<config id>"
+  deploy_status: "<ready|failed|degraded>"
+  logs: "<short failure or success signal>"
+result:
+  ttft_p95_ms: null
+  throughput_tps: null
+  error_rate: null
+  stability: "<stable|unstable|unknown>"
+resources:
+  vram_peak_mib: null
+  ram_peak_mib: null
+  gpu_utilization_avg_pct: null
+  cpu_utilization_avg_pct: null
+lesson:
+  summary: "<what changed or what was learned>"
+  recommendation: "<reuse guidance>"
+scope:
+  applicable_to:
+    - "<hardware or scenario>"
+  limitations:
+    - "<where not to apply>"
+```
+
+## What Not To Promote
+
+Keep these as `draft` or plain run notes:
+
+- A fix with no benchmark or deploy evidence.
+- A guess based only on a log line.
+- A result that depends on an undocumented local hack.
+- A one-time workaround with unknown version or hardware scope.
+- A note that conflicts with newer validated evidence.
+
+## Conflict Handling
+
+When findings conflict, do not overwrite older knowledge blindly.
+
+Rank candidates by:
+
+1. Hardware and model match.
+2. Benchmark evidence quality.
+3. Deployment readiness and stability.
+4. Recency.
+5. Repeat count.
+
+Mark stale findings as `deprecated` only when there is clearer evidence, not just a newer opinion.