Proposal: verified_skills mode — deterministic, auditable skill evolution (no GPU required) #55

@nutstrut

Hi — I’ve been working on a verification layer for agent execution and evolution, and MetaClaw’s skills_only mode is a perfect fit for this.

This proposal adds deterministic verification to skill promotion without requiring RL or GPU compute.

Proposal: verified_skills mode — GPU-free skill evolution with deterministic improvement verification

Summary

MetaClaw's skills_only mode is excellent for operators without GPU access. However, skill promotion currently relies on LLM judgment, which is subjective and unauditable. This proposal adds a verified_skills mode that gates skill promotion on deterministic, cryptographically signed verification. It requires no cloud RL and produces a trustworthy, auditable skill evolution history.


The Gap

In skills_only mode, the evolution loop works like this:

Conversation ends
    ↓
LLM analyzes session
    ↓
New skills extracted and summarized
    ↓
Skills promoted to permanent library

The problem: there is no deterministic check that a promoted skill actually improved performance. Promotion is gated on LLM self-evaluation, which is:

  • Subjective (same inputs can produce different verdicts)
  • Unverifiable by third parties
  • Not auditable across environments
  • Vulnerable to regression (a skill that hurts performance can be promoted)

In RL mode, weight updates provide a learning signal — but require cloud compute. In skills_only mode, there is no equivalent verification mechanism at all.


The Proposal: verified_skills mode

Add a new operating mode that sits between skills_only and rl:

Mode                  | GPU | Cloud training | Verified | Auditable
----------------------|-----|----------------|----------|----------
skills_only           | No  | No             | No       | No
verified_skills (new) | No  | No             | Yes      | Yes
rl                    | No  | Yes            | Partial  | No
madmax                | No  | Yes            | Partial  | No

verified_skills adds a verification gate to the existing skills_only loop (the steps marked NEW):

Conversation ends
    ↓
LLM analyzes session (existing)
    ↓
New skills extracted (existing)
    ↓
NEW: Spec defined — "what does improvement look like?"
    ↓
NEW: SettlementWitness verifies deterministically
         PASS → skill promoted with receipt_id attached
         FAIL → skill rejected, counter-evidence logged
         INDETERMINATE → flagged for human review
    ↓
NEW: receipt_id stored in skill metadata
    ↓
Full audit trail — every promoted skill is provably verified
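
As a rough illustration of the gate in the flow above, the three verdicts could be mapped to promotion decisions as follows. This is a minimal sketch, not MetaClaw code: `Verdict`, `gate_promotion`, and the action names are hypothetical, and the receipt layout (a top-level `receipt_v0_1` object with `verdict` and `receipt_id`) is assumed from the verification call shown in this proposal.

```python
from enum import Enum


class Verdict(str, Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    INDETERMINATE = "INDETERMINATE"


def gate_promotion(receipt: dict) -> dict:
    """Map a verification receipt to a promotion decision (hypothetical helper)."""
    body = receipt.get("receipt_v0_1", {})
    receipt_id = body.get("receipt_id")
    try:
        verdict = Verdict(body.get("verdict"))
    except ValueError:
        # A missing or unknown verdict is never treated as a pass.
        verdict = Verdict.INDETERMINATE

    if verdict is Verdict.PASS and receipt_id:
        # Promote only when there is a receipt to attach to the skill.
        return {"action": "promote", "receipt_id": receipt_id}
    if verdict is Verdict.FAIL:
        # Reject and keep the receipt as counter-evidence.
        return {"action": "reject", "receipt_id": receipt_id}
    return {"action": "human_review", "receipt_id": receipt_id}
```

Falling back to human review on anything unexpected keeps the gate fail-closed: a malformed response can delay a promotion but cannot cause one.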

What SettlementWitness Is

SettlementWitness is a stateless verification oracle for agent workflows. It evaluates whether an output matches a specification and returns a cryptographically signed receipt (SAR — Settlement Attestation Receipt).

Key properties relevant to MetaClaw:

  • Deterministic — identical inputs always produce identical verdicts
  • Ed25519 signed — receipts are cryptographically verifiable
  • Offline verifiable — no callbacks required after receipt is issued
  • Stateless — no session state, no dependencies
  • Free during adoption — no cost for integration

The verification call is simple:

import httpx

# skill_name, timestamp, wallet, skill_content, criteria and evaluation_result
# are supplied by the caller (the skill promotion step).
response = httpx.post(
    "https://defaultverifier.com/settlement-witness",
    json={
        "task_id": f"skill-evolution-{skill_name}-{timestamp}",
        "agent_id": f"{wallet}:metaclaw",
        "spec": {
            "improvement_type": "skill_promotion",
            "skill_name": skill_name,
            "expected": "skill improves agent performance on defined criteria",
        },
        "output": {
            "skill_name": skill_name,
            "skill_content": skill_content,
            "evaluation_criteria": criteria,
            "evaluation_result": evaluation_result,
        },
    },
    timeout=10.0,  # do not let a slow verifier block the promotion loop
)
response.raise_for_status()  # surface HTTP errors before parsing the body

receipt = response.json()
verdict = receipt["receipt_v0_1"]["verdict"]  # PASS | FAIL | INDETERMINATE
receipt_id = receipt["receipt_v0_1"]["receipt_id"]

Skill Metadata with Receipt

When a skill is promoted under verified_skills mode, its metadata includes the verification receipt:

{
  "skill_name": "handle_api_rate_limits",
  "version": "1.0",
  "promoted_at": "2026-04-01T08:00:00Z",
  "verified": true,
  "receipt_id": "sha256:14be931e638ef93d043edc0c3feaf37bcbab33691b25997fefcef1b9b9062d00",
  "verifier_kid": "sar-prod-ed25519-02",
  "verdict": "PASS",
  "promotion_source": "verified_skills"
}

Unverified skills (promoted in skills_only mode) remain valid — this is fully backward compatible.
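
A minimal sketch of how this metadata might be assembled from a receipt, and how legacy records could be read without breaking. The helper names are hypothetical, and passing `verifier_kid` in as a parameter is an assumption, since this proposal does not show where that value appears in the receipt.

```python
from datetime import datetime, timezone


def build_skill_metadata(skill_name: str, version: str, receipt: dict, verifier_kid: str) -> dict:
    """Build the metadata record for a promoted skill (hypothetical helper)."""
    body = receipt["receipt_v0_1"]  # receipt layout assumed from the verification call
    return {
        "skill_name": skill_name,
        "version": version,
        "promoted_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "verified": body["verdict"] == "PASS",
        "receipt_id": body["receipt_id"],
        "verifier_kid": verifier_kid,
        "verdict": body["verdict"],
        "promotion_source": "verified_skills",
    }


def is_verified(metadata: dict) -> bool:
    """Legacy skills_only records lack receipt fields: still loadable, just unverified."""
    return bool(metadata.get("verified")) and bool(metadata.get("receipt_id"))
```

Reading the receipt fields with `.get` is what keeps older records usable: a skill stored before this mode existed simply reports as unverified instead of raising.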


Rollback Logic

verified_skills mode enables something skills_only cannot: safe rollback.

# If a previously promoted skill later receives FAIL verdicts
# across N sessions, trigger rollback:

if fail_count >= ROLLBACK_THRESHOLD:
    skill.status = "reverted"
    skill.revert_reason = f"Failed verification {fail_count} times"
    skill.reverted_at = timestamp
    # Log counter-evidence receipt

This turns MetaClaw's skill library from append-only to self-correcting.
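
To make the rollback rule concrete, here is a self-contained sketch. `SkillRecord` and `record_verdict` are hypothetical names, not existing MetaClaw types, and the sketch assumes FAIL verdicts are counted cumulatively across sessions rather than consecutively.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

ROLLBACK_THRESHOLD = 3  # mirrors rollback_threshold in the proposed config


@dataclass
class SkillRecord:
    name: str
    status: str = "active"
    fail_receipts: List[str] = field(default_factory=list)  # counter-evidence
    revert_reason: Optional[str] = None
    reverted_at: Optional[str] = None


def record_verdict(skill: SkillRecord, verdict: str, receipt_id: str,
                   threshold: int = ROLLBACK_THRESHOLD) -> SkillRecord:
    """Record a later verification verdict and revert the skill at the threshold."""
    if verdict != "FAIL" or skill.status == "reverted":
        return skill
    # Keep each failing receipt so the rollback itself is auditable.
    skill.fail_receipts.append(receipt_id)
    fail_count = len(skill.fail_receipts)
    if fail_count >= threshold:
        skill.status = "reverted"
        skill.revert_reason = f"Failed verification {fail_count} times"
        skill.reverted_at = datetime.now(timezone.utc).isoformat()
    return skill
```

Storing the failing receipt IDs rather than a bare counter means a reverted skill carries the evidence for its own rollback.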


Why This Matters for Academic Rigor

MetaClaw's technical report describes skill evolution as a core contribution. The current implementation has a reproducibility gap: skill promotion decisions are made by an LLM judge whose outputs are non-deterministic across environments.

verified_skills mode closes this gap:

  • Every promotion decision is deterministically reproducible
  • Every receipt can be independently verified by any third party
  • Skill evolution history becomes a cryptographically auditable record
  • Results are comparable across environments and deployments

This directly strengthens the empirical claims in the technical report.


Implementation Scope

The change is contained and non-breaking:

New config option:

mode: verified_skills  # new option alongside skills_only, rl, madmax
verification:
  endpoint: "https://defaultverifier.com/settlement-witness"
  agent_id: "your_wallet:metaclaw"
  rollback_threshold: 3  # fail count before rollback triggers
  require_pass_for_promotion: true
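
One way metaclaw/config.py could validate these options once the YAML has been parsed into a dict. This is a sketch under assumptions: `VerificationConfig` and `load_verification_config` are hypothetical names, and the defaults simply mirror the example values above.

```python
from dataclasses import dataclass
from typing import Optional

VALID_MODES = {"skills_only", "verified_skills", "rl", "madmax"}


@dataclass(frozen=True)
class VerificationConfig:
    endpoint: str
    agent_id: str
    rollback_threshold: int = 3
    require_pass_for_promotion: bool = True


def load_verification_config(raw: dict) -> Optional[VerificationConfig]:
    """Validate an already-parsed config dict; return None for other modes."""
    mode = raw.get("mode")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode != "verified_skills":
        return None  # existing modes are unaffected
    section = raw.get("verification") or {}
    missing = [key for key in ("endpoint", "agent_id") if not section.get(key)]
    if missing:
        raise ValueError(f"verification config missing: {', '.join(missing)}")
    threshold = int(section.get("rollback_threshold", 3))
    if threshold < 1:
        raise ValueError("rollback_threshold must be at least 1")
    return VerificationConfig(
        endpoint=section["endpoint"],
        agent_id=section["agent_id"],
        rollback_threshold=threshold,
        require_pass_for_promotion=bool(section.get("require_pass_for_promotion", True)),
    )
```

Returning None for the other modes keeps the verification section optional, so existing configs continue to load unchanged.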

New dependency:

httpx  # likely already present for the proxy

No GPU. No cloud training backend. No Tinker API key required.

Files affected:

  • metaclaw/config.py — add verified_skills mode + verification config
  • metaclaw/skills/ — add receipt metadata to skill storage format
  • metaclaw/trainer.py or equivalent — add verification gate before promotion
  • README.md — document new mode in the mode comparison table

Integration Reference

A working reference implementation is available:

  • Live endpoint: https://defaultverifier.com/settlement-witness
  • Public key registry: https://defaultverifier.com/.well-known/sar-keys.json
  • TypeScript SDK: npm install sar-sdk (https://sarprotocol.org)
  • Spec: https://defaultverifier.com/spec/sar-v0.1
  • MCP server: https://defaultverifier.com/mcp

The endpoint is live, deterministic, and free to call. No API key required.


What This Enables for MetaClaw Users

For operators without GPU access:

  • Full skill evolution capability without cloud compute costs
  • Verified improvement history they can trust and audit

For researchers:

  • Reproducible skill promotion decisions
  • Cryptographic audit trail for empirical claims
  • Cross-environment comparability

For the ecosystem:

  • Promoted skills carry receipt_id — any downstream system can verify
  • Skill libraries become portable, trustworthy artifacts
  • Third parties can audit MetaClaw evolution history independently

Relationship to RL Mode

verified_skills is not a replacement for RL — it's a complement:

verified_skills  → behavioral improvement via verified skill injection
                   no weight updates, no cloud, full verification

rl / madmax      → weight updates via cloud training
                   + optional SAR verification of weight update outcomes

A future verified_rl mode could add SAR verification gates to weight updates as well — only applying updates that produce PASS outcomes across a validation set.


Offer

Happy to:

  • Contribute a reference implementation PR
  • Provide test fixtures and sample receipts for validation
  • Coordinate with the AIMING Lab team on spec alignment

This feels like a natural extension of MetaClaw’s architecture — bringing reproducibility and auditability to skill evolution.
The verification infrastructure is live and production-ready. Integration is a contained addition to the existing skill promotion flow.

