Summary
A fleet lease outlives its own TTL. Acquire one with a short ttl_seconds, wait
past expires_at, and it stays state: "granted". The expiry never triggers a
reclaim. A second agent that asks for the same scope lands in state: "requested"
and waits there, instead of being granted the lease that should already be free.
Environment
- memtrace 0.6.11
- macOS (Apple Silicon)
- Embedded MemDB, fleet tools called over the MCP transport
Steps to reproduce
- Acquire a lease with a short TTL:
fleet_acquire_lease(repo_id, agent_id: "agent-A", scope: ["sym_x"], ttl_seconds: 3)
The response is state: "granted" with an expires_at about 3 seconds out.
- Wait well past the TTL (for example 25 seconds).
- Check the lease:
fleet_preflight(repo_id, touched: ["sym_x"]) still lists the lease as
state: "granted", with expires_at now in the past.
- From a second agent, acquire the same scope:
fleet_acquire_lease(repo_id, agent_id: "agent-B", scope: ["sym_x"])
returns state: "requested" (queued behind the stale holder), not granted.
- Re-check much later: a lease can still be granted about 10 minutes past its
recorded expires_at.
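The steps above could be scripted roughly as follows. This is a sketch only: call_tool is a hypothetical hook standing in for however your MCP client invokes a tool (name plus arguments in, response dict out), and the assumption that the response carries a top-level "state" field comes from the responses quoted in this report, not from a documented schema.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical transport hook: call_tool(tool_name, arguments) -> response dict.
ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def reproduce_stale_lease(
    call_tool: ToolCall,
    repo_id: str,
    ttl_seconds: int = 3,
    wait_seconds: float = 25.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Run the repro steps and report whether the stale lease blocked agent-B."""
    # Step 1: agent-A takes a short-lived lease.
    first = call_tool("fleet_acquire_lease", {
        "repo_id": repo_id,
        "agent_id": "agent-A",
        "scope": ["sym_x"],
        "ttl_seconds": ttl_seconds,
    })

    # Step 2: wait well past the TTL.
    sleep(wait_seconds)

    # Step 3: inspect the lease; the raw response is returned for manual review.
    preflight = call_tool("fleet_preflight", {
        "repo_id": repo_id,
        "touched": ["sym_x"],
    })

    # Step 4: agent-B asks for the same scope.
    second = call_tool("fleet_acquire_lease", {
        "repo_id": repo_id,
        "agent_id": "agent-B",
        "scope": ["sym_x"],
    })

    return {
        "first_state": first.get("state"),
        "preflight": preflight,
        "second_state": second.get("state"),
        # The bug is present when A was granted and B is queued after the TTL.
        "bug_reproduced": first.get("state") == "granted"
        and second.get("state") == "requested",
    }
```

On a fixed build, second_state would be expected to come back as "granted" and bug_reproduced as False.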
Observed
- Expired leases linger as granted. No expired state transition is observed.
- A re-acquire of an expired scope is blocked (requested) until an explicit
fleet_release_lease or an engine restart.
Expected
After the TTL elapses, the lease should become reclaimable (released or expired)
so another agent can acquire the scope. Today the only recovery, short of an
engine restart, is an explicit release, which a crashed or forgetful holder will
never issue.
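To make the expected behavior concrete, here is a minimal in-memory model of lazy expiry at acquire time. It is not memtrace's implementation. LeaseTable, Lease and the explicit now argument are hypothetical names made up for illustration, and a scope is simplified to a single symbol.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Lease:
    agent_id: str
    expires_at: float
    state: str = "granted"


class LeaseTable:
    """Toy lease store: one holder per scope, expiry checked on acquire."""

    def __init__(self) -> None:
        self._leases: Dict[str, Lease] = {}

    def acquire(self, scope: str, agent_id: str, ttl_seconds: float, now: float) -> str:
        holder = self._leases.get(scope)

        # Lazy expiry: a granted lease whose TTL has elapsed stops blocking.
        if holder is not None and holder.state == "granted" and now >= holder.expires_at:
            holder.state = "expired"

        if holder is None or holder.state != "granted":
            self._leases[scope] = Lease(agent_id, expires_at=now + ttl_seconds)
            return "granted"

        # A live holder exists, so the caller queues behind it.
        return "requested"

    def state_of(self, scope: str) -> str:
        """Return the recorded state of the current lease on a scope."""
        return self._leases[scope].state
```

In this model, agent-A acquiring sym_x at now=0 with a 3 second TTL blocks agent-B at now=1, but agent-B is granted the scope at now=25 without any explicit release. A background sweeper would be an alternative design; checking at acquire time is shown only because it needs no timer.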
Impact
One holder that crashes, or just forgets to release, blocks the scope for every
other agent. There is no automatic recovery path back to a usable lease.
Note
Fleet intents behave differently here. A published intent does drop out on its
TTL: it disappears from node state once the window closes. The TTL mechanism
clearly works for intents, so the gap is specific to leases. Aligning the two
would resolve this.