Skip to content

Commit d5364ba

Browse files
committed
chore: pre-PR cleanup sweep — strip archeology, tighten comments, drop dead narrative
Net delta across 61 files: -1629 LOC (1069 added, 2698 deleted). No behavioral changes, no test count delta — 4789 tests passing before and after. Three parallel cleanup lanes worked the branch diff against main: Dataplane substrate (~-1500 LOC across coordinator, engine_startup, permissions, models, codec, scripts, transitions, lattices, ids, errors, crash, connection, the Lua scripts, and the co-located tests). Stripped multi-paragraph rationales describing the design, shrunk module docstrings, removed historical narrative ("previously", "we now", "before this"), collapsed multi-line in-body comment blocks to one-liners. The genuinely load-bearing WHY comments stayed (the cjson directive fallback, the NOSCRIPT coalescing note, the asyncio-only contract on OneShotFlag, etc.). Migrated callers (~-570 LOC across redis_program_storage, locking, state_manager, archive_storage, bandit, fetcher, stats, sync, mutation_operator, dag_runner, selectors, strict_chat_openai). One real code smell fixed in coevolution/stats.py: replaced `for idx, (db, prefix) in enumerate(...)` with a stale dangling `del db, prefix` that referenced no log line. Dual-path narration ("when wired, route through dp; when None, fall back...") collapsed across the storage / archive / bandit dispatch sites. The backwards-compat-shim apology comments tightened to one line each. Tests + entry-point + configs (~-340 LOC across the test tree, run.py, pyproject.toml, .gitignore). Test docstrings collapsed, internal-slug "covers C3" / "T-A4" references stripped, narrative about prior bypass behaviour replaced with what-the-test-asserts-now. pyproject.toml banned-import policy block tightened; every TID251 per-file-ignore verified live (zero stale entries). .gitignore left alone (no narrative to strip). Code-smell pattern surfaced across 5 files (NOT fixed in this sweep; flagged for a follow-up): the `if self._dataplane is not None: return await self._<op>_via_dataplane(...)` dispatch shape duplicates in redis_program_storage.atomic_state_transition, fast_state_transition, batch_transition_state, batch_transition_by_ids, and archive_storage.add_elite. A small dispatch_if_dataplane helper or a coordinator-mixin would centralise the branch + assert. A sibling pattern across prompts/fetcher.py, coevolution/stats.py, coevolution/sync.py for lazy-init owned/borrowed DataPlane could deduplicate via an OwnedDataPlane container. Out-of-lane finding (not fixed): errors.all_error_types() omits EliteInvalidError from its tuple but the test only checks subclassing, not exhaustiveness. One-line fix; counted as behavior change; left for a follow-up.
1 parent cfee628 commit d5364ba

61 files changed

Lines changed: 1069 additions & 2698 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

gigaevo/database/program_storage.py

Lines changed: 3 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -270,12 +270,9 @@ async def batch_transition_state(
270270
Subclasses may override with pipelined operations.
271271
Returns the number of programs transitioned.
272272
273-
Per-program FSM validation runs against the program's actual
274-
current state rather than the caller-asserted ``old_state`` so
275-
a diverged in-memory state (e.g. an upstream actor mutated the
276-
program after this batch was assembled) surfaces as
277-
:class:`ValueError` instead of leaking into a persisted blob
278-
whose ``state`` field disagrees with the FSM.
273+
Per-program FSM validation runs against each program's actual
274+
current state so a diverged in-memory state surfaces as
275+
:class:`ValueError` rather than being silently persisted.
279276
"""
280277
old_enum = ProgramState(old_state)
281278
new_enum = ProgramState(new_state)

gigaevo/database/redis/locking.py

Lines changed: 30 additions & 78 deletions
Original file line numberDiff line numberDiff line change
@@ -6,23 +6,12 @@
66
token-CAS, so renew and release only mutate the key when its stored
77
value still matches the holder's token.
88
9-
Two execution paths share the same lock-key shape:
10-
11-
* **Dataplane-routed**: when a :class:`~gigaevo.dataplane.DataPlane` is
12-
supplied at construction, the three Lua calls dispatch through
13-
:meth:`gigaevo.dataplane.scripts.LuaRegistry.evalsha` so the SHA
14-
cache, NOSCRIPT recovery, and :class:`ScriptLostError` escalation
15-
policy are owned by a single registry shared with every other
16-
coordinator-routed caller. The acquired lock is wrapped in a
17-
:class:`~gigaevo.dataplane.crash.CrashWatchedHandle` so renewal
18-
failure surfaces as a typed :class:`~gigaevo.dataplane.crash.CrashEvent`
19-
the caller can pattern-match against, instead of forcing readers to
20-
poll a private flag.
21-
22-
* **Legacy direct-aioredis**: when ``dataplane`` is ``None`` the lock
23-
reimplements ``SCRIPT LOAD`` + ``EVALSHA`` + ``NOSCRIPT`` retry
24-
locally. This path stays in place for callers that have not yet
25-
acquired a dataplane handle.
9+
When a :class:`~gigaevo.dataplane.DataPlane` is supplied the Lua calls
10+
dispatch through its :class:`LuaRegistry` and the lease is wrapped in a
11+
:class:`~gigaevo.dataplane.crash.CrashWatchedHandle` for typed
12+
renewal-loss observability. Without a dataplane the SCRIPT LOAD /
13+
EVALSHA / NOSCRIPT retry is handled locally and crash observability
14+
is unavailable.
2615
"""
2716

2817
from __future__ import annotations
@@ -79,10 +68,8 @@ def __init__(
7968
self._token: str | None = None
8069
self._renewal_task: asyncio.Task[None] | None = None
8170

82-
# Legacy direct-aioredis SHA cache. Populated lazily on the
83-
# first :meth:`_evalsha` call when ``self._dataplane is None``;
84-
# ignored on the dataplane-routed path where
85-
# :class:`LuaRegistry` owns the cache.
71+
# SHA cache for the dataplane-less path; ignored when a
72+
# dataplane is wired (its LuaRegistry owns the cache).
8673
self._script_sources: dict[ScriptName, str] = {
8774
_SCRIPT_LOCK_ACQUIRE: load_lua_source(_SCRIPT_LOCK_ACQUIRE),
8875
_SCRIPT_LOCK_RENEW: load_lua_source(_SCRIPT_LOCK_RENEW),
@@ -91,12 +78,8 @@ def __init__(
9178
self._script_shas: dict[ScriptName, str] = {}
9279
self._reload_lock: asyncio.Lock | None = None
9380

94-
# Typed-control-flow channel for renewal-loss observability.
95-
# Populated on :meth:`acquire`; the flag is signalled by the
96-
# background renewer on token-CAS failure or transport error.
97-
# Callers route lease-scoped operations through
98-
# ``self._wrapped_lease.call(...)`` to convert the polled flag
99-
# into a :class:`CrashEvent` branch.
81+
# Crash-flag signalled by the background renewer on token-CAS
82+
# failure or transport error; consumed via ``wrapped_lease``.
10083
self._lease_flag: OneShotFlag | None = None
10184
self._wrapped_lease: (
10285
CrashWatchedHandle[InstanceLease, str, InstanceLease] | None
@@ -114,15 +97,12 @@ def instance_id(self) -> str:
11497
def wrapped_lease(
11598
self,
11699
) -> CrashWatchedHandle[InstanceLease, str, InstanceLease] | None:
117-
"""Typed crash-watched handle around the held lease.
118-
119-
``None`` when the lock is not held or when the dataplane is not
120-
wired (the legacy path has no shared :class:`InstanceLease`
121-
vocabulary). On the dataplane-routed path the handle's
122-
:meth:`CrashWatchedHandle.call` short-circuits to a
123-
:class:`CrashEvent` once the background renewer has signalled
124-
renewal loss, eliminating a Redis round-trip for the loss-detection
125-
path.
100+
"""Crash-watched handle around the held lease, or ``None``.
101+
102+
``None`` when the lock is not held or when no dataplane is
103+
wired. Calls through the handle short-circuit to a
104+
:class:`CrashEvent` once the background renewer signals
105+
renewal loss.
126106
"""
127107
return self._wrapped_lease
128108

@@ -149,13 +129,9 @@ async def _load_sha(
149129
"""SCRIPT LOAD ``name`` and refresh the SHA cache, coalesced.
150130
151131
Concurrent NOSCRIPT recoveries for the same script collapse to
152-
one SCRIPT LOAD round-trip: callers that wait on the lock find
153-
the SHA refreshed by the first arrival and reuse it. The
154-
``stale_sha`` argument is the SHA that just raised NOSCRIPT;
155-
when it differs from the cache we are looking at a peer's
156-
already-installed result and skip the redundant LOAD.
157-
158-
Used only on the legacy direct-aioredis path.
132+
a single SCRIPT LOAD: when ``stale_sha`` differs from the
133+
cached entry a peer has already installed a fresh SHA and we
134+
reuse it.
159135
"""
160136
async with self._get_reload_lock():
161137
cached = self._script_shas.get(name)
@@ -176,17 +152,10 @@ async def _evalsha(
176152
) -> int:
177153
"""Invoke a lock script and coerce its return to ``int``.
178154
179-
Two dispatch paths, selected by whether the dataplane is wired:
180-
181-
* **Dataplane-routed**: delegates to
182-
:meth:`LuaRegistry.evalsha`, which owns the SHA cache,
183-
NOSCRIPT-reload retry, and :class:`ScriptLostError` escalation
184-
policy. ``r`` is ignored.
185-
* **Legacy**: replicates the SCRIPT LOAD + EVALSHA + NOSCRIPT
186-
retry locally against the per-lock SHA cache.
187-
188-
Returns the script's integer return value (every instance-lock
189-
script returns 1 / 0).
155+
When a dataplane is wired, dispatches through its
156+
:class:`LuaRegistry`; otherwise runs SCRIPT LOAD + EVALSHA +
157+
NOSCRIPT retry against the local SHA cache. Every instance-lock
158+
script returns 1 / 0.
190159
"""
191160
if self._dataplane is not None:
192161
lua = self._dataplane._lua
@@ -247,11 +216,6 @@ async def _acquire(r: aioredis.Redis) -> bool:
247216

248217
result = await self._conn.execute("acquire_instance_lock", _acquire)
249218
if self._dataplane is not None:
250-
# Build an InstanceLease-shaped record so the standard
251-
# dp.wrap_lease policy applies. The lease's ``token`` is the
252-
# holder's instance id (the same value token-CAS'd by the
253-
# underlying scripts); ``flag`` is signalled by the
254-
# background renewer below on renewal loss.
255219
self._lease_flag = OneShotFlag()
256220
lease = InstanceLease(
257221
token=LeaseToken(self._instance_id),
@@ -271,13 +235,9 @@ async def release(self) -> None:
271235
272236
Token-CAS DEL: only deletes the key if its stored value still
273237
matches our instance id. Idempotent; an already-released or
274-
already-lost lock is a silent no-op.
275-
276-
After ``release()`` returns the renewer is fully drained and the
277-
local holder state (``is_held``, ``_token``) is cleared, even if
278-
the Redis side errored — the caller's lock-holding posture ends
279-
at the call site regardless of network reachability so a follow-
280-
up acquire is not blocked by stale local state.
238+
already-lost lock is a silent no-op. Local holder state is
239+
cleared even when the Redis-side call errors, so a follow-up
240+
acquire is not blocked by stale state.
281241
"""
282242
if self._renewal_task:
283243
self._renewal_task.cancel()
@@ -355,16 +315,10 @@ async def observe_loss(
355315
) -> CrashEvent[str, InstanceLease] | None:
356316
"""Return a :class:`CrashEvent` iff the renewer has signalled loss.
357317
358-
``None`` on the non-loss path (the typical case). The first call
359-
after a loss returns a synthesized :class:`CrashEvent` carrying
360-
the lock key as ``peer`` and the lost :class:`InstanceLease` as
361-
``resource``. Subsequent calls return ``None`` so observers do
362-
not see the same loss twice.
363-
364-
Returns ``None`` unconditionally on the legacy direct-aioredis
365-
path where ``wrapped_lease`` is unavailable; callers that need
366-
crash-event observability MUST construct the lock with a
367-
dataplane handle.
318+
Single-shot: the first call after a loss returns a synthesized
319+
:class:`CrashEvent` (lock key as ``peer``, lost
320+
:class:`InstanceLease` as ``resource``); subsequent calls return
321+
``None``. Always ``None`` when no dataplane is wired.
368322
"""
369323
if self._wrapped_lease is None:
370324
return None
@@ -374,8 +328,6 @@ async def _noop(_: InstanceLease) -> bool:
374328

375329
_, evt = await self._wrapped_lease.call(_noop)
376330
if evt is not None:
377-
# Consume the loss: clear the wrapper so a second observer
378-
# does not see the same event a second time.
379331
self._wrapped_lease = None
380332
self._lease_flag = None
381333
return evt

0 commit comments

Comments
 (0)