Skip to content

feat(init,config): simplified init, rebuilt config TUI, canonical channel-ID resolution#1368

Merged
Aaronontheweb merged 160 commits into
netclaw-dev:devfrom
Aaronontheweb:docs/netclaw-validated-ui-components
Jun 17, 2026
Merged

feat(init,config): simplified init, rebuilt config TUI, canonical channel-ID resolution#1368
Aaronontheweb merged 160 commits into
netclaw-dev:devfrom
Aaronontheweb:docs/netclaw-validated-ui-components

Conversation

@Aaronontheweb

@Aaronontheweb Aaronontheweb commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Supersedes #1275 (folds in the guided-config-surfaces work). Reconciles netclaw init and netclaw config against a terminal-faithful browser prototype (design/tui-prototype/), implements the proven UX in Termina, and makes the channel allow-list store canonical channel IDs. Four OpenSpec changes, all archived.

simplify-netclaw-init

  • netclaw init reduced 11 → 5 steps: Provider → Identity → Security Posture → Enabled Features (Personal skips) → Health Check. Channels, Search, Browser Automation, and Skill Sources move to netclaw config.
  • Existing-install action menu (Redo identity / Open configuration editor / Start over / Cancel); double-confirmed reset (setup-only vs full); identity-redo writes only identity files; post-flight nudges to netclaw chat + netclaw config.

netclaw-validated-ui-components — removed

  • The universal validated-UI commit framework was over-opinionated (a single consumer, a generic mandatory pipeline, a never-built analyzer). Removed entirely (−1145 net lines); Skill Sources migrated onto the same inline-validation pattern the Search editor uses. Config pages are presentational (an inverted audit test enforces it); persistence stays section-preserving on the view models.

netclaw-config-command

  • Live status-summary dashboard (Search ✓ Brave, Security & Access Team · 4/6 enabled).
  • Channels resolve-before-add (probe the channel against the adapter before saving).
  • Telemetry & Alerting multi-webhook list editor over NotificationsConfig.Webhooks.
  • Unified full-width selection bar across config sub-pages.

section-editor-abstraction

  • Confirmed valid — ConfigEditorSession is now the single persistence seam the migrated editors write through.

Channel allow-list resolution (correctness fix)

The runtime ACL (SlackAclPolicy / DiscordAclPolicy / MattermostAclPolicy) matches the incoming channel ID, but the editors had been persisting human display names — inert allow-list entries that silently matched nothing (channel messages were dropped while DMs worked). Now, for all three adapters:

  • The stored ACL key is the platform's immutable channel ID; the mutable display name is resolved dynamically for rendering only, never persisted.
  • References are canonicalized through one shared path (ReconcileResolvedChannels): an ID-shaped reference is kept (the stable key — never dropped for a failed enumeration), a display name that maps to an ID is stored as the ID, and a reference that maps to no ID is dropped and flagged — fail-loud, no silent inert entries.
  • The add-channel field and the first-connect sub-flow share one resolution path (the bespoke single-channel resolver was deleted); both accept a comma-separated list.
  • Resolution runs off the Termina loop thread (no sync-over-async blocking) — captured as a reusable dev skill at .claude/skills/termina-tui-patterns.md.
  • Config / secrets / devices.json writes are atomic (temp + fsync + rename).

Testing

  • Build clean; full CLI suite 1090/1090.
  • Native smoke harness green (incl. rewritten init-wizard, new init-existing, and config-channels driving the dashboard / multi-webhook / resolve-before-add UX).
  • slopwatch clean; copyright headers verified.
  • Validated end-to-end against live Slack + Discord — channel replies route correctly (the original no-reply symptom).

Closes

Partially addresses / follow-ons

@Aaronontheweb Aaronontheweb force-pushed the docs/netclaw-validated-ui-components branch 5 times, most recently from 02bc07a to 55253b1 Compare June 15, 2026 21:55
@Aaronontheweb Aaronontheweb added tui Terminal UI (Termina) issues config Configuration issues, netclaw doctor, schema validation. UX/DX UI / UX / DX friction issue or user-facing annoyances. labels Jun 16, 2026
@Aaronontheweb Aaronontheweb force-pushed the docs/netclaw-validated-ui-components branch from c8e7263 to 69d8b00 Compare June 16, 2026 15:29
@Aaronontheweb Aaronontheweb changed the title feat(init,config): prototype-proven init + config TUI; remove validated-UI framework feat(init,config): simplified init, rebuilt config TUI, canonical channel-ID resolution Jun 17, 2026
@Aaronontheweb Aaronontheweb marked this pull request as ready for review June 17, 2026 05:26
This was referenced Jun 17, 2026
Adds three sequential OpenSpec changes for the netclaw init/config UX
overhaul planned in /home/petabridge/.claude/plans/so-a-big-ux-mossy-newt.md:

1. section-editor-abstraction — introduces ISectionEditor, registry,
   merge-on-save, single-step orchestrator mode, and refactors
   Provider/Identity/Posture to be reentrant. Lays the contract every
   future editable section must honor. Closes netclaw-dev#455.

2. netclaw-config-command — new menu-driven `netclaw config` TUI
   command composing 10 section editors (Search, 3x chat channels,
   Exposure Mode, Security Posture, Audience Profiles, Outbound +
   Inbound Webhooks, External Skills, Skill Feeds, Browser Automation),
   plus a generic ListEditor<T> framework, 4 new doctor checks, 12
   smoke tapes + audit. Closes netclaw-dev#1150.

3. simplify-netclaw-init — trims `netclaw init` to provider + identity
   + posture, adds existing-config refusal + --force backup-and-reset
   paths, post-flight nudge pointing operators at `netclaw config`.

All three validated with `openspec validate <change> --type change`.
Sequential dependencies: A enables the abstraction, B introduces the
command and editors, C cuts the long wizard. PR review can sequence
the implementation accordingly.

No production code changes in this PR; planning artifacts only.
Self-review pass on the three init/config UX changes surfaced real
issues. Fixes:

UI mockups landed in the repo (were stuck in ~/.claude/plans/):
- docs/ui/TUI-002-netclaw-config-wireframes.md (dashboard + 12
  editors + 8 page templates + doctor results + nudge)
- docs/ui/TUI-003-simplified-init-wireframes.md (3 init steps +
  post-flight + refusal + force-reset confirm)
- Each change's design.md references its corresponding wireframe
  document as the authoritative visual contract

ISectionEditor vs menu split (Change A):
- ISectionEditor gains `bool ShowInMenu` flag (default true)
- MenuRegistryAuditTests waives tape-existence check for
  ShowInMenu == false editors (e.g. Provider, Identity covered by
  init-wizard.tape and the netclaw provider CLI)
- Round-trip test + RelevantDoctorChecks contract still applies to
  every registered editor regardless of ShowInMenu

Schema/SectionId mismatches:
- Identity is NOT a top-level schema key; added to exemption list
  with category "synthetic-spans-multiple-sections" and
  ShowInMenu = false in Change A's tasks
- Top-level Security, Daemon, Tools added to exemption list in
  Change B's tasks with category "covered by another editor's
  dotted-path SectionId" naming the covering editor
- Exemption-list spec scenarios cover both top-level and
  dotted-path coverage

netclaw config show|validate reserved (Change B):
- Reserved subcommands now print an explicit "not yet implemented;
  PRD-004" notice and exit non-zero, preserving the documented
  future surface (previously rejected as unknown)

Important items tightened across the changes:
- Change B section editors explicitly REFACTOR existing init step
  viewmodels (not create duplicates) where the section already has
  an init step
- Daemon-restart nudge now specifies the PID-file + TCP probe with
  a 250 ms bound; timeout suppresses the nudge (conservative)
- In-place rename for list items now specifies originalKey/newKey
  tracking, secrets-store rekey, and array-position preservation
- BrowserAutomation schema-migration scenarios cover both the
  editor opening over a pre-existing config and doctor --fix
  auto-insert
- --force non-TTY refusal scenario added in Change C
- .bak filename collision handled via -1/-2 suffix; timestamp moves
  from unix-seconds to unix-millis
- Multi-instance editing and Test Connection partial-failure shape
  documented in Change B's design Risks section

All three changes re-validated:
  openspec validate section-editor-abstraction --type change  ✓
  openspec validate netclaw-config-command --type change      ✓
  openspec validate simplify-netclaw-init --type change       ✓
Three config view-models had a read-modify-write path that bypassed their catch
seam and could throw into the Termina event loop:

- InboundWebhooks: the Enter-key Save() called Save(string) directly; only the
  autosave path was wrapped in ConfigAutosave.Run. Route the public Save()
  through the same guard so a ListRouteFiles read error or WriteConfigFile
  IOException surfaces via Status instead of crashing.
- Workspaces: LoadJsonDict sat between the CreateDirectory and WriteConfigFile
  try/catch blocks, so a malformed netclaw.json threw JsonException uncaught.
  Bracket read+modify+write as one guarded unit and catch JsonException.
- SkillSources: SaveExternalConfig/SaveSkillFeedsConfig read via LoadJsonDict
  outside TryWriteConfigRoot, and that guard omitted JsonException. Replace it
  with a TryEditConfig helper that guards read+mutate+write (incl. JsonException).

Each fix has a fake-failure test (config path forced to a directory, or malformed
config) proving the failure surfaces as a status instead of crashing the loop.
…ared reader

ChannelsConfigViewModel.LoadDeploymentPosture threw InvalidOperationException for
a present-but-unparseable Security.DeploymentPosture, and it runs in the
constructor — so a renamed/stale/hand-edited posture value made the entire
Channels page permanently inaccessible. The Security editor reads the same value
fail-closed-to-Public (Task 2.1), so one bad value crashed one page and not the
other.

Extract that fail-closed-to-Public semantics into a shared DeploymentPostureReader
and have both view-models read through it. Channels now falls closed to Public
(the safe restrictive default for channel/DM audience ACL defaults) instead of
throwing; the Security editor — the posture's owner — still surfaces the
corruption. One posture-read implementation, fail-closed.

Test: constructing Channels with an unparseable posture no longer throws.
…ersists

ApplyResetConfirmation bypassed SaveAsync, which cancels and awaits the in-flight
Slack label refresh at its top (the deep-review race guard). So a reset that ran
while a background normalizer was in flight could have that normalizer write a
stale snapshot over the reset's config file or clobber the just-reloaded
view-model state.

Cancel and await the refresh before the reset persists and rebuilds state, using
the same guard SaveAsync uses. Bridged synchronously like the VM's existing sync
save entry points — Termina has no SynchronizationContext, so blocking the loop
thread for the cancelled refresh's prompt completion cannot deadlock, and the
reset stays synchronous.

Test: with a 5-minute-blocked probe refresh in flight, ApplyResetConfirmation
unwinds the tracked task to null (and returns promptly); fail-first confirmed the
refresh is otherwise left running.
…tion/JsonException

Cycle-1's reset-race fix added the cancel-and-await label-refresh guard at the
top of ApplyResetConfirmation but left session.Save() and the _mapper.Load()
reload bare — so a disk-full / permission-denied write or a malformed existing
netclaw.json still escaped as an unhandled exception into the Termina event loop
and crashed the page. This reset path bypasses SaveAsync (whose ConfigAutosave
guard covers every other save), so it needed its own.

Wrap the persist+reload block in the same IOException/UnauthorizedAccessException/
JsonException guard, surface the failure via Status, and stay on the confirmation
screen so the reset can be retried. Fake-failure test (config path -> directory)
proves the failure surfaces instead of crashing.
…formed config

Cycle-2 HIGH: cycle-1's TryEditConfig guarded only its own internal read, but
every mutation handler does a typed pre-save read (LoadExternalConfig /
LoadSkillFeedsSection(LoadJsonDict(...))) OUTSIDE that guard. A malformed /
partially-written netclaw.json threw JsonException from those reads — before
TryEditConfig was ever entered — crashing the Termina event loop on every
add/toggle/rename/path-change/remove/token-rotate (15 call sites). ReloadSources
(constructor + post-save reload) had the same unguarded read, so a malformed
config left the page permanently inaccessible.

Add TryLoadExternalConfig / TryLoadSkillFeeds guarded readers (same
IOException/UnauthorizedAccessException/JsonException catch + Status as the write
path) and route all 15 mutation reads through them with an early return. Guard
ReloadSources to degrade to the prior (or empty) source list with an error
status. Test: a malformed config crashes neither construction nor a mutation —
both surface an error status.
…ailure

Cycle-2 MED: ProtectApiKeyForConfig (DataProtection .Protect()) ran before
TryEditConfig at both the add-remote-source and rotate-token sites, so an
unavailable / rotated key ring (CryptographicException) or a missing/locked keys
directory (IOException) escaped unguarded into the Termina event loop. Wrap the
encrypt in TryProtectApiKey (catch CryptographicException/IOException/
UnauthorizedAccessException), surface 'Could not encrypt the API key: ...' as an
error status, and early-return before the save so nothing is persisted. Test:
with the keys directory replaced by a file, the remote-add commit surfaces an
error instead of crashing.
…eload

Cycle-2 MED (two issues in SearchConfigEditorViewModel):

- The reachability probe ran on the caller's token (CancellationToken.None from
  the page), so navigating/disposing during Validating left a stale probe whose
  result overwrote the reloaded state (or hit ObjectDisposedException). Give the
  VM an owned linked CTS per validation run, cancel it on reload/dispose, and
  bail after the await when it was cancelled so a stale result cannot overwrite
  the navigated-to screen.

- ReloadPersistedDraft's _mapper.Load was unguarded: a malformed config
  (JsonException) or rotated/unavailable key ring (CryptographicException, the
  mapper decrypts the persisted Brave key) crashed nav-back / reset / Save-Anyway.
  Guard it, keep the prior model, and return a bool so NavigateBack/ResetDraft/
  SaveWithoutProbeOverride do not navigate, clear status, or claim 'Saved' over a
  faulted reload (they previously overwrote the surfaced error).

Tests: a navigate-away during an in-flight (gated) probe leaves the navigated
screen intact; a malformed config on nav-back surfaces an error instead of
crashing.
…nfig

Cycle-2 MED: the ExposureMode, Workspaces, and TelemetryAlerting constructors
read netclaw.json (LoadJsonDictOrNull / LoadCurrentDirectory / LoadState) before
their Status surface is set, so a malformed / unreadable config threw straight
out of the constructor — making the page permanently inaccessible with an
unhandled exception rather than a repairable error. (SkillSources' ReloadSources
was already guarded in the pre-save-read fix.)

Add a shared ConfigFileHelper.TryLoadJsonDictOrNull(path, out error) that never
throws on a malformed/unreadable file, and have each constructor degrade to a
safe default (no existing config / no current directory / default telemetry
state) and surface the read error via Status instead of crashing. Tests:
constructing each VM against a malformed netclaw.json no longer throws and
reports an error.
…id (fail-loud)

Discovered in live testing: the Slack ACL silently dropped messages
(channel_not_allowed) because AllowedChannelIds held unresolved channel NAMES,
not the IDs SlackAclPolicy matches. The config editor was persisting those
unresolved names as inert allow-list entries with only a non-blocking warning,
so an operator could save a dead allow-list and the bot would never respond.

Per operator decision, make this fail loud: ValidateSlack/Discord/Mattermost-
ChannelsAsync now BLOCK the save (and persist nothing) when the probe leaves any
channel unresolved, surfacing 'Could not resolve #x — fix or remove before saving
(an unmatchable channel grants nothing).' Slack also no longer writes the
unresolved name into the draft — only resolved IDs persist.

Updates the four tests that codified the prior persist-inert behavior (including
the former 'persist everything' invariant) to assert the new block + nothing-
persisted contract. Full suite 1073/1073.
… like Slack

Discovered in live testing alongside the Slack inert-name bug: the Discord and
Mattermost config paths only VALIDATED raw channel ids — they had no display-name
resolution, so an operator who typed a channel name got a dead allow-list entry.

Teach both probes to resolve names: DiscordProbe enumerates the bot's guild text
channels (GET /users/@me/guilds + /guilds/{id}/channels) and matches each
reference by id or name; MattermostProbe enumerates the bot's team channels
(/users/me/teams + /users/me/teams/{id}/channels) and matches by id, url slug, or
display name. The config-editor validators now remap resolved names to their ids
and persist the ids (shared SetResolvedChannels helper mirroring the Slack path),
so the runtime ACL matches. Part A's block-on-unresolved covers the miss case.

Fakes echo inputs as resolved by default (set NextResolutionResult to stage a
specific outcome). Tests: a Discord/Mattermost name resolves to its id and only
the id persists. Full suite 1075/1075.
…Slack

The Discord and Mattermost channel wizard steps persisted raw channel
input into AllowedChannelIds and ChannelAudiences. The runtime ACL
(DiscordAclPolicy / MattermostAclPolicy) matches incoming channel IDs by
ordinal, so a persisted name is an inert allow-list entry that grants
nothing — the same name-vs-ID mismatch already fixed for Slack and for
the config editor. This aligns the wizard contribution path with both.

Mirror the Slack hardening in both wizard steps:
- ContributeConfig persists only resolved canonical channel IDs from
  LastChannelResolution; an unresolved reference is omitted, never
  written verbatim (an unmatchable ACL entry grants nothing).
- BuildChannelAudiences keys audiences by resolved channel ID (or the
  literal "dm" key) and omits unresolved names instead of writing dead
  ACL keys the runtime can't match.

Mattermost had no resolution probe wired into the wizard (its
LastChannelResolution was never populated), so inject IMattermostProbe
and resolve channel references in ContributeHealthChecksAsync
(id / slug / display name) before persistence, as Discord/Slack do.

Add regression tests proving resolved-id persistence and unresolved-name
omission for both adapters, plus Mattermost health-check resolution.
Captures the correct async-to-front-end pattern for the Termina TUI so agents
stop reaching for .GetAwaiter().GetResult() to 'stay on the loop thread'.

The key facts, evidenced from the codebase: Termina's loop is a single
await-foreach over an unbounded Channel<object>; RequestRedraw() is a
thread-safe Writer.TryWrite callable from any thread; there is no
SynchronizationContext and no R3 FrameProvider. So async continuations resume on
the thread pool and publish results by mutating ReactiveProperty state +
RequestRedraw() — never by blocking the loop. The chat streaming pipeline and the
SkillSources / label-refresh / provider probes are the in-repo reference
implementations; the four GetResult() sites in ChannelsConfigViewModel are the
anti-pattern to migrate.

Includes the tracked-task + owned-CTS recipe, the cancel-and-await-before-save
discipline, deterministic-test guidance (expose PendingX, no Task.Delay), and the
self-animating spinner note.
…und resolution

The adapter sub-flow's free-text channel field could persist raw display names
into AllowedChannelIds. The runtime ACL matches the platform's immutable channel
id, so a stored display name is an inert entry that silently grants nothing — the
no-reply bug for channel messages.

Fix it at the right altitude, honoring the platform constraint that the channel id
is the stable ACL key while the display name is mutable and render-only:

- ReconcileResolvedChannels canonicalizes the allow-list against each completed
  background resolution, for all three transports: a reference that is a channel id
  is kept (never dropped on a display-lookup miss); a display name that maps to an
  id is stored as the id (and its audience key remapped); a display name that maps
  to no id is NOT persisted (dropped, with a loud human-readable warning); a probe
  failure likewise persists no unmapped name and surfaces the reason.
- Wire Mattermost into the resolution (it had none) and make Discord canonicalize
  (it previously only set a display label). Resolution runs off the loop thread and
  publishes via RequestRedraw — no sync-over-async, per the termina-tui-patterns
  skill. Opening Manage Channels re-runs it, so existing configs self-heal.

Replaces the old keep-the-name normalizer (which left inert names in the ACL).

Adds definitive tests across Slack/Discord/Mattermost: resolved name persists as
its id, an unmappable name is dropped and warned, a scope error drops names and
surfaces the reason, and a real id the bot can't currently enumerate is kept.
… field

The add-channel field resolved its whole input as a single channel, so a CSV like
"openclaw, netclaw-test" became one bogus reference ("channel not found:
#openclaw,netclaw-test") while the first-connect sub-flow happily accepted the
same CSV. Use the one shared ChannelCsv parser in both places.

ApplyAddChannelAsync now splits the input, resolves each reference to its canonical
id (adapter-agnostic, so Slack/Discord/Mattermost all benefit), adds the resolved
ids, and reports the unresolved ones without persisting them. The single-channel
success message is preserved; multi-channel adds report a count. Placeholder updated
to "channel IDs or #names, comma-separated."

Tests: a comma-separated add resolves each reference to its id, and a mixed list
persists the resolvable channel while flagging the unresolvable one.
…channel

The add-channel route had its own resolver (ResolveSingleChannelAsync) that REQUIRED
the probe to enumerate the channel — so a valid Discord channel id the bot couldn't
list was rejected, even though the first-connect flow accepts it. Two divergent paths
for the same job.

Distill to one component used everywhere, for every adapter:
- ApplyAddChannelAsync now appends the typed reference(s) and canonicalizes them through
  the same ReconcileResolvedChannels the first-connect sub-flow uses: an id-shaped
  reference is kept as the stable ACL key (never dropped for failing enumeration), a
  display name that maps to an id becomes the id, and anything unmappable is dropped and
  flagged. Pasted Discord ids now behave identically in both flows.
- Delete the bespoke ResolveSingleChannelAsync + ChannelResolveOutcome.
- Collapse the three near-identical Refresh{Slack,Discord,Mattermost}ChannelLabelsAsync
  into one transport-dispatch (ResolveChannelReferencesAsync) feeding the transport-
  agnostic reconcile. Net -127 lines in the view-model.
- The add-channel hint is adapter-aware (Slack: names or ids; Discord/Mattermost: ids),
  matching the onboarding dialog and not suggesting display names where they don't resolve.
- Reconcile surfaces a warning only when a reference is actually dropped; a probe error
  that dropped nothing (every reference is an id-shaped key) is benign and no longer masks
  a successful add.

Tests updated to the unified contract: an unresolvable add is dropped+warned on the
permissions screen (not kept on the add screen); resolution batches the whole list.
…list view

The Mattermost channel rows rendered the opaque channel id while Slack and Discord
rendered the resolved human name. Add FormatMattermostChannelLabel (preferring the
display name, falling back to the #slug) so the config UI shows the readable name once
the background resolution has run — completing the Mattermost half of the channel
display-name work.

Closes netclaw-dev#1324.
…ing-specs

Both config-TUI OpenSpec changes are complete (implementation tested at 1091/1091).
`openspec archive` folds their delta specs into openspec/specs/:
- harden-config-tui-io-and-failloud -> new config-tui-resilience capability (+8 reqs:
  atomic writes, background-probe track/cancel/await lifecycle, fail-loud/deny-by-default
  config reads + security fallbacks).
- reconcile-config-onboarding-specs -> synced the as-built deltas across channel-audience-tui,
  feature-selection-wizard, inbound-webhooks, netclaw-config-command, netclaw-onboarding,
  and security-posture-tui (+14 / ~11 / -3).

Corrected one delta mis-categorization in reconcile: 'Channels area supports Slack/Discord/
Mattermost adapters' was filed under MODIFIED but is a new requirement (no matching header in
the spec) -> moved to ADDED so the archive could apply.

God-object viewmodel decomposition + the 53 low-severity review items remain deferred to a
follow-on change, per the harden proposal.
…S test deadlock

The Channels config viewmodel bridged async work (label-refresh cancel-and-await,
save, autosave, add-channel) to Termina's synchronous key handlers via
`.GetAwaiter().GetResult()`. That bridge is only safe when no SynchronizationContext
is captured — true in the real Termina loop, but NOT under the xunit v3 test runner,
which installs a MaxConcurrencySyncContext sized to the core count. On macOS CI's
smaller worker pool the blocked `.GetResult()` held the only free worker while the
cancelled probe's continuation was posted back to that same context — a
sync-over-async deadlock. `dotnet test` hung ~30 min and was killed by the job
timeout; it passed on higher-core Linux/Windows runners.

Remove all four sync-over-async bridges in ChannelsConfigViewModel:
- Save/ApplyAddChannel/ApplyResetConfirmation are now async; reset and add are
  dispatched fire-and-forget from the page key handlers.
- Autosave handlers enqueue the persist instead of blocking; the on-loop state
  mutation is unchanged.
- Add one explicit serialization point (PendingConfigWrite) to preserve the write
  ordering the loop-block provided for free, so two rapid mutations cannot race the
  disk write + state reload. It runs synchronously inline in the common case.
- ApplyAddChannelAsync settles screen + row focus synchronously before the async
  persist so a subsequent keypress navigates from a deterministic position.

Add a deterministic regression test (SingleThreadSynchronizationContext) that drives
the reset-with-in-flight-refresh scenario under a single-worker context: it deadlocks
the old blocking code (watchdog trips) and passes the async path.
…et errors

Follow-up hardening from the code review of the Channels async-write migration. Closes the
new exposures that the fire-and-forget writes introduced, plus adjacent findings:

- Lifecycle cancellation (review #1/netclaw-dev#5): add a VM-lifetime CancellationTokenSource, thread
  its token through the AddChannel / Reset / autosave entry points, and have Dispose cancel
  it and DRAIN the in-flight write + label refresh before disposing the reactive state they
  publish to. Pre-fix, closing the editor mid-write could let the write resume on a
  thread-pool continuation and mutate a disposed ReactiveProperty / Step; the runtime path
  also passed CancellationToken.None, so nothing honored cancellation.
- Reset fails loud (review #3): ApplyResetConfirmationAsync now surfaces ANY unexpected
  exception (e.g. an InvalidOperationException from a type-malformed config reload) as an
  Error status instead of letting it fault the fire-and-forget chain, where the next write's
  catch swallowed it silently. Honors the no-silent-fallbacks rule.
- Search re-entrancy (review netclaw-dev#6): SubmitCurrentConfigurationFromInputAsync now ignores a
  re-entrant submit while a probe is in flight (returns the same task), closing the same
  two-rapid-submits-race-the-disk-write hazard the Channels write chain fixed.
- Cleanup: collapse SaveCompletedAsync / SaveFromInputAsync onto one SaveViaAutosaveAsync
  helper (netclaw-dev#8); extract the shared ArrangeSlackResetWithLabelRefreshInFlight test setup (netclaw-dev#9);
  the single-worker SynchronizationContext test pump no longer dies on a throwing
  continuation, which would mask a real failure as a generic deadlock (netclaw-dev#7).

Tests: a dispose-drains-an-in-flight-write test (Channels) and a re-entrancy test (Search).

Deferred (pre-existing, tracked separately): the background label refresh and the save+reload
tail still mutate view-model state off the loop, so a concurrent on-loop edit during a network
resolve can lose an audience edit (review #2/netclaw-dev#4). Eliminating that requires marshalling those
mutations back onto the loop (Termina exposes only RequestRedraw) and is its own change; the
lifetime cancellation here bounds the window and closes the disposal portion.
@Aaronontheweb Aaronontheweb force-pushed the docs/netclaw-validated-ui-components branch from da38550 to 8621887 Compare June 17, 2026 16:54
…lopwatch SW003)

The EnqueueConfigWriteAsync chain-await guard and the Dispose drain guard were comment-only
empty catches. Both are defensive (every write is exception-safe and surfaces its own status),
but slopwatch SW003 flags an empty catch as swallowing without handling. Add a Debug.WriteLine
trace so the intentional swallow is logged if the defensive path is ever hit.
Test-macos-26 (Apple Silicon / ARM64) hangs in Netclaw.Cli.Tests for the full 30-min job cap with
no diagnostic, while ubuntu/windows (x64) pass — a single test stalls the assembly. Instrument the
dotnet test step with --blame-hang-timeout 300s --blame-hang-dump-type full so a hang fails fast,
prints the stuck test name to the log (Show hang sequence step), and uploads the dump + sequence as
the test-hang-dump-<os> artifact for offline stack analysis.

Leading theory (documented in docs/incidents/2026-06-17-macos-arm64-tui-test-hang.md): the TUI
view-models mutate plain fields from off-loop thread-pool continuations and read them on the render
loop with no barrier — safe under x64 TSO, potentially a stale-read hang under ARM64's weak memory
model. The incident also directs an audit of the termina-tui-patterns skill.
@Aaronontheweb Aaronontheweb merged commit 03ef94b into netclaw-dev:dev Jun 17, 2026
15 checks passed
@Aaronontheweb Aaronontheweb deleted the docs/netclaw-validated-ui-components branch June 17, 2026 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

config Configuration issues, netclaw doctor, schema validation. tui Terminal UI (Termina) issues UX/DX UI / UX / DX friction issue or user-facing annoyances.

Projects

None yet

1 participant