Skip to content

Commit da38550

Browse files
committed
fix(config): harden async config-write lifecycle and fail loud on reset errors
Follow-up hardening from the code review of the Channels async-write migration. Closes the new exposures that the fire-and-forget writes introduced, plus adjacent findings: - Lifecycle cancellation (review #1/#5): add a VM-lifetime CancellationTokenSource, thread its token through the AddChannel / Reset / autosave entry points, and have Dispose cancel it and DRAIN the in-flight write + label refresh before disposing the reactive state they publish to. Pre-fix, closing the editor mid-write could let the write resume on a thread-pool continuation and mutate a disposed ReactiveProperty / Step; the runtime path also passed CancellationToken.None, so nothing honored cancellation. - Reset fails loud (review #3): ApplyResetConfirmationAsync now surfaces ANY unexpected exception (e.g. an InvalidOperationException from a type-malformed config reload) as an Error status instead of letting it fault the fire-and-forget chain, where the next write's catch swallowed it silently. Honors the no-silent-fallbacks rule. - Search re-entrancy (review #6): SubmitCurrentConfigurationFromInputAsync now ignores a re-entrant submit while a probe is in flight (returns the same task), closing the same two-rapid-submits-race-the-disk-write hazard the Channels write chain fixed. - Cleanup: collapse SaveCompletedAsync / SaveFromInputAsync onto one SaveViaAutosaveAsync helper (#8); extract the shared ArrangeSlackResetWithLabelRefreshInFlight test setup (#9); the single-worker SynchronizationContext test pump no longer dies on a throwing continuation, which would mask a real failure as a generic deadlock (#7). Tests: a dispose-drains-an-in-flight-write test (Channels) and a re-entrancy test (Search). Deferred (pre-existing, tracked separately): the background label refresh and the save+reload tail still mutate view-model state off the loop, so a concurrent on-loop edit during a network resolve can lose an audience edit (review #2/#4). Eliminating that requires marshalling those mutations back onto the loop (Termina exposes only RequestRedraw) and is its own change; the lifetime cancellation here bounds the window and closes the disposal portion.
1 parent b1fe119 commit da38550

5 files changed

Lines changed: 183 additions & 67 deletions

File tree

src/Netclaw.Cli.Tests/Tui/Config/ChannelsConfigViewModelTests.cs

Lines changed: 66 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -1291,28 +1291,7 @@ public async Task SaveAsync_cancels_and_awaits_in_flight_label_refresh_before_wr
12911291
[Fact]
12921292
public async Task ApplyResetConfirmation_cancels_and_awaits_in_flight_label_refresh_before_writing()
12931293
{
1294-
WriteAllChannelConfig();
1295-
WriteAllChannelSecrets();
1296-
var slackProbe = new FakeSlackProbe
1297-
{
1298-
// Block the resolve so the background refresh is genuinely in flight during the reset.
1299-
DelayBeforeResult = TimeSpan.FromMinutes(5),
1300-
NextResolutionResult = new SlackChannelResolutionResult(
1301-
true, null, [new ResolvedSlackChannel("general", "C01")], []),
1302-
};
1303-
using var vm = CreateViewModel(slackProbe: slackProbe);
1304-
1305-
// Enter Manage Channels to start the background label refresh, then leave it in flight.
1306-
vm.OpenAdapterManagement(ChannelType.Slack);
1307-
MoveToManagementAction(vm, ChannelsManagementAction.ManageChannels);
1308-
vm.ActivateManagementMenuItem();
1309-
Assert.False(vm.PendingLabelRefresh?.IsCompleted ?? true); // background is in flight
1310-
1311-
// Drive the reset confirmation (which bypasses SaveAsync) while the refresh is still running.
1312-
vm.GoBack();
1313-
MoveToManagementAction(vm, ChannelsManagementAction.ResetConnection);
1314-
vm.ActivateManagementMenuItem();
1315-
vm.MoveResetConfirmation(1);
1294+
using var vm = ArrangeSlackResetWithLabelRefreshInFlight();
13161295

13171296
await vm.ApplyResetConfirmationAsync(TestContext.Current.CancellationToken);
13181297

@@ -1334,32 +1313,12 @@ public async Task Reset_with_in_flight_label_refresh_completes_under_a_single_wo
13341313
// deterministically: it drives the whole reset-with-in-flight-refresh scenario on a context
13351314
// with exactly ONE worker, so a reintroduced sync-over-async bridge hangs the worker and trips
13361315
// the watchdog instead of completing.
1337-
WriteAllChannelConfig();
1338-
WriteAllChannelSecrets();
1339-
var slackProbe = new FakeSlackProbe
1340-
{
1341-
// Keep the background refresh genuinely in flight while the reset runs.
1342-
DelayBeforeResult = TimeSpan.FromMinutes(5),
1343-
NextResolutionResult = new SlackChannelResolutionResult(
1344-
true, null, [new ResolvedSlackChannel("general", "C01")], []),
1345-
};
1346-
13471316
using var context = new SingleThreadSynchronizationContext();
13481317
var scenario = context.Run(async () =>
13491318
{
1350-
using var vm = CreateViewModel(slackProbe: slackProbe);
1351-
1352-
// Start the background label refresh and leave it in flight. Its continuation captures THIS
1353-
// single-worker context — exactly the condition that deadlocked the old blocking reset.
1354-
vm.OpenAdapterManagement(ChannelType.Slack);
1355-
MoveToManagementAction(vm, ChannelsManagementAction.ManageChannels);
1356-
vm.ActivateManagementMenuItem();
1357-
Assert.False(vm.PendingLabelRefresh?.IsCompleted ?? true);
1358-
1359-
vm.GoBack();
1360-
MoveToManagementAction(vm, ChannelsManagementAction.ResetConnection);
1361-
vm.ActivateManagementMenuItem();
1362-
vm.MoveResetConfirmation(1);
1319+
// Arrange on the single-worker context so the background refresh's continuation captures
1320+
// THIS context — exactly the condition that deadlocked the old blocking reset.
1321+
using var vm = ArrangeSlackResetWithLabelRefreshInFlight();
13631322

13641323
// Fire-and-forget exactly like the Termina key handler, then await the serialized write.
13651324
_ = vm.ResetConfirmationFromInputAsync();
@@ -1377,6 +1336,39 @@ public async Task Reset_with_in_flight_label_refresh_completes_under_a_single_wo
13771336
await scenario; // re-throw any assertion failure raised on the worker thread
13781337
}
13791338

1339+
[Fact]
1340+
public async Task Disposing_editor_cancels_and_drains_an_in_flight_config_write()
1341+
{
1342+
WriteChannelConfig();
1343+
WriteChannelSecrets();
1344+
var slackProbe = new FakeSlackProbe
1345+
{
1346+
// Hold the add's label-resolve open so the config write is genuinely in flight at Dispose.
1347+
DelayBeforeResult = TimeSpan.FromMinutes(5),
1348+
NextResolutionResult = new SlackChannelResolutionResult(
1349+
true, null, [new ResolvedSlackChannel("c-09", "C09")], []),
1350+
};
1351+
var vm = CreateViewModel(slackProbe: slackProbe);
1352+
vm.OpenAdapterManagement(ChannelType.Slack);
1353+
vm.BeginAddChannel();
1354+
vm.AddChannelInput = "c-09";
1355+
1356+
// Dispatch the add fire-and-forget exactly like the key handler; it blocks on the 5-minute resolve.
1357+
var write = vm.AddChannelFromInputAsync();
1358+
Assert.False(write.IsCompleted); // in flight, blocked on the label resolve
1359+
1360+
// Dispose must cancel the in-flight write via the lifetime token and drain it before returning —
1361+
// not hang for the 5-minute probe, and not let the write resume on a thread-pool continuation and
1362+
// mutate disposed reactive state. Run Dispose off the xunit synchronization context so the
1363+
// in-flight write's continuations can drain on the test context while Dispose waits.
1364+
var dispose = Task.Run(vm.Dispose, TestContext.Current.CancellationToken);
1365+
var finished = await Task.WhenAny(
1366+
dispose, Task.Delay(TimeSpan.FromSeconds(15), TestContext.Current.CancellationToken));
1367+
Assert.True(ReferenceEquals(finished, dispose), "Dispose did not drain the cancelled in-flight write promptly.");
1368+
await dispose; // surface any teardown exception
1369+
await write; // the cancelled add unwound without surfacing out
1370+
}
1371+
13801372
[Fact]
13811373
public async Task ApplyResetConfirmation_surfaces_save_failure_without_crashing_the_loop()
13821374
{
@@ -1831,6 +1823,35 @@ private static void MoveToManagementAction(ChannelsConfigViewModel vm, ChannelsM
18311823
vm.MoveManagementMenu(index);
18321824
}
18331825

1826+
// Arranges a Slack adapter parked on the reset-confirmation screen with a background label refresh
1827+
// genuinely in flight (a 5-minute probe delay holds it open). Shared by the reset tests that verify
1828+
// the reset cancels-and-awaits that refresh without racing its write or deadlocking. The caller owns
1829+
// disposal of the returned view-model.
1830+
private ChannelsConfigViewModel ArrangeSlackResetWithLabelRefreshInFlight()
1831+
{
1832+
WriteAllChannelConfig();
1833+
WriteAllChannelSecrets();
1834+
var slackProbe = new FakeSlackProbe
1835+
{
1836+
DelayBeforeResult = TimeSpan.FromMinutes(5),
1837+
NextResolutionResult = new SlackChannelResolutionResult(
1838+
true, null, [new ResolvedSlackChannel("general", "C01")], []),
1839+
};
1840+
var vm = CreateViewModel(slackProbe: slackProbe);
1841+
1842+
// Enter Manage Channels to start the background label refresh, then leave it in flight.
1843+
vm.OpenAdapterManagement(ChannelType.Slack);
1844+
MoveToManagementAction(vm, ChannelsManagementAction.ManageChannels);
1845+
vm.ActivateManagementMenuItem();
1846+
Assert.False(vm.PendingLabelRefresh?.IsCompleted ?? true); // background is in flight
1847+
1848+
vm.GoBack();
1849+
MoveToManagementAction(vm, ChannelsManagementAction.ResetConnection);
1850+
vm.ActivateManagementMenuItem();
1851+
vm.MoveResetConfirmation(1);
1852+
return vm;
1853+
}
1854+
18341855
private static int GetAdapterIndex(ChannelsConfigViewModel vm, ChannelType type)
18351856
=> vm.Step.Adapters
18361857
.Select((adapter, index) => (adapter.Type, index))

src/Netclaw.Cli.Tests/Tui/Config/SearchConfigEditorViewModelTests.cs

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -209,6 +209,27 @@ public async Task NavigateBack_during_validation_abandons_the_stale_probe_result
209209
Assert.Equal(SearchConfigEditorScreen.ProviderSelection, vm.CurrentScreen.Value);
210210
}
211211

212+
[Fact]
213+
public async Task Re_entrant_submit_while_a_probe_is_in_flight_is_ignored()
214+
{
215+
var gate = new TaskCompletionSource();
216+
using var vm = new SearchConfigEditorViewModel(_paths, new GatedHttpClientFactory(gate.Task));
217+
vm.SelectBackendForEditing("searxng");
218+
vm.SetFieldValue("Search.SearXngEndpoint", "https://search.test.local");
219+
220+
// First submit starts a probe that blocks in the gated handler.
221+
var first = vm.SubmitCurrentConfigurationFromInputAsync(TestContext.Current.CancellationToken);
222+
Assert.False(first.IsCompleted);
223+
224+
// A second Enter while the first is still validating must NOT launch an overlapping probe + disk
225+
// write (two would race the same config file). The guard returns the same in-flight task.
226+
var second = vm.SubmitCurrentConfigurationFromInputAsync(TestContext.Current.CancellationToken);
227+
Assert.Same(first, second);
228+
229+
gate.SetResult();
230+
await first;
231+
}
232+
212233
[Fact]
213234
public void Save_anyway_persists_config_and_secret_semantically()
214235
{

src/Netclaw.Cli.Tests/Tui/SingleThreadSynchronizationContext.cs

Lines changed: 19 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -65,8 +65,26 @@ private void Pump()
6565
{
6666
SetSynchronizationContext(this);
6767
foreach (var (callback, state) in _queue.GetConsumingEnumerable())
68-
callback(state);
68+
{
69+
try
70+
{
71+
callback(state);
72+
}
73+
catch (Exception ex)
74+
{
75+
// Keep the single worker alive if a posted continuation throws. The Run() entry point
76+
// already funnels its scenario's exceptions to a TaskCompletionSource (so the test sees
77+
// the real failure); letting one stray continuation kill the worker here would drain no
78+
// further callbacks and make every awaiting test look like a generic deadlock instead.
79+
_lastError ??= ex;
80+
}
81+
}
6982
}
7083

84+
/// <summary>The first exception thrown by a posted continuation, if any — for test diagnostics.</summary>
85+
public Exception? LastError => _lastError;
86+
87+
private volatile Exception? _lastError;
88+
7189
public void Dispose() => _queue.CompleteAdding();
7290
}

src/Netclaw.Cli/Tui/Config/ChannelsConfigViewModel.cs

Lines changed: 49 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,13 @@ public sealed class ChannelsConfigViewModel : ReactiveViewModel
4343
private CancellationTokenSource? _labelResolutionCts;
4444
private Task? _labelRefreshTask;
4545

46+
// Cancels every input-triggered config write (and its channel-access probe) when the editor is
47+
// torn down. Fire-and-forget writes resume on thread-pool continuations (the loop has no
48+
// SynchronizationContext), so without a lifetime token a write started just before disposal would
49+
// run its probe to completion and then mutate already-disposed reactive state. Threaded into the
50+
// FromInput entry points; cancelled and drained in Dispose.
51+
private readonly CancellationTokenSource _lifetimeCts = new();
52+
4653
public ChannelsConfigViewModel(
4754
NetclawPaths paths,
4855
ISlackProbe slackProbe,
@@ -248,14 +255,20 @@ private static ConfigStatusMessage BuildSaveStatus(string successMessage, IReadO
248255
$"Saved. Could not resolve: {string.Join(", ", unresolved.Select(static name => $"#{name}"))} — flagged below; fix or remove them.",
249256
ConfigStatusTone.Warning);
250257

251-
internal async Task<bool> SaveFromInputAsync(CancellationToken ct = default)
252-
=> await ConfigAutosave.RunAsync(
253-
token => SaveAsync("Channels saved.", probeChannelAccess: true, token),
258+
// Single autosave wrapper for both the explicit save (probe on) and the completed-action autosaves
259+
// (probe off, via SaveCompletedAsync). ConfigAutosave.RunAsync catches any failure and surfaces it
260+
// as an Error status; this returns whether the save succeeded.
261+
private Task<bool> SaveViaAutosaveAsync(string successMessage, bool probeChannelAccess, CancellationToken ct)
262+
=> ConfigAutosave.RunAsync(
263+
token => SaveAsync(successMessage, probeChannelAccess, token),
254264
Status,
255265
"Channel settings save failed",
256266
RequestRedraw,
257267
ct);
258268

269+
internal Task<bool> SaveFromInputAsync(CancellationToken ct = default)
270+
=> SaveViaAutosaveAsync("Channels saved.", probeChannelAccess: true, ct);
271+
259272
// Input-triggered config writes (autosave, add-channel, reset) run async OFF the Termina loop so
260273
// they never freeze input/rendering on a probe or disk write. The sync .GetAwaiter().GetResult()
261274
// bridges they replaced were implicitly serialized by the single-threaded loop — exactly one ran
@@ -290,10 +303,10 @@ async Task ChainAsync()
290303
// while the add resolves channels against the platform API; the write itself is serialized behind
291304
// any prior config write by EnqueueConfigWriteAsync.
292305
internal Task AddChannelFromInputAsync()
293-
=> EnqueueConfigWriteAsync(() => ApplyAddChannelAsync());
306+
=> EnqueueConfigWriteAsync(() => ApplyAddChannelAsync(_lifetimeCts.Token));
294307

295308
internal Task ResetConfirmationFromInputAsync()
296-
=> EnqueueConfigWriteAsync(() => ApplyResetConfirmationAsync());
309+
=> EnqueueConfigWriteAsync(() => ApplyResetConfirmationAsync(_lifetimeCts.Token));
297310

298311
internal bool TryOpenSelectedAdapterManagement()
299312
{
@@ -1349,6 +1362,16 @@ internal async Task ApplyResetConfirmationAsync(CancellationToken ct = default)
13491362
// SaveAsync, so it carries its own equivalent guard.
13501363
Status.Value = new ConfigStatusMessage($"Could not save reset: {ex.Message}", ConfigStatusTone.Error);
13511364
}
1365+
catch (Exception ex)
1366+
{
1367+
// Fail LOUD on any other error (e.g. a type-malformed but JSON-valid config surfacing as
1368+
// InvalidOperationException from the reload's deserialization). This runs as a fire-and-forget
1369+
// chained write whose ChainAsync swallows a faulted prior task, so an unsurfaced throw here
1370+
// would vanish with no operator feedback — exactly the silent fallback to avoid. Surface it
1371+
// and stay on the confirmation screen for retry; catching here also keeps the chained task
1372+
// from faulting, so subsequent writes are unaffected.
1373+
Status.Value = new ConfigStatusMessage($"Could not reset {resetName}: {ex.Message}", ConfigStatusTone.Error);
1374+
}
13521375

13531376
NotifyContentChanged();
13541377
}
@@ -1361,8 +1384,27 @@ public void RequestQuit()
13611384

13621385
public override void Dispose()
13631386
{
1387+
// Cancel any in-flight config write / label refresh, then DRAIN them before disposing the
1388+
// reactive state they publish to. A fire-and-forget write resumes on a thread-pool continuation
1389+
// (the loop has no SynchronizationContext), so without this a write could mutate a disposed
1390+
// ReactiveProperty / Step after teardown. Cancellation makes the in-flight probe abort promptly;
1391+
// the bounded Wait is a last-resort backstop so Dispose can never block the loop indefinitely on
1392+
// a wedged probe (it returns false on timeout rather than throwing or hanging).
1393+
_lifetimeCts.Cancel();
13641394
_labelResolutionCts?.Cancel();
1395+
try
1396+
{
1397+
Task.WhenAll(_pendingConfigWrite, _labelRefreshTask ?? Task.CompletedTask)
1398+
.Wait(TimeSpan.FromSeconds(5));
1399+
}
1400+
catch
1401+
{
1402+
// A faulted/cancelled in-flight write has already surfaced its own status (autosave/reset
1403+
// both catch and report); swallow here so teardown completes regardless.
1404+
}
1405+
13651406
_labelResolutionCts?.Dispose();
1407+
_lifetimeCts.Dispose();
13661408
IsSaved.Dispose();
13671409
Screen.Dispose();
13681410
Status.Dispose();
@@ -1407,17 +1449,12 @@ private void SetActiveAdapterEnabled(bool enabled)
14071449
// is serialized behind any in-flight write and never blocks the loop. Autosave skips the blocking
14081450
// channel-access probe (the background label refresh re-validates instead).
14091451
private void AutosaveCompletedAction(string successMessage)
1410-
=> _ = EnqueueConfigWriteAsync(() => SaveCompletedAsync(successMessage));
1452+
=> _ = EnqueueConfigWriteAsync(() => SaveCompletedAsync(successMessage, _lifetimeCts.Token));
14111453

14121454
// Awaitable autosave for callers already running inside an enqueued write (ApplyAddChannelAsync),
14131455
// where re-enqueueing would deadlock the op behind itself. Returns whether the save succeeded.
14141456
private Task<bool> SaveCompletedAsync(string successMessage, CancellationToken ct = default)
1415-
=> ConfigAutosave.RunAsync(
1416-
token => SaveAsync(successMessage, probeChannelAccess: false, token),
1417-
Status,
1418-
"Channel settings save failed",
1419-
RequestRedraw,
1420-
ct);
1457+
=> SaveViaAutosaveAsync(successMessage, probeChannelAccess: false, ct);
14211458

14221459
private int GetAdapterIndex(ChannelType type)
14231460
=> Step.Adapters

src/Netclaw.Cli/Tui/Config/SearchConfigEditorViewModel.cs

Lines changed: 28 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -309,17 +309,36 @@ public async Task TestCurrentConfigurationAsync(CancellationToken ct = default)
309309
public async Task SaveAsync(CancellationToken ct = default)
310310
=> await SubmitCurrentConfigurationAsync(ct);
311311

312-
internal async Task SubmitCurrentConfigurationFromInputAsync(CancellationToken ct = default)
312+
// Guards against a second Enter (or Enter while a probe is still running) launching an overlapping
313+
// submit. The dispatch is fire-and-forget from the synchronous key handler, so without this two
314+
// rapid submits would race the same network probe and disk write (the same hazard Channels solved
315+
// with its config-write chain). The in-flight task is read/written only on the loop thread (the
316+
// synchronous prefix before the first await), so it needs no synchronization. Exposed as
317+
// PendingSubmit so tests can await completion deterministically.
318+
private Task? _pendingSubmit;
319+
320+
internal Task? PendingSubmit => _pendingSubmit;
321+
322+
internal Task SubmitCurrentConfigurationFromInputAsync(CancellationToken ct = default)
313323
{
314-
try
315-
{
316-
await SubmitCurrentConfigurationAsync(ct);
317-
}
318-
catch (Exception ex)
324+
if (_pendingSubmit is { IsCompleted: false })
325+
return _pendingSubmit;
326+
327+
_pendingSubmit = RunAsync();
328+
return _pendingSubmit;
329+
330+
async Task RunAsync()
319331
{
320-
CurrentScreen.Value = SearchConfigEditorScreen.Entry;
321-
Status.Value = new ConfigStatusMessage($"Search settings save failed: {ex.Message}", ConfigStatusTone.Error);
322-
RequestRedraw();
332+
try
333+
{
334+
await SubmitCurrentConfigurationAsync(ct);
335+
}
336+
catch (Exception ex)
337+
{
338+
CurrentScreen.Value = SearchConfigEditorScreen.Entry;
339+
Status.Value = new ConfigStatusMessage($"Search settings save failed: {ex.Message}", ConfigStatusTone.Error);
340+
RequestRedraw();
341+
}
323342
}
324343
}
325344

0 commit comments

Comments
 (0)