
Commit 9a8802a

fix(datastore): handle commit failures gracefully instead of panicking (#572)
* fix(datastore): handle commit failures gracefully instead of panicking

  When a transaction commit fails (e.g. disk full / SQLITE_FULL), the worker thread panicked, permanently breaking the datastore channel. All subsequent requests returned MpscError (HTTP 500) until restart.

  Replace the panic with error logging and continue. The rolled-back events will be re-sent by watchers via heartbeat or retried by clients.

  Add CommitFailed error variant mapped to HTTP 503 (Service Unavailable) so clients know to back off and retry.

  Fixes #256

* fix(datastore): apply graceful error handling to legacy import commit

  The main work loop commit (line 193) was already handled gracefully (error log + continue), but the legacy import commit (line 143) still panicked on failure. This makes the error handling consistent.

  Addresses review feedback from Greptile.

* docs(datastore): correct misleading comment about event recovery on commit failure

* refactor(datastore): remove unused CommitFailed error variant
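The failure mode in the first bullet (one panic, then every later request fails) can be reproduced with `std::sync::mpsc` alone. The sketch below is a minimal, hypothetical model, not aw-datastore's code: `commit`, `run`, the request/reply tuple, and the simulated "disk full" error are all invented for illustration. When the worker panics, unwinding drops its `Receiver` together with the pending reply `Sender`, so every later request fails with a send or receive error (the analogue of `MpscError`). When the worker logs and continues instead, every request is answered.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for `tx.commit()`: batch `fail_at` fails the way a
// full disk (SQLITE_FULL) would. No real database is involved.
fn commit(batch: u32, fail_at: u32) -> Result<(), String> {
    if batch == fail_at {
        Err("database or disk is full".to_string())
    } else {
        Ok(())
    }
}

/// Sends `batches` requests to a worker thread, waiting for a reply to each.
/// Returns (requests answered, whether the worker thread panicked).
fn run(panic_on_error: bool, batches: u32, fail_at: u32) -> (u32, bool) {
    let (tx, rx) = mpsc::channel::<(u32, mpsc::Sender<bool>)>();
    let worker = thread::spawn(move || {
        for (batch, reply) in rx {
            let committed = match commit(batch, fail_at) {
                Ok(()) => true,
                // Old behaviour: unwinding drops `rx`, so the channel dies for good.
                Err(err) if panic_on_error => {
                    panic!("Failed to commit datastore transaction! {err}")
                }
                // New behaviour: log the error and keep serving requests.
                Err(err) => {
                    eprintln!("Failed to commit datastore transaction: {err}");
                    false
                }
            };
            // Ignore send errors: the requester may have stopped waiting.
            let _ = reply.send(committed);
        }
    });

    let mut answered = 0;
    for batch in 0..batches {
        let (reply_tx, reply_rx) = mpsc::channel();
        // Either a failed send or a dropped reply sender means the worker is
        // gone: the situation that surfaced as MpscError before the fix.
        if tx.send((batch, reply_tx)).is_ok() && reply_rx.recv().is_ok() {
            answered += 1;
        }
    }
    drop(tx); // close the channel so a surviving worker's loop ends
    let panicked = worker.join().is_err();
    (answered, panicked)
}
```

With five requests and a failure on batch 2, `run(true, 5, 2)` answers only the first two requests and reports a panicked worker, while `run(false, 5, 2)` answers all five (the failed batch simply gets a `false` reply).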
1 parent 5747056 commit 9a8802a

1 file changed

Lines changed: 17 additions & 2 deletions

aw-datastore/src/worker.rs

```diff
@@ -142,7 +142,11 @@ impl DatastoreWorker {
         }
         match transaction.commit() {
             Ok(_) => (),
-            Err(err) => panic!("Failed to commit datastore transaction! {err}"),
+            Err(err) => {
+                error!("Failed to commit legacy import transaction: {err}");
+                // Continue without panicking — legacy import will be retried on
+                // next startup if the commit didn't persist.
+            }
         }
     }
 
```
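The legacy-import comment relies on the import being atomic: if the commit fails, nothing from it persists, so whatever check gates the import still sees it as not done on the next startup. The sketch below models that under an explicit assumption: the hypothetical `Db` struct holds both the imported rows and an `import_done` marker, and a single simulated transaction writes both. How aw-datastore actually decides whether to run the import may differ; `Db` and `startup` are invented for illustration.

```rust
/// Hypothetical database state: imported rows plus an "import done" marker,
/// written by the same transaction so they persist together or not at all.
#[derive(Clone, Debug, Default, PartialEq)]
struct Db {
    rows: Vec<String>,
    import_done: bool,
}

/// Simulates one startup. Runs the legacy import only if no earlier import
/// persisted; `commit_fails` simulates an error such as SQLITE_FULL.
/// Returns whether an import was attempted.
fn startup(db: &mut Db, legacy: &[&str], commit_fails: bool) -> bool {
    if db.import_done {
        return false; // a previous import committed; nothing to do
    }
    // Stage everything in a copy, standing in for an open transaction.
    let mut tx = db.clone();
    tx.rows.extend(legacy.iter().map(|s| s.to_string()));
    tx.import_done = true;
    if commit_fails {
        // Log and continue instead of panicking. Dropping `tx` is the rollback:
        // `db` is untouched, so the next startup attempts the import again.
        eprintln!("Failed to commit legacy import transaction");
    } else {
        *db = tx; // commit: rows and marker become visible together
    }
    true
}
```

A failed first startup leaves `db` equal to `Db::default()`; the next startup performs the import, and any startup after that is a no-op.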
```diff
@@ -192,7 +196,18 @@ impl DatastoreWorker {
             );
             match tx.commit() {
                 Ok(_) => (),
-                Err(err) => panic!("Failed to commit datastore transaction! {err}"),
+                Err(err) => {
+                    error!(
+                        "Failed to commit datastore transaction ({} events lost): {err}",
+                        self.uncommitted_events
+                    );
+                    // Continue instead of panicking — the worker thread survives this
+                    // transient failure (e.g. SQLITE_FULL on disk full). Note: clients
+                    // already received success responses before the commit, so they won't
+                    // know to retry. Rolled-back events create a gap in the timeline;
+                    // watchers will resume sending heartbeats from current state, but the
+                    // specific batch of events is permanently lost.
+                }
             }
             if self.quit {
                 break;
```
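The new comment on the main loop describes a batching trade-off: each insert is acknowledged before its batch is committed, so a failed commit drops the whole batch and leaves a gap that no client knows to fill. The following is a hypothetical model, not the worker's real code: `Worker`, `insert`, `commit`, `pending`, `persisted`, and `lost` are invented, and only `uncommitted_events` mirrors a field that appears in the diff. It shows the worker carrying on after a simulated disk-full commit while the events from that batch never reach storage.

```rust
/// Hypothetical model of the worker's batching loop (not aw-datastore's code).
#[derive(Default)]
struct Worker {
    persisted: Vec<u32>,       // events that reached storage
    pending: Vec<u32>,         // events staged in the open transaction
    uncommitted_events: usize, // mirrors the counter used in the log message
    lost: usize,               // total events dropped by failed commits
}

impl Worker {
    /// Stages one event. The client is told "ok" right away, before any commit.
    fn insert(&mut self, event: u32) -> bool {
        self.pending.push(event);
        self.uncommitted_events += 1;
        true
    }

    /// Ends the batch. `disk_full` simulates a commit error such as SQLITE_FULL.
    fn commit(&mut self, disk_full: bool) {
        if disk_full {
            // Log and roll back instead of panicking; the worker stays alive.
            eprintln!(
                "Failed to commit datastore transaction ({} events lost)",
                self.uncommitted_events
            );
            self.lost += self.uncommitted_events;
            self.pending.clear();
        } else {
            self.persisted.append(&mut self.pending);
        }
        // Either way, the next batch starts empty.
        self.uncommitted_events = 0;
    }
}
```

Inserting events 0 through 7 in three batches (0..3, 3..6, 6..8) and failing only the middle commit leaves `persisted` as `[0, 1, 2, 6, 7]`: every insert returned `true`, yet events 3, 4 and 5 are gone.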
