fix(datastore): handle commit failures gracefully instead of panicking (#572)

TimeToBuildBob · web-flow · commit 9a8802a374d8 · 2026-03-05T13:49:56.000+01:00
* fix(datastore): handle commit failures gracefully instead of panicking

  When a transaction commit fails (e.g. disk full / SQLITE_FULL), the worker thread panicked, permanently breaking the datastore channel. All subsequent requests returned MpscError (HTTP 500) until restart.

  Replace the panic with error logging and continue. The rolled-back events will be re-sent by watchers via heartbeat or retried by clients.

  Add CommitFailed error variant mapped to HTTP 503 (Service Unavailable) so clients know to back off and retry.

  Fixes #256

* fix(datastore): apply graceful error handling to legacy import commit

  The main work loop commit (line 193) was already handled gracefully (error log + continue), but the legacy import commit (line 143) still panicked on failure. This makes the error handling consistent.

  Addresses review feedback from Greptile.

* docs(datastore): correct misleading comment about event recovery on commit failure

* refactor(datastore): remove unused CommitFailed error variant
diff --git a/aw-datastore/src/worker.rs b/aw-datastore/src/worker.rs
@@ -142,7 +142,11 @@ impl DatastoreWorker {
             }
             match transaction.commit() {
                 Ok(_) => (),
-                Err(err) => panic!("Failed to commit datastore transaction! {err}"),
+                Err(err) => {
+                    error!("Failed to commit legacy import transaction: {err}");
+                    // Continue without panicking — legacy import will be retried on
+                    // next startup if the commit didn't persist.
+                }
             }
         }
 
@@ -192,7 +196,18 @@ impl DatastoreWorker {
             );
             match tx.commit() {
                 Ok(_) => (),
-                Err(err) => panic!("Failed to commit datastore transaction! {err}"),
+                Err(err) => {
+                    error!(
+                        "Failed to commit datastore transaction ({} events lost): {err}",
+                        self.uncommitted_events
+                    );
+                    // Continue instead of panicking — the worker thread survives this
+                    // transient failure (e.g. SQLITE_FULL on disk full). Note: clients
+                    // already received success responses before the commit, so they won't
+                    // know to retry. Rolled-back events create a gap in the timeline;
+                    // watchers will resume sending heartbeats from current state, but the
+                    // specific batch of events is permanently lost.
+                }
             }
             if self.quit {
                 break;
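
For readers who want to see the log-and-continue pattern from the second hunk in isolation, here is a minimal sketch. It is not the code in `worker.rs`: `CommitError`, `Worker`, `finish_batch`, `run_batches`, the `lost_events` counter, and the in-memory `log` vector are all hypothetical stand-ins, and only the field name `uncommitted_events` and the log message format are taken from the diff. The sketch also assumes the counter is reset after every commit attempt, which the diff does not show.

```rust
use std::fmt;

/// Hypothetical stand-in for a database commit error (e.g. SQLITE_FULL).
#[derive(Debug)]
struct CommitError(String);

impl fmt::Display for CommitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical worker state. Only `uncommitted_events` mirrors a name
/// that appears in the diff; the other fields exist for illustration.
struct Worker {
    uncommitted_events: usize,
    lost_events: usize,
    log: Vec<String>,
}

impl Worker {
    fn new() -> Self {
        Worker { uncommitted_events: 0, lost_events: 0, log: Vec::new() }
    }

    /// Handle the outcome of one commit attempt without panicking.
    fn finish_batch(&mut self, result: Result<(), CommitError>) {
        if let Err(err) = result {
            // Stand-in for the `error!` logging macro used in the diff.
            self.log.push(format!(
                "Failed to commit datastore transaction ({} events lost): {err}",
                self.uncommitted_events
            ));
            // The rolled-back batch is gone; record how much was dropped.
            self.lost_events += self.uncommitted_events;
        }
        // Assumption: the counter starts fresh after every attempt.
        self.uncommitted_events = 0;
    }
}

/// Feed a sequence of (batch size, commit outcome) pairs through the worker.
/// The loop keeps going after a failed commit instead of unwinding.
fn run_batches(batches: Vec<(usize, Result<(), CommitError>)>) -> Worker {
    let mut worker = Worker::new();
    for (events, outcome) in batches {
        worker.uncommitted_events = events;
        worker.finish_batch(outcome);
    }
    worker
}
```

Calling `run_batches` with a failing batch in the middle of the sequence shows the point of the change: later batches are still processed, and the failure is visible only in the log and in the lost-event count.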