runtime: add adaptive Slack bridge restart policy #148
Merged
benvinegar merged 2 commits into main on Feb 23, 2026
Greptile Summary
Replaced the fixed-delay Slack bridge restart loop with a shared supervisor library supporting adaptive backoff, jitter, and failure-threshold signaling while preserving backward compatibility. Key improvements:
Backward compatibility:
Integration points:
All changes are covered by 6 new test cases for mode detection, integer parsing, backoff computation, and jitter bounds.
Confidence Score: 5/5
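The mode-detection and integer-parsing behavior these tests cover can be sketched in bash. This is a minimal sketch, not the code in bin/lib/bridge-restart-policy.sh: the function names `resolve_restart_mode` and `parse_nonneg_int` are hypothetical, the adaptive knob variable names are passed in as arguments because the PR does not list them, and it assumes an explicit `BAUDBOT_BRIDGE_RESTART_POLICY` of `adaptive` or `legacy` takes precedence over any knob while other values are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of policy-mode detection and integer parsing.
# Not the actual bin/lib/bridge-restart-policy.sh implementation.

# resolve_restart_mode KNOB_VAR...
# Prints "adaptive" or "legacy".
# Assumptions: an explicit BAUDBOT_BRIDGE_RESTART_POLICY of "adaptive" or
# "legacy" wins; any other value is ignored; otherwise any non-empty
# adaptive knob (variable names passed as arguments) selects adaptive.
resolve_restart_mode() {
  local policy="${BAUDBOT_BRIDGE_RESTART_POLICY:-}"
  case "$policy" in
    adaptive|legacy)
      printf '%s\n' "$policy"
      return 0
      ;;
  esac
  local knob
  for knob in "$@"; do
    # Indirect expansion: read the variable whose name is stored in $knob.
    if [ -n "${!knob:-}" ]; then
      printf 'adaptive\n'
      return 0
    fi
  done
  printf 'legacy\n'
}

# parse_nonneg_int VALUE DEFAULT
# Prints VALUE as a base-10 integer if it is all digits, else DEFAULT.
parse_nonneg_int() {
  case "$1" in
    ''|*[!0-9]*) printf '%s\n' "$2" ;;
    *) printf '%s\n' "$((10#$1))" ;;  # 10# strips leading zeros safely
  esac
}
```

Falling back to a default on malformed input, rather than failing, keeps a typo in an env var from stopping the bridge from restarting at all.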
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[start.sh / startup-cleanup.sh] --> B{Policy Mode Detection}
B -->|No env vars set| C[Legacy Mode]
B -->|BAUDBOT_BRIDGE_RESTART_POLICY set| D{Policy Value}
B -->|Any adaptive knob set| E[Adaptive Mode]
D -->|adaptive| E
D -->|legacy| C
C --> C1[Fixed 5s delay restart loop]
C1 --> C2[Write status file: mode=legacy]
C2 --> C3[Log structured events]
E --> E1[Load adaptive parameters]
E1 --> E2[base_delay, max_delay, stable_window, max_failures, jitter]
E2 --> E3[Restart loop with runtime tracking]
E3 --> E4{Runtime >= stable_window?}
E4 -->|Yes| E5[Reset counters to base_delay]
E4 -->|No| E6[Increment failures, double delay]
E5 --> E7{failures >= threshold?}
E6 --> E7
E7 -->|Yes| E8[Set state=threshold_exceeded]
E7 -->|No| E9[Set state=restarting]
E8 --> E10[Add jitter, write status, sleep]
E9 --> E10
E10 --> E3
E10 -.-> S1[Status File JSON]
C2 -.-> S1
S1 --> S2[baudbot status reads supervisor state]
S2 --> S3[Display: healthy/degraded/restarting]
Last reviewed commit: d12b6f4
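The adaptive branch of the flowchart (E4 to E10) reduces to a few pure functions. This is a hedged bash sketch, not the merged helper: the function names are hypothetical, delays are assumed to be whole seconds, and jitter is assumed to be a uniform extra 0..JITTER seconds drawn from bash's `$RANDOM`. It follows the flowchart literally, so a quick crash doubles the delay before the next sleep.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the adaptive restart computation in the flowchart.

# compute_backoff DELAY RUNTIME FAILURES BASE MAX STABLE_WINDOW
# Prints "NEXT_DELAY NEXT_FAILURES".
compute_backoff() {
  local delay=$1 runtime=$2 failures=$3 base=$4 max=$5 stable=$6
  if (( runtime >= stable )); then
    # Bridge stayed up for the stable window: reset to base_delay.
    failures=0
    delay=$base
  else
    # Quick crash: count it and double the delay, capped at max_delay.
    failures=$(( failures + 1 ))
    delay=$(( delay * 2 ))
    if (( delay > max )); then
      delay=$max
    fi
  fi
  printf '%s %s\n' "$delay" "$failures"
}

# apply_jitter DELAY JITTER -> DELAY plus a random 0..JITTER seconds.
apply_jitter() {
  local delay=$1 jitter=$2
  if (( jitter <= 0 )); then
    printf '%s\n' "$delay"
    return 0
  fi
  printf '%s\n' "$(( delay + RANDOM % (jitter + 1) ))"
}

# restart_state FAILURES MAX_FAILURES -> state string for the status file.
restart_state() {
  if (( $1 >= $2 )); then
    printf 'threshold_exceeded\n'
  else
    printf 'restarting\n'
  fi
}
```

A supervising loop would note `SECONDS` before launching the bridge, pass the elapsed time as RUNTIME, then sleep for the jittered delay. Keeping the arithmetic out of the loop is what makes backoff and jitter bounds unit-testable without sleeping.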
baudbot-agent pushed a commit that referenced this pull request on Feb 23, 2026
When startup-cleanup.sh runs mid-session (called by the control agent), two kinds of inherited env vars cause bridge startup failures:
1. PKG_EXECPATH, leaked from the parent varlock-launched process, causes varlock's SEA binary to misinterpret subcommands as Node module paths. The varlock broker-key probes (lines 115-122) silently fail, resulting in 'No Slack transport configured' and the bridge never starting.
2. SLACK_BROKER_ACCESS_TOKEN / SLACK_BROKER_ACCESS_TOKEN_EXPIRES_AT: varlock does not override env vars already present in the parent process. If the broker token was rotated after session start, the supervisor passes the stale (expired) values instead of reading fresh ones from ~/.config/.env.
Fix: unset PKG_EXECPATH at the script top (before the varlock probes), and unset the broker token vars in the supervisor subshell (before varlock run).
Regression from #148.
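The stale-token failure in point 2 and the subshell fix can be reproduced without varlock. In this hedged sketch, the hypothetical `load_env_no_override` only stands in for the behavior described in the commit message (values already in the environment are not overridden); it is not varlock's code. The token variable names come from the commit; the env file is a parameter rather than ~/.config/.env.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction of the stale-token bug and its fix.

# Fix 1 from the commit: drop the leaked variable before any varlock probes.
unset PKG_EXECPATH

# load_env_no_override FILE
# Stand-in for a loader that, as described for varlock, does not override
# variables that are already present in the environment.
load_env_no_override() {
  local line key value
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # skip blanks and comments
      *=*) ;;                # only KEY=value lines are considered
      *) continue ;;
    esac
    key=${line%%=*}
    value=${line#*=}
    [[ $key =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    if [ -z "${!key+x}" ]; then
      export "$key=$value"   # set only when not already defined
    fi
  done < "$1"
}

# Without the fix: the inherited (possibly expired) token wins.
token_without_fix() {
  ( load_env_no_override "$1"; printf '%s\n' "${SLACK_BROKER_ACCESS_TOKEN-}" )
}

# Fix 2 from the commit: unset the broker token vars inside the supervisor
# subshell so the loader reads fresh values; the parent shell is untouched.
token_with_fix() {
  (
    unset SLACK_BROKER_ACCESS_TOKEN SLACK_BROKER_ACCESS_TOKEN_EXPIRES_AT
    load_env_no_override "$1"
    printf '%s\n' "${SLACK_BROKER_ACCESS_TOKEN-}"
  )
}
```

Scoping the `unset` to the subshell matters: the rest of the session keeps its environment, and only the restarted bridge re-resolves the token.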
baudbot-agent pushed a commit that referenced this pull request on Feb 24, 2026
…-cleanup
When startup-cleanup.sh runs mid-session (called by the control agent), inherited env vars cause bridge startup failures:
1. PKG_EXECPATH, leaked from the parent varlock-launched process, causes varlock's SEA binary to misinterpret subcommands as Node module paths. The varlock broker-key probes (lines 115-122) silently fail, resulting in 'No Slack transport configured' and the bridge never starting.
2. varlock run does not override env vars already present in the parent process. If any managed value (broker tokens, API keys, config) was rotated after session start, the supervisor passes the stale values instead of reading fresh ones from ~/.config/.env.
Fix:
- unset PKG_EXECPATH at the script top (before varlock probes)
- In the supervisor subshell, dynamically unset ALL varlock-managed keys via 'varlock load --format env' before calling 'varlock run', so every restart gets fresh values regardless of which keys changed.
Regression from #148.
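The generalized fix, unsetting every key varlock manages before `varlock run`, can be sketched as a helper that reads a `KEY=value` listing on stdin. This is a hedged sketch: the name `unset_listed_keys` is hypothetical, and it assumes `varlock load --format env` prints one dotenv-style `KEY=value` (or `export KEY=value`) line per key; multi-line values are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unset every variable named in a KEY=value listing.

# unset_listed_keys < LISTING
# Must run in the current shell (redirect or process substitution, not a
# pipe); behind a pipe the unsets would happen in a throwaway subshell.
unset_listed_keys() {
  local _bb_line _bb_key
  local _bb_re='^(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)='
  while IFS= read -r _bb_line || [ -n "$_bb_line" ]; do
    if [[ $_bb_line =~ $_bb_re ]]; then
      _bb_key=${BASH_REMATCH[2]}
      # -v: only ever unset variables, never a function of the same name.
      # Errors (e.g. readonly variables) are ignored.
      unset -v "$_bb_key" 2>/dev/null || true
    fi
  done
}
```

Inside the supervisor subshell this would be fed with `unset_listed_keys < <(varlock load --format env)` just before `varlock run`. Only the key names are used, so it does not matter whether the listed values are themselves stale.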
Implements #121 by replacing the fixed-delay Slack bridge restart loop with a shared supervisor that supports adaptive backoff, jitter, and failure-threshold signaling while preserving legacy behavior by default.
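Failure-threshold signaling reaches `baudbot status` through the supervisor status file (~/.pi/agent/slack-bridge-supervisor.json). Below is a hedged sketch of writing and reading such a file; the JSON field names, the hypothetical function names, the `unknown` result for a missing file, and the mapping from supervisor state to healthy/degraded/restarting are assumptions modeled on the flowchart, not the merged schema. The `sed` extraction only works for this single-line format; a real reader would more likely use a JSON parser such as jq.

```shell
#!/usr/bin/env bash
# Hypothetical status-file writer/reader; field names are assumptions.

# write_supervisor_status PATH MODE STATE FAILURES DELAY_SECONDS
write_supervisor_status() {
  local path=$1 mode=$2 state=$3 failures=$4 delay=$5 tmp
  mkdir -p "$(dirname "$path")"
  # Write to a temp file in the same directory, then rename it into place
  # so a concurrent reader never sees a half-written file.
  tmp=$(mktemp "${path}.XXXXXX") || return 1
  printf '{"mode":"%s","state":"%s","failures":%d,"delay_s":%d,"updated_at":%d}\n' \
    "$mode" "$state" "$failures" "$delay" "$(date +%s)" > "$tmp"
  mv -f "$tmp" "$path"
}

# read_supervisor_health PATH -> healthy | degraded | restarting | unknown
read_supervisor_health() {
  local path=$1 state
  if [ ! -r "$path" ]; then
    printf 'unknown\n'
    return 0
  fi
  state=$(sed -n 's/.*"state":"\([a-z_]*\)".*/\1/p' "$path")
  case "$state" in
    threshold_exceeded) printf 'degraded\n' ;;
    restarting) printf 'restarting\n' ;;
    *) printf 'healthy\n' ;;
  esac
}
```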
What changed
- Added bin/lib/bridge-restart-policy.sh with reusable supervisor helpers:
  - mode detection (legacy vs adaptive)
  - structured bridge-supervisor log lines
  - status file output (~/.pi/agent/slack-bridge-supervisor.json)
- Updated start.sh to use bb_bridge_supervise instead of hardcoded fixed-delay restart loops.
- Updated .pi/skills/control-agent/startup-cleanup.sh to use the same supervisor helper (with legacy fallback if the helper is unavailable).
- Updated bin/deploy.sh to stage/deploy bin/lib/bridge-restart-policy.sh into the runtime.
- Updated bin/lib/baudbot-runtime.sh (baudbot status) to surface supervisor status, including the degraded/threshold-exceeded state.
- Added bin/lib/bridge-restart-policy.test.sh and wired it into test/shell-scripts.test.mjs.
- Updated CONFIGURATION.md and .env.schema.
Backward compatibility
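Legacy fixed-delay behavior remains the default. Adaptive mode is enabled when:
- BAUDBOT_BRIDGE_RESTART_POLICY=adaptive, or
- any adaptive knob is set.
Validation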
- bash -n on updated shell scripts ✅
- npm run test:shell ✅ (12 tests)
- npm run lint:shell (shellcheck missing in PATH)