|
| 1 | +# Source-Plugin Refactor Plan |
| 2 | + |
| 3 | +**Worktree:** `/Users/enilsen/Projects/TrogonStack/trogonai-plugin-system-dev` |
| 4 | +**Branch:** `plugin-system-dev` (currently equal to `main`) |
| 5 | +**Archive of prior attempt:** branch `plugin-system-wip-archive` (broken, do not merge) |
| 6 | +**Authored:** 2026-05-14 |
| 7 | + |
| 8 | +--- |
| 9 | + |
| 10 | +## 1. Why this plan replaces the previous one |
| 11 | + |
| 12 | +The previous `PLUGIN_SYSTEM_PLAN.md` (now on `plugin-system-wip-archive`) over-fits the problem. It frames the work as building a "plugin marketplace" — discovery, dynamic `.so/.dll/.dylib` loading, permission models, sandboxing, signed plugins, hot reload, health-monitoring background tasks, plus a `ConfigSchema` DSL with `ConfigField`/`ValidationRule` types. None of that is needed for 10 first-party source connectors that ship in the same binary. The implementation attempt produced ~2500 lines of new infrastructure to eliminate ~500 lines of real duplication, and the trait/registry/caller signatures don't even line up across layers (concrete `NatsJetStreamClient` vs `Arc<dyn JetStreamPublisher>` vs `impl JetStreamPublisher`). The archive branch contains two non-matching plugin impls per source and runs the legacy `nest("/github", ...)` chain in parallel with the registry — a guaranteed runtime `.nest()` panic. |
| 13 | + |
| 14 | +The original plan also references `crates/trogon-std/src/source_plugin.rs` as if it existed. It doesn't. |
| 15 | + |
| 16 | +This plan starts from the **real, measured** duplication and proposes the minimum abstraction that removes it. |
| 17 | + |
| 18 | +--- |
| 19 | + |
| 20 | +## 2. The actual problem (measured) |
| 21 | + |
| 22 | +Three files on `main`: |
| 23 | + |
| 24 | +| File | Total lines | Duplicated dispatch / boilerplate | |
| 25 | +|---|---|---| |
| 26 | +| `crates/trogon-gateway/src/http.rs` | 145 | ~36 lines (9× 4-line `if let Some(ref cfg)` blocks calling `nest()`). Discord skipped (uses a gateway runner, not a webhook). | |
| 27 | +| `crates/trogon-gateway/src/streams.rs` | 165 | ~40 lines (10× 4-line `if let Some(ref cfg)` blocks calling `provision()`). | |
| 28 | +| `crates/trogon-gateway/src/config.rs` | 2677 | ~340 lines across 10 `resolve_X` functions. Each is ~60-70 lines; about 34 lines per function are pure copy-paste (subject_prefix / stream_name / nats_ack_timeout_secs / stream_max_age_secs validation). The rest is per-source secret types and source-specific fields (e.g. Discord `gateway_intents`, Linear `timestamp_tolerance_secs`, Sentry's 1s ack-timeout cap). | |
| 29 | + |
| 30 | +Total real duplication: ~400 lines. Not a "marketplace" problem — a trait-and-helper problem. |
| 31 | + |
| 32 | +Discord is a partial exception: it has both a webhook path (`http.rs`/`streams.rs`) **and** a long-running gateway runner spawned in `main.rs`. The runner is a separate concern and should not contort the trait. |
| 33 | + |
| 34 | +--- |
| 35 | + |
| 36 | +## 3. Goals & explicit non-goals |
| 37 | + |
| 38 | +### Goals |
| 39 | +1. Replace the 10-arm `if let Some(ref cfg)` chains in `http.rs` and `streams.rs` with one iteration over a registry. |
| 40 | +2. Eliminate the ~340 lines of common-field validation duplication in `config.rs`. |
| 41 | +3. Source crates own their own typed config and webhook router (no behavioral change for any source). |
| 42 | +4. Adding an 11th source = implementing one trait + one line in the registry, not editing three gateway files. |
| 43 | +5. The diff compiles, `cargo test --workspace` passes, no behavioral regression. |
| 44 | + |
| 45 | +### Non-goals (explicit — these are what blew up the previous plan) |
| 46 | +- ❌ Dynamic plugin loading (`.so`/`.dll`/`.dylib` / `libloading`). Static linking is fine. |
| 47 | +- ❌ Plugin discovery / `plugin.toml` manifests / filesystem scanning. |
| 48 | +- ❌ Permission model, sandboxing, signature verification. |
| 49 | +- ❌ Hot reload, plugin lifecycle states (`Starting`/`Stopped`/`Failed`), health-monitoring background task. |
| 50 | +- ❌ `ConfigSchema` DSL (`ConfigField`, `ConfigFieldType`, `ValidationRule`). The existing typed config structs already do this in Rust's type system. |
| 51 | +- ❌ Separate `WebhookHandler` trait split from `SourcePlugin`. One trait. Routes are part of what a webhook source does. |
| 52 | +- ❌ `catch_unwind` "failure isolation". Panics in first-party source crates are bugs to fix, not isolate; and `catch_unwind` around an `async` block only catches panics during future *construction*, not execution — the reasoning is wrong anyway. |
| 53 | +- ❌ Any new workspace crate (e.g. `trogon-plugin`, `trogon-source-api`). The trait is small enough to live in the gateway crate at `crates/trogon-gateway/src/source_plugin.rs`. A separate crate also creates a circular dependency since the trait takes `&ResolvedConfig`, which is defined in the gateway. |
| 54 | +- ❌ Replacing the typed `ResolvedConfig { github: Option<...>, slack: Option<...>, ... }` with a `HashMap<String, ResolvedSourceConfig> + serde_json::Value` bag. The typed-per-source shape is correct; we only need a *registry* over it. |
| 55 | + |
| 56 | +--- |
| 57 | + |
| 58 | +## 4. Proposed design |
| 59 | + |
| 60 | +### 4.1 Where things live |
| 61 | + |
| 62 | +``` |
| 63 | +crates/trogon-gateway/src/ |
| 64 | +├── source_plugin.rs (NEW, ~120 lines: trait, enum dispatch, wrapper impls) |
| 65 | +├── config.rs (MODIFIED: extract resolve_common(), keep per-source resolve_X) |
| 66 | +├── http.rs (MODIFIED: iterate over enum variants instead of 9 if-let arms) |
| 67 | +├── streams.rs (MODIFIED: iterate over enum variants instead of 10 if-let arms) |
| 68 | +└── main.rs (no source-crate registration needed — enum lives in source_plugin.rs) |
| 69 | +``` |
| 70 | + |
| 71 | +**No new workspace crates.** The trait is gateway-internal. Source crates stay untouched — their `Cargo.toml` files and `lib.rs` files do not change. The gateway's `source_plugin.rs` holds one wrapper struct per source (e.g., `struct GithubPlugin;`) that delegates to the source crate's existing `router()` and `provision()` functions. |
| 72 | + |
| 73 | +Why no new crate: putting the trait in a `trogon-source-api` crate (as one earlier draft suggested) creates a circular-dependency problem — the trait methods need to take `&crate::config::ResolvedConfig`, which lives in `trogon-gateway`. The trait would have to either duplicate the config type or take generic associated types, both worse than keeping it inside the gateway. |
| 74 | + |
| 75 | +### 4.2 The trait + enum dispatch (recommended approach) |
| 76 | + |
| 77 | +Use **enum dispatch**, not `Box<dyn SourcePlugin>`. The trait stays generic over the publisher/context types so the existing `MockJetStreamPublisher` / `MockJetStreamContext` mocks keep working unchanged. The enum collapses the runtime choice without needing `dyn`-safety. |
| 78 | + |
| 79 | +```rust |
| 80 | +// crates/trogon-gateway/src/source_plugin.rs |
| 81 | + |
| 82 | +use axum::Router; |
| 83 | +use trogon_nats::jetstream::{ |
| 84 | + ClaimCheckPublisher, JetStreamContext, JetStreamPublisher, ObjectStorePut, |
| 85 | +}; |
| 86 | + |
| 87 | +use crate::config::ResolvedConfig; |
| 88 | + |
| 89 | +/// Stable identifier for a source. Matches the TOML key and the URL prefix. |
| 90 | +pub type SourceId = &'static str; |
| 91 | + |
| 92 | +/// One source connector (one webhook receiver). Discord's *gateway runner* |
| 93 | +/// is NOT a SourcePlugin — see §6. |
| 94 | +pub trait SourcePlugin: Send + Sync { |
| 95 | + fn id(&self) -> SourceId; |
| 96 | + fn is_enabled(&self, resolved: &ResolvedConfig) -> bool; |
| 97 | + |
| 98 | + async fn provision<C: JetStreamContext>( |
| 99 | + &self, |
| 100 | + js: &C, |
| 101 | + resolved: &ResolvedConfig, |
| 102 | + ) -> Result<(), C::Error>; |
| 103 | + |
| 104 | + fn routes<P, S>( |
| 105 | + &self, |
| 106 | + publisher: ClaimCheckPublisher<P, S>, |
| 107 | + resolved: &ResolvedConfig, |
| 108 | + ) -> Router |
| 109 | + where |
| 110 | + P: JetStreamPublisher, |
| 111 | + S: ObjectStorePut; |
| 112 | +} |
| 113 | + |
| 114 | +// Wrappers. Each is a unit struct that delegates to the source crate's |
| 115 | +// existing pub fns. No changes to the source crates themselves. |
| 116 | +pub struct GithubPlugin; |
| 117 | +pub struct SlackPlugin; |
| 118 | +pub struct TelegramPlugin; |
| 119 | +pub struct TwitterPlugin; |
| 120 | +pub struct GitlabPlugin; |
| 121 | +pub struct IncidentioPlugin; |
| 122 | +pub struct LinearPlugin; |
| 123 | +pub struct NotionPlugin; |
| 124 | +pub struct SentryPlugin; |
| 125 | +// (Discord is intentionally absent: its webhook-less primary path is the |
| 126 | +// gateway runner in main.rs. See §6.) |
| 127 | + |
| 128 | +impl SourcePlugin for GithubPlugin { |
| 129 | + fn id(&self) -> SourceId { "github" } |
| 130 | + fn is_enabled(&self, r: &ResolvedConfig) -> bool { r.github.is_some() } |
| 131 | + async fn provision<C: JetStreamContext>( |
| 132 | + &self, js: &C, r: &ResolvedConfig, |
| 133 | + ) -> Result<(), C::Error> { |
| 134 | + if let Some(cfg) = &r.github { trogon_source_github::provision(js, cfg).await } else { Ok(()) } |
| 135 | + } |
| 136 | + fn routes<P, S>(&self, publisher: ClaimCheckPublisher<P, S>, r: &ResolvedConfig) -> Router |
| 137 | + where P: JetStreamPublisher, S: ObjectStorePut, |
| 138 | + { |
| 139 | + r.github.as_ref() |
| 140 | + .map(|cfg| trogon_source_github::router(publisher, cfg)) |
| 141 | + .unwrap_or_else(Router::new) |
| 142 | + } |
| 143 | +} |
| 144 | +// ...similar for the other 8. |
| 145 | + |
| 146 | +/// Iterate the registry. Generic-method dispatch resolves at compile time |
| 147 | +/// per concrete (P, S, C) — no vtables. |
| 148 | +pub fn for_each_plugin<F: FnMut(&dyn ErasedIter)>(_f: F) { |
| 149 | + // see below — actually iterate concretely |
| 150 | +} |
| 151 | +``` |
| 152 | + |
| 153 | +**Iterating without `dyn`:** since the methods are generic, you can't store the plugins in a `Vec<Box<dyn SourcePlugin>>`. Two viable patterns — pick one: |
| 154 | + |
| 155 | +- **Pattern A (tuple + macro): preferred.** A small `for_each_source!` macro that expands to a sequence of statements over `(GithubPlugin, SlackPlugin, …)`. Each call site (provision loop, mount loop) writes the macro invocation. Adding a source = adding one entry to the macro list. ~15 lines for the macro. **This is the recommended pattern.** |
| 156 | +- **Pattern B (explicit fns):** `pub fn provision_all<C>(js: &C, r: &ResolvedConfig) -> ...` and `pub fn mount_all<P, S>(router: Router, publisher: ..., r: &ResolvedConfig) -> Router` each containing the 9 calls in sequence (no `if let` — `is_enabled` gates inside each wrapper). Slightly more boilerplate, slightly clearer in stack traces. |
| 157 | + |
| 158 | +Either way, `http.rs` shrinks from 9 if-let arms to one call (`source_plugin::mount_all(router, publisher, &config)`); `streams.rs` similarly shrinks to one call. |
| 159 | + |
| 160 | +**Why not `dyn`:** It would require concrete-typed methods (e.g. `&NatsJetStreamClient`, `ClaimCheckPublisher<Arc<dyn JetStreamPublisher>, Arc<dyn ObjectStorePut>>`), which forces blanket impls of the publisher/context traits for `Arc<dyn T>` that this codebase doesn't currently have. Adding those blanket impls is doable but unrelated work; enum/macro dispatch sidesteps the question for now. If the project ever wants runtime-loaded plugins, revisit then. |
| 161 | + |
| 162 | +### 4.3 Common-field validation extraction |
| 163 | + |
| 164 | +Add to `config.rs`, around line ~470 (above the per-source resolve fns): |
| 165 | + |
| 166 | +```rust |
| 167 | +struct ResolvedCommon { |
| 168 | + subject_prefix: NatsToken, |
| 169 | + stream_name: NatsToken, |
| 170 | + stream_max_age: StreamMaxAge, |
| 171 | + nats_ack_timeout: NonZeroDuration, |
| 172 | +} |
| 173 | + |
| 174 | +fn resolve_common( |
| 175 | + source: &'static str, |
| 176 | + subject_prefix: String, |
| 177 | + stream_name: String, |
| 178 | + nats_ack_timeout_secs: u64, |
| 179 | + stream_max_age_secs: u64, |
| 180 | + errors: &mut Vec<ConfigValidationError>, |
| 181 | +) -> Option<ResolvedCommon> { |
| 182 | + // Move the 4 repeated match blocks here. Return None on any failure |
| 183 | + // (errors pushed). Returns Some only when all four validate. |
| 184 | +} |
| 185 | +``` |
| 186 | + |
| 187 | +Each `resolve_X` shrinks from ~65 lines to ~20-25 lines (status check + source-specific secret/field + `let common = resolve_common(...)?;` + final struct). |
| 188 | + |
| 189 | +Net deletion in `config.rs`: ~250 lines. (Less than 340 because each call site keeps a few lines, and we add ~30 lines of helper.) |
| 190 | + |
| 191 | +### 4.4 Source crates stay untouched |
| 192 | + |
| 193 | +All `impl SourcePlugin` blocks live in `gateway/src/source_plugin.rs` and delegate to the existing `pub fn router(...)` and `pub fn provision(...)` of each source crate. No source crate's `Cargo.toml` or `lib.rs` needs to change — this is purely a gateway-internal refactor. If `gateway/src/source_plugin.rs` grows past ~300 LoC, revisit and consider moving impls into source crates as a follow-up; behavior would be identical. |
| 194 | + |
| 195 | +--- |
| 196 | + |
| 197 | +## 5. Phases |
| 198 | + |
| 199 | +Each phase is a small, mergeable PR. |
| 200 | + |
| 201 | +### Phase 1 — Trait + GitHub-only spike (1 day) |
| 202 | +1. In `gateway/src/source_plugin.rs` (new file): define the `SourcePlugin` trait (§4.2), the `GithubPlugin` wrapper, and the dispatch entry points (`provision_all` / `mount_all` containing just one call each for now). |
| 203 | +2. In `gateway/src/main.rs`: no change yet (the dispatch fns are called from `http.rs` / `streams.rs`). |
| 204 | +3. In `http.rs`: **replace** the `if let Some(ref cfg) = config.github { app = app.nest(...) }` arm with a call to `source_plugin::mount_all(app, publisher, &config)` (which internally only mounts GitHub for now). Other 8 arms untouched. |
| 205 | +4. In `streams.rs`: **replace** the `if let Some(ref cfg) = config.github { ... }` arm with `source_plugin::provision_all(client, config).await?;` (which internally only provisions GitHub). Other 9 arms untouched. |
| 206 | +5. **Do not leave both paths active for GitHub.** This is the trap the archive fell into. |
| 207 | +6. **Verify:** `cargo build --workspace --all-targets` clean, `cargo test --workspace` green. The existing `mount_sources_with_all_sources_builds_router` and `provision_all_sources_creates_all_streams` tests must still pass byte-identical results. |
| 208 | +7. **PR boundary:** open and merge this PR before starting Phase 2. The diff is small (~200 LoC), reviewable, zero behavioral change. Phase 2 must rebase on the merged Phase 1 — do **not** stack all four phases on one branch. |
| 209 | + |
| 210 | +### Phase 2 — Migrate the remaining 8 (~1 hour each, ship separately or batched per ~3) |
| 211 | +Discord has **no** webhook route on `main` (verified: `http.rs` has 9 if-let arms, Discord is not among them — only the gateway runner spawn in `main.rs`). Do not add Discord to `SourcePlugin`. |
| 212 | + |
| 213 | +Order (least → most exotic): |
| 214 | +- slack, gitlab, telegram, twitter (pure webhook + signing secret pattern, identical to GitHub) |
| 215 | +- linear (adds `timestamp_tolerance_secs`) |
| 216 | +- notion (`verification_token` not a secret) |
| 217 | +- sentry (1s ack-timeout cap — preserve the cap logic in `resolve_sentry`) |
| 218 | +- incidentio (`signing_secret` + `timestamp_tolerance_secs`) |
| 219 | + |
| 220 | +For each: add the wrapper to `gateway/src/source_plugin.rs`, add to `build_registry()`, delete the corresponding if-let arm from `http.rs`/`streams.rs`. |
| 221 | + |
| 222 | +**Verify after each:** existing tests still pass. Don't batch — migrate, test, commit, repeat. The gateway's `mount_sources_with_all_sources_builds_router` test is the canary. |
| 223 | + |
| 224 | +### Phase 3 — Extract `resolve_common()` (half day) |
| 225 | +Refactor `config.rs` per §4.3. Pure code motion; no behavior change. All existing `config.rs` tests must still pass byte-for-byte. |
| 226 | + |
| 227 | +### Phase 4 — Cleanup (half day) |
| 228 | +- Delete `config.rs` value-object newtypes that have become unused (GitHub gating its own newtype in `trogon-source-github` is the existing pattern — verify none of the moved ones are needed elsewhere). |
| 229 | +- Update any docs in `crates/trogon-gateway/AGENTS.md` referencing the dispatch chain. |
| 230 | +- Final `cargo clippy --workspace -- -D warnings` (workspace already sets `all = "deny"`). |
| 231 | + |
| 232 | +--- |
| 233 | + |
| 234 | +## 6. Discord's gateway runner — leave alone |
| 235 | + |
| 236 | +`main.rs` spawns `trogon_source_discord::gateway_runner::run(...)` as a long-running task. **Do not** try to model this through `SourcePlugin`. It's a different lifecycle (long-running, no HTTP route, no JetStream stream provisioning at the same shape). Keep the explicit spawn in `main.rs`. Mention this in a one-line comment so the next reader knows it's intentional. |
| 237 | + |
| 238 | +If a future need arises to abstract multi-runtime sources (e.g., Discord-style WebSocket connectors), add a separate trait (`LongRunningSource` or similar) at that point. Not now. |
| 239 | + |
| 240 | +--- |
| 241 | + |
| 242 | +## 7. Definition of done |
| 243 | + |
| 244 | +1. `cargo build --workspace --all-targets` clean on `plugin-system-dev`. |
| 245 | +2. `cargo test --workspace` passes (same test count or higher than `main`). |
| 246 | +3. `cargo clippy --workspace --all-targets -- -D warnings` clean. |
| 247 | +4. `http.rs` and `streams.rs` each contain at most one explicit reference to a specific source crate (the discord gateway-runner spawn lives in `main.rs`, not these files). |
| 248 | +5. Adding a hypothetical 11th source = ~50 lines: a `router()` and `provision()` in the source crate (existing pattern), a wrapper in `gateway/src/source_plugin.rs`, one line in `build_registry()`, and a `resolve_<x>` in `config.rs` that calls `resolve_common()`. |
| 249 | +6. `config.rs` is shorter than on `main` (target: -200 LoC or better). |
| 250 | +7. **No new top-level crates.** |
| 251 | +8. No new dependencies added to any `Cargo.toml`. |
| 252 | +9. PRs ship in the order in §5. Each is independently revertable. |
| 253 | + |
| 254 | +--- |
| 255 | + |
| 256 | +## 8. For the implementing model — don't do these things |
| 257 | + |
| 258 | +These are the specific failure modes from the prior attempt; flagging them so they're not repeated: |
| 259 | + |
| 260 | +1. **Don't put both the new path and the legacy path side-by-side** ("Use plugin registry to mount routes" *then* keep the `if let Some(ref cfg) = config.github { app = app.nest("/github", ...) }` block "for sources not yet migrated"). That double-mounts at runtime → `nest()` panics on the second call. Either fully migrate a source or leave it alone for the PR; never both. |
| 261 | + |
| 262 | +2. **Don't define the trait with concrete types in one place and `dyn`-erased generics elsewhere.** Pick one and stick to it. See §4.2. |
| 263 | + |
| 264 | +3. **Don't write two separate `impl SourcePlugin` for one source** (the archive has `trogon-source-github/src/plugin.rs` AND `plugin_adapter.rs`, neither of which matches the trait). |
| 265 | + |
| 266 | +4. **Don't add `catch_unwind` around `async` blocks** and call it "failure isolation". It doesn't catch panics inside the future. Don't add panic isolation at all — see non-goals. |
| 267 | + |
| 268 | +5. **Don't replace the typed `ResolvedConfig { github: Option<...>, ... }` with a `HashMap<String, ResolvedSourceConfig>` of `serde_json::Value` bags.** The typed shape is a feature, not duplication. |
| 269 | + |
| 270 | +6. **Don't introduce a `WebhookHandler` trait separate from `SourcePlugin`.** One trait. If you can't fit routes into the trait without coupling to Axum, just couple to Axum — the gateway uses Axum already; this trait is gateway-internal. |
| 271 | + |
| 272 | +7. **Don't create a `ConfigSchema` / `ConfigField` / `ValidationRule` runtime DSL.** Rust's type system is the schema. |
| 273 | + |
| 274 | +8. **Don't add any new workspace crate.** The trait lives in `crates/trogon-gateway/src/source_plugin.rs`. A separate crate for the trait creates a circular dependency with `ResolvedConfig`. |
| 275 | + |
| 276 | +9. **Reset to `main` before starting.** The archive on `plugin-system-wip-archive` is for reference only — do not cherry-pick from it. The trait shapes there are wrong. |
0 commit comments