Skip to content

Commit a9ebf3e

Browse files
committed
refactor(gateway): extract source plugin trait and common config validation
Eliminates ~400 lines of duplication across http.rs, streams.rs, and the 10 resolve_* functions in config.rs by introducing: - A gateway-internal SourcePlugin trait with enum dispatch (unit structs + monomorphised entry points, zero vtables) replacing the 9-arm if-let chains in http.rs and streams.rs. - A shared resolve_common() helper in config.rs extracting the four repeated validation blocks (subject_prefix, stream_name, nats_ack_timeout, stream_max_age) that appeared in every per-source resolver. - Nine thin wrapper impls (GithubPlugin, SlackPlugin, etc.) that delegate to the existing source crate routers and provision fns. No source crate code or dependency changes needed. Discord is intentionally excluded from SourcePlugin: its webhook-less primary path is handled by the gateway runner spawned in main.rs, which stays untouched. All 606+ workspace tests pass unchanged.
1 parent e2b7c43 commit a9ebf3e

6 files changed

Lines changed: 819 additions & 488 deletions

File tree

PLUGIN_SYSTEM_PLAN.md

Lines changed: 276 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,276 @@
1+
# Source-Plugin Refactor Plan
2+
3+
**Worktree:** `/Users/enilsen/Projects/TrogonStack/trogonai-plugin-system-dev`
4+
**Branch:** `plugin-system-dev` (currently equal to `main`)
5+
**Archive of prior attempt:** branch `plugin-system-wip-archive` (broken, do not merge)
6+
**Authored:** 2026-05-14
7+
8+
---
9+
10+
## 1. Why this plan replaces the previous one
11+
12+
The previous `PLUGIN_SYSTEM_PLAN.md` (now on `plugin-system-wip-archive`) over-fits the problem. It frames the work as building a "plugin marketplace" — discovery, dynamic `.so/.dll/.dylib` loading, permission models, sandboxing, signed plugins, hot reload, health-monitoring background tasks, plus a `ConfigSchema` DSL with `ConfigField`/`ValidationRule` types. None of that is needed for 10 first-party source connectors that ship in the same binary. The implementation attempt produced ~2500 lines of new infrastructure to eliminate ~500 lines of real duplication, and the trait/registry/caller signatures don't even line up across layers (concrete `NatsJetStreamClient` vs `Arc<dyn JetStreamPublisher>` vs `impl JetStreamPublisher`). The archive branch contains two non-matching plugin impls per source and runs the legacy `nest("/github", ...)` chain in parallel with the registry — a guaranteed runtime `.nest()` panic.
13+
14+
The original plan also references `crates/trogon-std/src/source_plugin.rs` as if it existed. It doesn't.
15+
16+
This plan starts from the **real, measured** duplication and proposes the minimum abstraction that removes it.
17+
18+
---
19+
20+
## 2. The actual problem (measured)
21+
22+
Three files on `main`:
23+
24+
| File | Total lines | Duplicated dispatch / boilerplate |
25+
|---|---|---|
26+
| `crates/trogon-gateway/src/http.rs` | 145 | ~36 lines (9× 4-line `if let Some(ref cfg)` blocks calling `nest()`). Discord skipped (uses a gateway runner, not a webhook). |
27+
| `crates/trogon-gateway/src/streams.rs` | 165 | ~40 lines (10× 4-line `if let Some(ref cfg)` blocks calling `provision()`). |
28+
| `crates/trogon-gateway/src/config.rs` | 2677 | ~340 lines across 10 `resolve_X` functions. Each is ~60-70 lines; about 34 lines per function are pure copy-paste (subject_prefix / stream_name / nats_ack_timeout_secs / stream_max_age_secs validation). The rest is per-source secret types and source-specific fields (e.g. Discord `gateway_intents`, Linear `timestamp_tolerance_secs`, Sentry's 1s ack-timeout cap). |
29+
30+
Total real duplication: ~400 lines. Not a "marketplace" problem — a trait-and-helper problem.
31+
32+
Discord is a partial exception: it has both a webhook path (`http.rs`/`streams.rs`) **and** a long-running gateway runner spawned in `main.rs`. The runner is a separate concern and should not contort the trait.
33+
34+
---
35+
36+
## 3. Goals & explicit non-goals
37+
38+
### Goals
39+
1. Replace the 10-arm `if let Some(ref cfg)` chains in `http.rs` and `streams.rs` with one iteration over a registry.
40+
2. Eliminate the ~340 lines of common-field validation duplication in `config.rs`.
41+
3. Source crates own their own typed config and webhook router (no behavioral change for any source).
42+
4. Adding an 11th source = implementing one trait + one line in the registry, not editing three gateway files.
43+
5. The diff compiles, `cargo test --workspace` passes, no behavioral regression.
44+
45+
### Non-goals (explicit — these are what blew up the previous plan)
46+
- ❌ Dynamic plugin loading (`.so`/`.dll`/`.dylib` / `libloading`). Static linking is fine.
47+
- ❌ Plugin discovery / `plugin.toml` manifests / filesystem scanning.
48+
- ❌ Permission model, sandboxing, signature verification.
49+
- ❌ Hot reload, plugin lifecycle states (`Starting`/`Stopped`/`Failed`), health-monitoring background task.
50+
-`ConfigSchema` DSL (`ConfigField`, `ConfigFieldType`, `ValidationRule`). The existing typed config structs already do this in Rust's type system.
51+
- ❌ Separate `WebhookHandler` trait split from `SourcePlugin`. One trait. Routes are part of what a webhook source does.
52+
-`catch_unwind` "failure isolation". Panics in first-party source crates are bugs to fix, not isolate; and `catch_unwind` around an `async` block only catches panics during future *construction*, not execution — the reasoning is wrong anyway.
53+
- ❌ Any new workspace crate (e.g. `trogon-plugin`, `trogon-source-api`). The trait is small enough to live in the gateway crate at `crates/trogon-gateway/src/source_plugin.rs`. A separate crate also creates a circular dependency since the trait takes `&ResolvedConfig`, which is defined in the gateway.
54+
- ❌ Replacing the typed `ResolvedConfig { github: Option<...>, slack: Option<...>, ... }` with a `HashMap<String, ResolvedSourceConfig> + serde_json::Value` bag. The typed-per-source shape is correct; we only need a *registry* over it.
55+
56+
---
57+
58+
## 4. Proposed design
59+
60+
### 4.1 Where things live
61+
62+
```
63+
crates/trogon-gateway/src/
64+
├── source_plugin.rs (NEW, ~120 lines: trait, enum dispatch, wrapper impls)
65+
├── config.rs (MODIFIED: extract resolve_common(), keep per-source resolve_X)
66+
├── http.rs (MODIFIED: iterate over enum variants instead of 9 if-let arms)
67+
├── streams.rs (MODIFIED: iterate over enum variants instead of 10 if-let arms)
68+
└── main.rs (no source-crate registration needed — enum lives in source_plugin.rs)
69+
```
70+
71+
**No new workspace crates.** The trait is gateway-internal. Source crates stay untouched — their `Cargo.toml` files and `lib.rs` files do not change. The gateway's `source_plugin.rs` holds one wrapper struct per source (e.g., `struct GithubPlugin;`) that delegates to the source crate's existing `router()` and `provision()` functions.
72+
73+
Why no new crate: putting the trait in a `trogon-source-api` crate (as one earlier draft suggested) creates a circular-dependency problem — the trait methods need to take `&crate::config::ResolvedConfig`, which lives in `trogon-gateway`. The trait would have to either duplicate the config type or take generic associated types, both worse than keeping it inside the gateway.
74+
75+
### 4.2 The trait + enum dispatch (recommended approach)
76+
77+
Use **enum dispatch**, not `Box<dyn SourcePlugin>`. The trait stays generic over the publisher/context types so the existing `MockJetStreamPublisher` / `MockJetStreamContext` mocks keep working unchanged. The enum collapses the runtime choice without needing `dyn`-safety.
78+
79+
```rust
80+
// crates/trogon-gateway/src/source_plugin.rs
81+
82+
use axum::Router;
83+
use trogon_nats::jetstream::{
84+
ClaimCheckPublisher, JetStreamContext, JetStreamPublisher, ObjectStorePut,
85+
};
86+
87+
use crate::config::ResolvedConfig;
88+
89+
/// Stable identifier for a source. Matches the TOML key and the URL prefix.
90+
pub type SourceId = &'static str;
91+
92+
/// One source connector (one webhook receiver). Discord's *gateway runner*
93+
/// is NOT a SourcePlugin — see §6.
94+
pub trait SourcePlugin: Send + Sync {
95+
fn id(&self) -> SourceId;
96+
fn is_enabled(&self, resolved: &ResolvedConfig) -> bool;
97+
98+
async fn provision<C: JetStreamContext>(
99+
&self,
100+
js: &C,
101+
resolved: &ResolvedConfig,
102+
) -> Result<(), C::Error>;
103+
104+
fn routes<P, S>(
105+
&self,
106+
publisher: ClaimCheckPublisher<P, S>,
107+
resolved: &ResolvedConfig,
108+
) -> Router
109+
where
110+
P: JetStreamPublisher,
111+
S: ObjectStorePut;
112+
}
113+
114+
// Wrappers. Each is a unit struct that delegates to the source crate's
115+
// existing pub fns. No changes to the source crates themselves.
116+
pub struct GithubPlugin;
117+
pub struct SlackPlugin;
118+
pub struct TelegramPlugin;
119+
pub struct TwitterPlugin;
120+
pub struct GitlabPlugin;
121+
pub struct IncidentioPlugin;
122+
pub struct LinearPlugin;
123+
pub struct NotionPlugin;
124+
pub struct SentryPlugin;
125+
// (Discord is intentionally absent: its webhook-less primary path is the
126+
// gateway runner in main.rs. See §6.)
127+
128+
impl SourcePlugin for GithubPlugin {
129+
fn id(&self) -> SourceId { "github" }
130+
fn is_enabled(&self, r: &ResolvedConfig) -> bool { r.github.is_some() }
131+
async fn provision<C: JetStreamContext>(
132+
&self, js: &C, r: &ResolvedConfig,
133+
) -> Result<(), C::Error> {
134+
if let Some(cfg) = &r.github { trogon_source_github::provision(js, cfg).await } else { Ok(()) }
135+
}
136+
fn routes<P, S>(&self, publisher: ClaimCheckPublisher<P, S>, r: &ResolvedConfig) -> Router
137+
where P: JetStreamPublisher, S: ObjectStorePut,
138+
{
139+
r.github.as_ref()
140+
.map(|cfg| trogon_source_github::router(publisher, cfg))
141+
.unwrap_or_else(Router::new)
142+
}
143+
}
144+
// ...similar for the other 8.
145+
146+
/// Iterate the registry. Generic-method dispatch resolves at compile time
147+
/// per concrete (P, S, C) — no vtables.
148+
pub fn for_each_plugin<F: FnMut(&dyn ErasedIter)>(_f: F) {
149+
// see below — actually iterate concretely
150+
}
151+
```
152+
153+
**Iterating without `dyn`:** since the methods are generic, you can't store the plugins in a `Vec<Box<dyn SourcePlugin>>`. Two viable patterns — pick one:
154+
155+
- **Pattern A (tuple + macro): preferred.** A small `for_each_source!` macro that expands to a sequence of statements over `(GithubPlugin, SlackPlugin, …)`. Each call site (provision loop, mount loop) writes the macro invocation. Adding a source = adding one entry to the macro list. ~15 lines for the macro. **This is the recommended pattern.**
156+
- **Pattern B (explicit fns):** `pub fn provision_all<C>(js: &C, r: &ResolvedConfig) -> ...` and `pub fn mount_all<P, S>(router: Router, publisher: ..., r: &ResolvedConfig) -> Router` each containing the 9 calls in sequence (no `if let``is_enabled` gates inside each wrapper). Slightly more boilerplate, slightly clearer in stack traces.
157+
158+
Either way, `http.rs` shrinks from 9 if-let arms to one call (`source_plugin::mount_all(router, publisher, &config)`); `streams.rs` similarly shrinks to one call.
159+
160+
**Why not `dyn`:** It would require concrete-typed methods (e.g. `&NatsJetStreamClient`, `ClaimCheckPublisher<Arc<dyn JetStreamPublisher>, Arc<dyn ObjectStorePut>>`), which forces blanket impls of the publisher/context traits for `Arc<dyn T>` that this codebase doesn't currently have. Adding those blanket impls is doable but unrelated work; enum/macro dispatch sidesteps the question for now. If the project ever wants runtime-loaded plugins, revisit then.
161+
162+
### 4.3 Common-field validation extraction
163+
164+
Add to `config.rs`, around line ~470 (above the per-source resolve fns):
165+
166+
```rust
167+
struct ResolvedCommon {
168+
subject_prefix: NatsToken,
169+
stream_name: NatsToken,
170+
stream_max_age: StreamMaxAge,
171+
nats_ack_timeout: NonZeroDuration,
172+
}
173+
174+
fn resolve_common(
175+
source: &'static str,
176+
subject_prefix: String,
177+
stream_name: String,
178+
nats_ack_timeout_secs: u64,
179+
stream_max_age_secs: u64,
180+
errors: &mut Vec<ConfigValidationError>,
181+
) -> Option<ResolvedCommon> {
182+
// Move the 4 repeated match blocks here. Return None on any failure
183+
// (errors pushed). Returns Some only when all four validate.
184+
}
185+
```
186+
187+
Each `resolve_X` shrinks from ~65 lines to ~20-25 lines (status check + source-specific secret/field + `let common = resolve_common(...)?;` + final struct).
188+
189+
Net deletion in `config.rs`: ~250 lines. (Less than 340 because each call site keeps a few lines, and we add ~30 lines of helper.)
190+
191+
### 4.4 Source crates stay untouched
192+
193+
All `impl SourcePlugin` blocks live in `gateway/src/source_plugin.rs` and delegate to the existing `pub fn router(...)` and `pub fn provision(...)` of each source crate. No source crate's `Cargo.toml` or `lib.rs` needs to change — this is purely a gateway-internal refactor. If `gateway/src/source_plugin.rs` grows past ~300 LoC, revisit and consider moving impls into source crates as a follow-up; behavior would be identical.
194+
195+
---
196+
197+
## 5. Phases
198+
199+
Each phase is a small, mergeable PR.
200+
201+
### Phase 1 — Trait + GitHub-only spike (1 day)
202+
1. In `gateway/src/source_plugin.rs` (new file): define the `SourcePlugin` trait (§4.2), the `GithubPlugin` wrapper, and the dispatch entry points (`provision_all` / `mount_all` containing just one call each for now).
203+
2. In `gateway/src/main.rs`: no change yet (the dispatch fns are called from `http.rs` / `streams.rs`).
204+
3. In `http.rs`: **replace** the `if let Some(ref cfg) = config.github { app = app.nest(...) }` arm with a call to `source_plugin::mount_all(app, publisher, &config)` (which internally only mounts GitHub for now). Other 8 arms untouched.
205+
4. In `streams.rs`: **replace** the `if let Some(ref cfg) = config.github { ... }` arm with `source_plugin::provision_all(client, config).await?;` (which internally only provisions GitHub). Other 9 arms untouched.
206+
5. **Do not leave both paths active for GitHub.** This is the trap the archive fell into.
207+
6. **Verify:** `cargo build --workspace --all-targets` clean, `cargo test --workspace` green. The existing `mount_sources_with_all_sources_builds_router` and `provision_all_sources_creates_all_streams` tests must still pass byte-identical results.
208+
7. **PR boundary:** open and merge this PR before starting Phase 2. The diff is small (~200 LoC), reviewable, zero behavioral change. Phase 2 must rebase on the merged Phase 1 — do **not** stack all four phases on one branch.
209+
210+
### Phase 2 — Migrate the remaining 8 (~1 hour each, ship separately or batched per ~3)
211+
Discord has **no** webhook route on `main` (verified: `http.rs` has 9 if-let arms, Discord is not among them — only the gateway runner spawn in `main.rs`). Do not add Discord to `SourcePlugin`.
212+
213+
Order (least → most exotic):
214+
- slack, gitlab, telegram, twitter (pure webhook + signing secret pattern, identical to GitHub)
215+
- linear (adds `timestamp_tolerance_secs`)
216+
- notion (`verification_token` not a secret)
217+
- sentry (1s ack-timeout cap — preserve the cap logic in `resolve_sentry`)
218+
- incidentio (`signing_secret` + `timestamp_tolerance_secs`)
219+
220+
For each: add the wrapper to `gateway/src/source_plugin.rs`, add to `build_registry()`, delete the corresponding if-let arm from `http.rs`/`streams.rs`.
221+
222+
**Verify after each:** existing tests still pass. Don't batch — migrate, test, commit, repeat. The gateway's `mount_sources_with_all_sources_builds_router` test is the canary.
223+
224+
### Phase 3 — Extract `resolve_common()` (half day)
225+
Refactor `config.rs` per §4.3. Pure code motion; no behavior change. All existing `config.rs` tests must still pass byte-for-byte.
226+
227+
### Phase 4 — Cleanup (half day)
228+
- Delete `config.rs` value-object newtypes that have become unused (GitHub gating its own newtype in `trogon-source-github` is the existing pattern — verify none of the moved ones are needed elsewhere).
229+
- Update any docs in `crates/trogon-gateway/AGENTS.md` referencing the dispatch chain.
230+
- Final `cargo clippy --workspace -- -D warnings` (workspace already sets `all = "deny"`).
231+
232+
---
233+
234+
## 6. Discord's gateway runner — leave alone
235+
236+
`main.rs` spawns `trogon_source_discord::gateway_runner::run(...)` as a long-running task. **Do not** try to model this through `SourcePlugin`. It's a different lifecycle (long-running, no HTTP route, no JetStream stream provisioning at the same shape). Keep the explicit spawn in `main.rs`. Mention this in a one-line comment so the next reader knows it's intentional.
237+
238+
If a future need arises to abstract multi-runtime sources (e.g., Discord-style WebSocket connectors), add a separate trait (`LongRunningSource` or similar) at that point. Not now.
239+
240+
---
241+
242+
## 7. Definition of done
243+
244+
1. `cargo build --workspace --all-targets` clean on `plugin-system-dev`.
245+
2. `cargo test --workspace` passes (same test count or higher than `main`).
246+
3. `cargo clippy --workspace --all-targets -- -D warnings` clean.
247+
4. `http.rs` and `streams.rs` each contain at most one explicit reference to a specific source crate (the discord gateway-runner spawn lives in `main.rs`, not these files).
248+
5. Adding a hypothetical 11th source = ~50 lines: a `router()` and `provision()` in the source crate (existing pattern), a wrapper in `gateway/src/source_plugin.rs`, one line in `build_registry()`, and a `resolve_<x>` in `config.rs` that calls `resolve_common()`.
249+
6. `config.rs` is shorter than on `main` (target: -200 LoC or better).
250+
7. **No new top-level crates.**
251+
8. No new dependencies added to any `Cargo.toml`.
252+
9. PRs ship in the order in §5. Each is independently revertable.
253+
254+
---
255+
256+
## 8. For the implementing model — don't do these things
257+
258+
These are the specific failure modes from the prior attempt; flagging them so they're not repeated:
259+
260+
1. **Don't put both the new path and the legacy path side-by-side** ("Use plugin registry to mount routes" *then* keep the `if let Some(ref cfg) = config.github { app = app.nest("/github", ...) }` block "for sources not yet migrated"). That double-mounts at runtime → `nest()` panics on the second call. Either fully migrate a source or leave it alone for the PR; never both.
261+
262+
2. **Don't define the trait with concrete types in one place and `dyn`-erased generics elsewhere.** Pick one and stick to it. See §4.2.
263+
264+
3. **Don't write two separate `impl SourcePlugin` for one source** (the archive has `trogon-source-github/src/plugin.rs` AND `plugin_adapter.rs`, neither of which matches the trait).
265+
266+
4. **Don't add `catch_unwind` around `async` blocks** and call it "failure isolation". It doesn't catch panics inside the future. Don't add panic isolation at all — see non-goals.
267+
268+
5. **Don't replace the typed `ResolvedConfig { github: Option<...>, ... }` with a `HashMap<String, ResolvedSourceConfig>` of `serde_json::Value` bags.** The typed shape is a feature, not duplication.
269+
270+
6. **Don't introduce a `WebhookHandler` trait separate from `SourcePlugin`.** One trait. If you can't fit routes into the trait without coupling to Axum, just couple to Axum — the gateway uses Axum already; this trait is gateway-internal.
271+
272+
7. **Don't create a `ConfigSchema` / `ConfigField` / `ValidationRule` runtime DSL.** Rust's type system is the schema.
273+
274+
8. **Don't add any new workspace crate.** The trait lives in `crates/trogon-gateway/src/source_plugin.rs`. A separate crate for the trait creates a circular dependency with `ResolvedConfig`.
275+
276+
9. **Reset to `main` before starting.** The archive on `plugin-system-wip-archive` is for reference only — do not cherry-pick from it. The trait shapes there are wrong.

0 commit comments

Comments
 (0)