Skip to content

Commit e9b10a1

Browse files
committed
fix(plugin-self-update): persist survives_respawn through install + heal stale manifests
Plugin authors who set `[capabilities].survives_respawn = true` in their source plugin.toml saw the bit silently dropped on `garyx plugins install` — `synthesize_manifest_toml` rendered five capability fields and didn't emit this one, so the regenerated plugin.toml always defaulted false. The host's `request_self_replace` handler reads the flag off the on-disk manifest and refuses every swap with `reason=no_survives_respawn` when false, stranding opted-in plugins at whatever version was first installed: every subsequent self-update tick called the RPC, the host refused, and the install fell back to "no upgrade" silently for as long as the bit kept disappearing on every install. Two coordinated changes: 1. **Fix the renderer.** `CapabilitiesResponse` gains a `survives_respawn` field (serde default false). `synthesize_manifest_toml` threads it into the generated `[capabilities]` block, so fresh installs of any plugin that advertises the bit in its `describe` response get it persisted correctly. The preflight drift comment notes why `survives_respawn` stays out of the capability mismatch check — it's operator opt-in intent rather than intrinsic binary behavior, and a hand-edited disk value may legitimately differ from `describe`. 2. **Heal stale on-disk manifests.** Existing installs synthesized by older garyx still have the field missing; without help they'd need a manual `garyx plugins install --force` to recover. The host now self-heals at startup: after preflight, if `describe` reports `survives_respawn = true` and the on-disk manifest is missing the field, the `[capabilities]` block is patched in place (atomic write via PID + atomic-counter + nanosecond temp name + O_EXCL + sync_all + rename). An explicit `survives_respawn = false` is left alone — operator opt-out is preserved. The in-memory manifest is mutated to match before the manager builds its snapshot, so the first `request_self_replace` this session accepts (no need to wait for next gateway restart). Edits scoped to `[capabilities]` to avoid false positives on stray comments mentioning the field name elsewhere. Section boundary detection only treats subsequent `[...]` headers as end-of-table — TOML blank lines do not end a table, so an explicit `survives_respawn = false` after a blank inside `[capabilities]` is still detected as opt-out (and the file is left byte-identical). Tests: 7 backfill unit tests (missing-field write, byte-identical opt-out, no-section error, trailing-section EOF terminator, scoped detection ignores comments outside the block, blank-line-then-opt-out correctness, 4-thread same-file contention lands a single writer with no duplicate key); 1 round-trip regression test pinning the synthesize emission contract.
1 parent 9582915 commit e9b10a1

9 files changed

Lines changed: 477 additions & 10 deletions

File tree

garyx-channels/src/dispatcher/tests.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -979,6 +979,7 @@ mod plugin_routing {
979979
streaming: false,
980980
images: false,
981981
files: false,
982+
survives_respawn: false,
982983
}
983984
}
984985

garyx-channels/src/plugin_host/inspect.rs

Lines changed: 160 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,8 +20,11 @@
2020
//! but neither is useful as the other's primitive.
2121
2222
use std::collections::BTreeMap;
23+
use std::fs;
24+
use std::io;
2325
use std::path::{Path, PathBuf};
2426
use std::sync::Arc;
27+
use std::sync::atomic::{AtomicU64, Ordering};
2528
use std::time::Duration;
2629

2730
use async_trait::async_trait;
@@ -294,6 +297,10 @@ pub fn synthesize_manifest_toml(
294297
out.push_str(&format!("streaming = {}\n", report.capabilities.streaming));
295298
out.push_str(&format!("images = {}\n", report.capabilities.images));
296299
out.push_str(&format!("files = {}\n", report.capabilities.files));
300+
out.push_str(&format!(
301+
"survives_respawn = {}\n",
302+
report.capabilities.survives_respawn
303+
));
297304
out.push('\n');
298305

299306
out.push_str("[runtime]\n");
@@ -430,6 +437,159 @@ fn ensure_executable(_path: &Path) -> Result<(), InspectError> {
430437
Ok(())
431438
}
432439

440+
/// Result of [`backfill_survives_respawn_in_place`].
441+
#[derive(Debug, PartialEq, Eq)]
442+
pub enum BackfillOutcome {
443+
/// Wrote `survives_respawn = true` into the on-disk `[capabilities]`
444+
/// block. The plugin manifest now permits silent self-update.
445+
Wrote,
446+
/// The field was already present (any value). Left untouched —
447+
/// an explicit `survives_respawn = false` is an operator opt-out
448+
/// and must be respected.
449+
AlreadyPresent,
450+
}
451+
452+
/// Patch an installed `plugin.toml` to add `survives_respawn = true`
453+
/// into its `[capabilities]` block when (a) the field is missing
454+
/// entirely (synthesized by an old garyx where the renderer dropped
455+
/// it) and (b) the running plugin advertises it via `describe`. Used
456+
/// by the host startup self-heal path so existing installs unstick
457+
/// themselves on first launch after upgrading garyx, without
458+
/// requiring the operator to re-run `garyx plugins install --force`.
459+
///
460+
/// Caller is responsible for the (b) check — this function just does
461+
/// the textual patch when (a) holds.
462+
///
463+
/// Write is atomic: emit to a sibling `.tmp` file then `rename`. If
464+
/// any step fails the original is untouched.
465+
pub fn backfill_survives_respawn_in_place(path: &Path) -> io::Result<BackfillOutcome> {
466+
let original = fs::read_to_string(path)?;
467+
let lines: Vec<&str> = original.lines().collect();
468+
469+
// Locate the [capabilities] section: only the first occurrence.
470+
// A malformed file with duplicate `[capabilities]` headers would
471+
// already be rejected by the TOML loader downstream.
472+
let Some(caps_start) = lines.iter().position(|l| l.trim() == "[capabilities]") else {
473+
return Err(io::Error::new(
474+
io::ErrorKind::InvalidData,
475+
"plugin.toml has no [capabilities] section to backfill",
476+
));
477+
};
478+
479+
// End of `[capabilities]` table = first subsequent line that's
480+
// a new section header (`[next_table]` or `[capabilities.sub]`),
481+
// or EOF. **Blank lines do NOT end a TOML table** — earlier
482+
// versions of this code treated them as boundaries and would
483+
// silently miss a `survives_respawn = false` opt-out that sat
484+
// after a stylistic blank inside the block, then write a
485+
// duplicate key and produce malformed TOML.
486+
let caps_end = lines
487+
.iter()
488+
.enumerate()
489+
.skip(caps_start + 1)
490+
.find(|(_, l)| l.trim().starts_with('['))
491+
.map(|(i, _)| i)
492+
.unwrap_or(lines.len());
493+
494+
// Look for an explicit `survives_respawn = …` key anywhere in
495+
// the block (not just comments mentioning the name). Operator
496+
// intent to OPT OUT (`survives_respawn = false`) must be
497+
// preserved verbatim.
498+
for line in &lines[caps_start + 1..caps_end] {
499+
let t = line.trim();
500+
if let Some(rest) = t.strip_prefix("survives_respawn") {
501+
// Require an actual `=` after optional whitespace so
502+
// unrelated identifiers don't false-match: e.g.
503+
// `survives_respawnish = true` leaves `ish = true`
504+
// (rejected), and `survives_respawn.subkey = true`
505+
// leaves `.subkey = true` (also rejected — dotted-key
506+
// forms target a different namespace and aren't the
507+
// scalar bool field we're managing).
508+
let rest = rest.trim_start();
509+
if rest.starts_with('=') {
510+
return Ok(BackfillOutcome::AlreadyPresent);
511+
}
512+
}
513+
}
514+
515+
// Insertion point: directly after the last actual key=value
516+
// line in the block (skip trailing blanks / comments so the
517+
// new line lands flush with the existing keys, matching what
518+
// `synthesize_manifest_toml` would have written from scratch).
519+
// Default to `caps_start + 1` when the block has no content
520+
// (the header was the trailing line of the file, or the next
521+
// section header sits immediately below it).
522+
let mut insert_at = caps_start + 1;
523+
for (i, line) in lines.iter().enumerate().take(caps_end).skip(caps_start + 1) {
524+
let t = line.trim();
525+
if !t.is_empty() && !t.starts_with('#') {
526+
insert_at = i + 1;
527+
}
528+
}
529+
530+
// Build output. Walk every original line, append our key
531+
// verbatim at `insert_at`, then continue. When `insert_at`
532+
// equals `lines.len()` (capabilities is trailing with no
533+
// following content) emit it after the loop.
534+
let mut out = String::with_capacity(original.len() + 32);
535+
for (i, line) in lines.iter().enumerate() {
536+
if i == insert_at {
537+
out.push_str("survives_respawn = true\n");
538+
}
539+
out.push_str(line);
540+
out.push('\n');
541+
}
542+
if insert_at == lines.len() {
543+
out.push_str("survives_respawn = true\n");
544+
}
545+
546+
// Atomic write: stage at sibling .tmp then rename so a crash
547+
// mid-write can't leave a half-written plugin.toml that the
548+
// host would refuse to load on next startup. PID + nanosecond
549+
// suffix prevents two concurrent garyx instances (e.g. launchd
550+
// + manual `gateway run`) from racing on the same temp path,
551+
// and `create_new` (O_EXCL) makes a still-impossible collision
552+
// surface as EEXIST instead of silently overwriting a peer's
553+
// staging file.
554+
let parent = path.parent().ok_or_else(|| {
555+
io::Error::new(
556+
io::ErrorKind::InvalidInput,
557+
"plugin.toml path has no parent directory",
558+
)
559+
})?;
560+
let file_name = path
561+
.file_name()
562+
.and_then(|n| n.to_str())
563+
.unwrap_or("plugin.toml");
564+
// PID handles cross-process collisions; the in-process atomic
565+
// counter handles same-process concurrent calls (two threads
566+
// calling backfill at the same nanosecond would otherwise read
567+
// the same SystemTime and collide on `create_new` below).
568+
static BACKFILL_SEQ: AtomicU64 = AtomicU64::new(0);
569+
let seq = BACKFILL_SEQ.fetch_add(1, Ordering::Relaxed);
570+
let nonce = std::time::SystemTime::now()
571+
.duration_since(std::time::UNIX_EPOCH)
572+
.map(|d| d.as_nanos())
573+
.unwrap_or(0);
574+
let tmp = parent.join(format!(
575+
".{file_name}.backfill.{}.{}.{}.tmp",
576+
std::process::id(),
577+
seq,
578+
nonce,
579+
));
580+
{
581+
use std::io::Write;
582+
let mut f = fs::OpenOptions::new()
583+
.write(true)
584+
.create_new(true)
585+
.open(&tmp)?;
586+
f.write_all(out.as_bytes())?;
587+
f.sync_all()?;
588+
}
589+
fs::rename(&tmp, path)?;
590+
Ok(BackfillOutcome::Wrote)
591+
}
592+
433593
/// Minimal TOML string quoter: escapes `\` and `"` so an
434594
/// "unfriendly" plugin id (slashes, quotes, newlines) survives
435595
/// round-trip. Deliberately narrower than `toml::to_string` to keep

0 commit comments

Comments
 (0)