Document benchmark safety planning

dahlia · dahlia · commit 8c48bcdb67c5 · 2026-06-07T23:18:37.000+09:00
Update the benchmark manual and CLI help to describe discovery-aware dry runs, DNS-backed target classification, and the narrower unsafe public-target override. Link deployment guidance to fedify bench so staging and CI checks are part of production preparation.

#744 #784

Assisted-by: Codex:gpt-5.5
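The safety rules this commit documents are DNS-backed classification, treating DNS failures as public, and an unsafe override that needs an explicit `--target` plus explicit load and duration. They can be sketched as a pure gate function. This is a minimal TypeScript sketch under stated assumptions, not the code in `packages/cli/src/bench`. `classifyAddress`, `classifyResolved`, `mergeLoad`, `checkSafety`, and the `ScenarioLoad`/`GateInput` shapes are hypothetical. Resolved addresses are passed in rather than looked up, `null` stands for a DNS failure, and interactive prompting is left out.

```typescript
// Hypothetical sketch of the documented safety gate; not the fedify CLI code.
type TargetClass = "loopback" | "private" | "public";

function parseIPv4(addr: string): [number, number, number, number] | null {
  const parts = addr.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (!octets.every((o) => o <= 255)) return null; // NaN also fails here
  return octets as [number, number, number, number];
}

// Classifies one resolved address. Anything unrecognized (including
// IPv4-mapped IPv6 forms) falls through to "public", the conservative side.
function classifyAddress(addr: string): TargetClass {
  const v4 = parseIPv4(addr);
  if (v4 != null) {
    const [a, b] = v4;
    if (a === 127) return "loopback";
    if (a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168)) {
      return "private";
    }
    if (a === 169 && b === 254) return "private"; // IPv4 link-local
    return "public";
  }
  const v6 = addr.toLowerCase();
  if (v6 === "::1") return "loopback";
  if (/^f[cd][0-9a-f]{2}:/.test(v6)) return "private"; // unique local fc00::/7
  if (/^fe[89ab][0-9a-f]:/.test(v6)) return "private"; // link-local fe80::/10
  return "public";
}

// A host is only non-public if every resolved address is non-public.
// null (DNS failure) or no addresses is treated as public.
function classifyResolved(addresses: readonly string[] | null): TargetClass {
  if (addresses == null || addresses.length === 0) return "public";
  const classes = addresses.map(classifyAddress);
  if (classes.includes("public")) return "public";
  return classes.every((c) => c === "loopback") ? "loopback" : "private";
}

interface ScenarioLoad {
  rate?: number | undefined;
  concurrency?: number | undefined;
  duration?: string | undefined;
}

// Scenario values win; suite defaults fill the gaps. No built-in defaults.
function mergeLoad(defaults: ScenarioLoad, scenario: ScenarioLoad): ScenarioLoad {
  return {
    rate: scenario.rate ?? defaults.rate,
    concurrency: scenario.concurrency ?? defaults.concurrency,
    duration: scenario.duration ?? defaults.duration,
  };
}

interface GateInput {
  targetClass: TargetClass;
  benchmarkMode: boolean; // result of the stats-endpoint probe
  allowUnsafeTarget: boolean; // --allow-unsafe-target was passed
  explicitTarget: boolean; // --target was given on the command line
  scenarios: ScenarioLoad[]; // already merged with suite defaults
}

// Returns the reasons the run is refused; an empty array means "go".
function checkSafety(input: GateInput): string[] {
  if (input.targetClass !== "public" || input.benchmarkMode) return [];
  if (!input.allowUnsafeTarget) {
    return ["public target without benchmark mode; pass --allow-unsafe-target"];
  }
  const errors: string[] = [];
  if (!input.explicitTarget) {
    errors.push("--allow-unsafe-target requires an explicit --target");
  }
  input.scenarios.forEach((s, i) => {
    if (s.rate == null && s.concurrency == null) {
      errors.push(`scenario ${i}: set rate or concurrency explicitly`);
    }
    if (s.duration == null) errors.push(`scenario ${i}: set duration explicitly`);
  });
  return errors;
}
```

For example, a public target reached with the override and an explicit `--target`, where the suite defaults supply `duration`, passes the gate:

```typescript
const scenarios = [{ rate: 20 }].map((s) => mergeLoad({ duration: "30s" }, s));
console.log(checkSafety({
  targetClass: classifyResolved(["93.184.216.34"]),
  benchmarkMode: false,
  allowUnsafeTarget: true,
  explicitTarget: true,
  scenarios,
})); // → []
```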
diff --git a/CHANGES.md b/CHANGES.md
@@ -286,9 +286,13 @@ To be released.
     throughput, success rate, and errors, reading server-side metrics from the
     target's stats endpoint.  Benchmarks are described by a YAML or JSON
     scenario suite validated against a published JSON Schema, with an `expect`
-    block per scenario that gates a run for CI.  [[#744], [#783]]
+    block per scenario that gates a run for CI.  The command refuses public
+    non-`benchmarkMode` targets without an explicit unsafe override, supports
+    discovery-aware `--dry-run` planning, and ships with a local benchmark
+    fixture used by the scenario tests.  [[#744], [#783], [#784]]
 
 [#783]: https://github.com/fedify-dev/fedify/issues/783
+[#784]: https://github.com/fedify-dev/fedify/issues/784
 
 ### @fedify/fixture
 
diff --git a/docs/manual/benchmarking.md b/docs/manual/benchmarking.md
@@ -151,7 +151,10 @@ The `# yaml-language-server:` line gives editors autocomplete and validation
 against the [published schema].
 Override the file's target with `--target`, choose the output with
 `--format`/`--output`, and inspect a run without sending anything with
-`--dry-run`.
+`--dry-run`.  A dry run still probes the target's benchmark stats endpoint and
+resolves scenario discovery, such as WebFinger and actor inbox lookup, so the
+printed plan shows the concrete destinations a real run would use.  It does
+not send benchmark load.
 
 An `inbox` scenario's `recipient` may be a single value or a list.  With a
 list, deliveries are rotated across the recipients (and across the synthetic
@@ -241,10 +244,22 @@ belongs in a controlled environment, not a shared CI runner.
 ### Safety
 
 `fedify bench` runs without friction against a loopback or private target, or
-any target that advertises benchmark mode.  A public target that does not
-advertise benchmark mode is refused unless you pass `--allow-unsafe-target`,
-which is mandatory (never prompted) in CI and any non-interactive context.  Use
-`--dry-run` to print the plan without sending anything.
+any target that advertises benchmark mode.  Hostnames are classified from their
+resolved addresses when possible, and DNS failures are treated as public so the
+gate stays conservative.  A public target that does not advertise benchmark
+mode is refused unless you pass `--allow-unsafe-target`, which is mandatory
+(never prompted) in CI and any non-interactive context.
+
+The unsafe override is deliberately narrow.  It must be paired with an
+explicit `--target` on the command line, and every scenario must set its load
+(`rate` or `concurrency`) and `duration` explicitly, either in the scenario or
+in suite defaults.  This prevents a public run from falling back to built-in
+defaults by accident.
+
+Use `--dry-run` as the first step against an unfamiliar target.  It performs
+the benchmark-mode probe and discovery requests needed to print the planned
+WebFinger resources and inbox destinations, but it does not send signed inbox
+deliveries or other benchmark load.
 
 ### Local targets over HTTP
 
diff --git a/docs/manual/deploy.md b/docs/manual/deploy.md
@@ -1287,6 +1287,17 @@ operations across your infrastructure.  For production:
 
 [OpenTelemetry]: https://opentelemetry.io/
 
+### Benchmarking before production
+
+Before changing queue backends, federation handlers, or signature-related
+configuration, run [`fedify bench`](./benchmarking.md) against a local or
+staging target that enables `benchmarkMode`.  The benchmark command drives
+signed inbox deliveries and WebFinger lookups, reports latency, throughput,
+success rate, and errors, and can gate CI with `expect` thresholds.  Do not
+enable `benchmarkMode` on production servers.  For an unfamiliar target, start
+with `--dry-run` to resolve discovery and inspect the planned destinations
+without sending benchmark load.
+
 ### Error reporting
 
 For error aggregation, the pattern most Fedify applications use is a
diff --git a/packages/cli/src/bench/command.ts b/packages/cli/src/bench/command.ts
@@ -75,7 +75,7 @@ export const benchCommand = command(
       dryRun: withDefault(
         flag("--dry-run", {
           description:
-            message`Print the normalized plan without contacting the target or \
+            message`Resolve discovery and print the benchmark plan without \
 sending load.`,
         }),
         false,