Description
A single misspelled field anywhere in [[bindings]] currently causes the entire bindings table to fail parsing at config load. Because KernelConfig deliberately does not use deny_unknown_fields at the top level (forward-compat with new sections), the failure surfaces as a silent default: every agent ends up unbound, the daemon boots clean, and routing simply… stops working. There is no log line that points at the typo.
This is the worst possible failure mode for a routing config: catastrophic, silent, and global on a per-character mistake.
Reproducer
In config.toml:
[[bindings]]
agent = "assistant"
match_rule = { channel = "general" }
[[bindings]]
agent = "coder"
match_rule = { cccchannel = "dev" } # typo
[[bindings]]
agent = "researcher"
match_rule = { channel = "research" }
Current behavior: try_into::<KernelConfig>() fails on the entire bindings array because of deny_unknown_fields on BindingMatchRule. Daemon starts with zero bindings loaded. No agent receives any message. No clear log line indicates why.
Expected behavior: entry 2 is dropped with a clear ERROR naming the agent and the bad field; entries 1 and 3 load and route normally; a single WARN summarizes "dropped 1, loaded 2".
Fix
Parse the bindings table independently of the rest of the config, entry-by-entry, before the main try_into::<KernelConfig>() call.
- New
lenient_extract_bindings(&toml::Value) -> Vec<AgentBinding> runs after include-merge / [api] migration, before the main deserialize step.
- Each entry is
try_into::<AgentBinding>()'d on its own. On failure: ERROR with index, agent name (best-effort extracted from the raw table), and the underlying serde error message — which already lists the valid field names. On success: pushed to the surviving list.
- A single aggregate WARN reports total dropped vs. surviving counts.
- Per-entry
deny_unknown_fields on both AgentBinding and BindingMatchRule is preserved — silent typos still fail loudly per entry, just no longer catastrophically across the whole table.
- The raw
bindings array is then reinjected into the config value and the normal deserialize proceeds.
Example log output (from live test against the reproducer above)
ERROR Skipping malformed binding #1 (agent='coder'):
unknown field `cccchannel`, expected one of
`channel`, `account_id`, `peer_id`, `guild_id`, `channel_id`, `roles`
WARN Dropped 1 malformed binding(s); 2 binding(s) will load.
INFO Loaded agent bindings into router count=2
The error message names the agent, the typo'd field, and the valid alternatives. The fix is trivially derivable from the log alone.
Expected Behavior
- A typo in any single binding's
match_rule (or at the binding top level) drops only that entry.
- The error log names the agent and the offending field.
- All other bindings load.
- First-match-wins routing semantics are preserved across drops — survivor order matches source order.
Steps to Reproduce (current behavior)
- Apply the reproducer config above (or any config where one binding has any unknown field).
- Restart the daemon.
- Observe: zero bindings loaded, no agent receives messages, no clear error pointing at the typo.
OpenFang Version
v0.6.2 (head 15ed29c).
Operating System
Reproduced on macOS 14.x (Apple Silicon). Not OS-dependent — pure config-parsing logic.
Logs / Screenshots
Live log output included above. Tests in the accompanying PR cover all behaviors.
Test coverage in the proposed PR
crates/openfang-kernel/src/config.rs — +7 tests:
test_lenient_bindings_drops_typo_keeps_rest — the reproducer above.
test_lenient_bindings_all_valid_unchanged — happy path: no behavioral change for clean configs.
test_lenient_bindings_all_malformed_yields_empty_but_keeps_rest_of_config — every binding malformed → empty bindings, but the rest of the config still loads.
test_lenient_bindings_no_bindings_section_is_noop — configs without [[bindings]] are untouched.
test_lenient_bindings_missing_agent_field_dropped — entry without an agent field is dropped (cannot be routed).
test_lenient_bindings_preserves_survivor_order — locks in that first-match-wins routing semantics cannot silently regress when a middle entry is dropped.
test_lenient_bindings_top_level_field_typo_dropped — locks in that deny_unknown_fields catches operator typos at the binding top level (e.g. agnt = ...), not just inside match_rule.
Files touched
crates/openfang-kernel/src/config.rs (only)
Note for reviewers
This is intentionally narrow: it changes the failure mode of binding parsing, nothing else. The lenient pass is gated to the [[bindings]] array; every other config section continues through the existing try_into::<KernelConfig>() path unchanged. A TODO is left on the remaining warn!-on-load fallback for the non-binding silent-default path, as a follow-up.
Pairs naturally with #1144 (which adds deny_unknown_fields to the AgentBinding wrapper) — together they make binding misconfig caught early (#1144) and survivable (this issue) instead of silent-and-catastrophic.
Description
A single misspelled field anywhere in
[[bindings]]currently causes the entire bindings table to fail parsing at config load. BecauseKernelConfigdeliberately does not usedeny_unknown_fieldsat the top level (forward-compat with new sections), the failure surfaces as a silent default: every agent ends up unbound, the daemon boots clean, and routing simply… stops working. There is no log line that points at the typo.This is the worst possible failure mode for a routing config: catastrophic, silent, and global on a per-character mistake.
Reproducer
In
config.toml:Current behavior:
try_into::<KernelConfig>()fails on the entirebindingsarray because ofdeny_unknown_fieldsonBindingMatchRule. Daemon starts with zero bindings loaded. No agent receives any message. No clear log line indicates why.Expected behavior: entry 2 is dropped with a clear ERROR naming the agent and the bad field; entries 1 and 3 load and route normally; a single WARN summarizes "dropped 1, loaded 2".
Fix
Parse the bindings table independently of the rest of the config, entry-by-entry, before the main
try_into::<KernelConfig>()call.lenient_extract_bindings(&toml::Value) -> Vec<AgentBinding>runs after include-merge /[api]migration, before the main deserialize step.try_into::<AgentBinding>()'d on its own. On failure: ERROR with index, agent name (best-effort extracted from the raw table), and the underlying serde error message — which already lists the valid field names. On success: pushed to the surviving list.deny_unknown_fieldson bothAgentBindingandBindingMatchRuleis preserved — silent typos still fail loudly per entry, just no longer catastrophically across the whole table.bindingsarray is then reinjected into the config value and the normal deserialize proceeds.Example log output (from live test against the reproducer above)
The error message names the agent, the typo'd field, and the valid alternatives. The fix is trivially derivable from the log alone.
Expected Behavior
match_rule(or at the binding top level) drops only that entry.Steps to Reproduce (current behavior)
OpenFang Version
v0.6.2(head15ed29c).Operating System
Reproduced on macOS 14.x (Apple Silicon). Not OS-dependent — pure config-parsing logic.
Logs / Screenshots
Live log output included above. Tests in the accompanying PR cover all behaviors.
Test coverage in the proposed PR
crates/openfang-kernel/src/config.rs— +7 tests:test_lenient_bindings_drops_typo_keeps_rest— the reproducer above.test_lenient_bindings_all_valid_unchanged— happy path: no behavioral change for clean configs.test_lenient_bindings_all_malformed_yields_empty_but_keeps_rest_of_config— every binding malformed → empty bindings, but the rest of the config still loads.test_lenient_bindings_no_bindings_section_is_noop— configs without[[bindings]]are untouched.test_lenient_bindings_missing_agent_field_dropped— entry without anagentfield is dropped (cannot be routed).test_lenient_bindings_preserves_survivor_order— locks in that first-match-wins routing semantics cannot silently regress when a middle entry is dropped.test_lenient_bindings_top_level_field_typo_dropped— locks in thatdeny_unknown_fieldscatches operator typos at the binding top level (e.g.agnt = ...), not just insidematch_rule.Files touched
crates/openfang-kernel/src/config.rs(only)Note for reviewers
This is intentionally narrow: it changes the failure mode of binding parsing, nothing else. The lenient pass is gated to the
[[bindings]]array; every other config section continues through the existingtry_into::<KernelConfig>()path unchanged. ATODOis left on the remainingwarn!-on-load fallback for the non-binding silent-default path, as a follow-up.Pairs naturally with #1144 (which adds
deny_unknown_fieldsto theAgentBindingwrapper) — together they make binding misconfig caught early (#1144) and survivable (this issue) instead of silent-and-catastrophic.