Skip to content

kernel: lenient binding parsing — one typo no longer drops the entire bindings table #1145

@benhoverter

Description

@benhoverter

Description

A single misspelled field anywhere in [[bindings]] currently causes the entire bindings table to fail parsing at config load. Because KernelConfig deliberately does not use deny_unknown_fields at the top level (forward-compat with new sections), the failure surfaces as a silent default: every agent ends up unbound, the daemon boots clean, and routing simply… stops working. There is no log line that points at the typo.

This is the worst possible failure mode for a routing config: catastrophic, silent, and global on a per-character mistake.

Reproducer

In config.toml:

[[bindings]]
agent = "assistant"
match_rule = { channel = "general" }

[[bindings]]
agent = "coder"
match_rule = { cccchannel = "dev" }   # typo

[[bindings]]
agent = "researcher"
match_rule = { channel = "research" }

Current behavior: try_into::<KernelConfig>() fails on the entire bindings array because of deny_unknown_fields on BindingMatchRule. Daemon starts with zero bindings loaded. No agent receives any message. No clear log line indicates why.

Expected behavior: entry 2 is dropped with a clear ERROR naming the agent and the bad field; entries 1 and 3 load and route normally; a single WARN summarizes "dropped 1, loaded 2".

Fix

Parse the bindings table independently of the rest of the config, entry-by-entry, before the main try_into::<KernelConfig>() call.

  • New lenient_extract_bindings(&toml::Value) -> Vec<AgentBinding> runs after include-merge / [api] migration, before the main deserialize step.
  • Each entry is try_into::<AgentBinding>()'d on its own. On failure: ERROR with index, agent name (best-effort extracted from the raw table), and the underlying serde error message — which already lists the valid field names. On success: pushed to the surviving list.
  • A single aggregate WARN reports total dropped vs. surviving counts.
  • Per-entry deny_unknown_fields on both AgentBinding and BindingMatchRule is preserved — silent typos still fail loudly per entry, just no longer catastrophically across the whole table.
  • The raw bindings array is then reinjected into the config value and the normal deserialize proceeds.

Example log output (from live test against the reproducer above)

ERROR Skipping malformed binding #1 (agent='coder'):
      unknown field `cccchannel`, expected one of
      `channel`, `account_id`, `peer_id`, `guild_id`, `channel_id`, `roles`
WARN  Dropped 1 malformed binding(s); 2 binding(s) will load.
INFO  Loaded agent bindings into router count=2

The error message names the agent, the typo'd field, and the valid alternatives. The fix is trivially derivable from the log alone.

Expected Behavior

  • A typo in any single binding's match_rule (or at the binding top level) drops only that entry.
  • The error log names the agent and the offending field.
  • All other bindings load.
  • First-match-wins routing semantics are preserved across drops — survivor order matches source order.

Steps to Reproduce (current behavior)

  1. Apply the reproducer config above (or any config where one binding has any unknown field).
  2. Restart the daemon.
  3. Observe: zero bindings loaded, no agent receives messages, no clear error pointing at the typo.

OpenFang Version

v0.6.2 (head 15ed29c).

Operating System

Reproduced on macOS 14.x (Apple Silicon). Not OS-dependent — pure config-parsing logic.

Logs / Screenshots

Live log output included above. Tests in the accompanying PR cover all behaviors.

Test coverage in the proposed PR

crates/openfang-kernel/src/config.rs — +7 tests:

  • test_lenient_bindings_drops_typo_keeps_rest — the reproducer above.
  • test_lenient_bindings_all_valid_unchanged — happy path: no behavioral change for clean configs.
  • test_lenient_bindings_all_malformed_yields_empty_but_keeps_rest_of_config — every binding malformed → empty bindings, but the rest of the config still loads.
  • test_lenient_bindings_no_bindings_section_is_noop — configs without [[bindings]] are untouched.
  • test_lenient_bindings_missing_agent_field_dropped — entry without an agent field is dropped (cannot be routed).
  • test_lenient_bindings_preserves_survivor_order — locks in that first-match-wins routing semantics cannot silently regress when a middle entry is dropped.
  • test_lenient_bindings_top_level_field_typo_dropped — locks in that deny_unknown_fields catches operator typos at the binding top level (e.g. agnt = ...), not just inside match_rule.

Files touched

  • crates/openfang-kernel/src/config.rs (only)

Note for reviewers

This is intentionally narrow: it changes the failure mode of binding parsing, nothing else. The lenient pass is gated to the [[bindings]] array; every other config section continues through the existing try_into::<KernelConfig>() path unchanged. A TODO is left on the remaining warn!-on-load fallback for the non-binding silent-default path, as a follow-up.

Pairs naturally with #1144 (which adds deny_unknown_fields to the AgentBinding wrapper) — together they make binding misconfig caught early (#1144) and survivable (this issue) instead of silent-and-catastrophic.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions