Background
Many tool response models carry a warnings: list[str] field where some entries are tagged with an internal BLOCK: sentinel prefix (BLOCK_WARNING_PREFIX). The string-level sentinel is double-purpose:
- Wire signal — tells programmatic consumers "this is a blocking failure, not advisory."
- UI signal — Prefab UI builders substring-match the prefix to gate the Confirm button.
- Apply-path guard — same substring-match used as a second defense so direct callers (skipping the UI) get the same refusal.
Now that #567 dropped the markdown rendering layer, JSON content exposes the raw prefix:
Surfaced in PR #719 (Copilot review comment).
Beyond the leak, the sentinel-string protocol has bigger problems that have been latent:
- Consumers wanting to programmatically react to specific failure modes ("retry with serial_numbers added") must regex the message text. Brittle.
- No structured way to tell the UI which input field caused the block, so form-level error highlighting isn't possible.
- Binary block/no-block doesn't distinguish
advisory (proceed-with-warning) from info (purely contextual) — every non-block warning gets the same treatment.
Proposal
Replace warnings: list[str] with warnings: list[Warning] across every response model that carries warnings (~12 models).
Structured shape
```python
from typing import Any, Literal
from pydantic import BaseModel, Field
class Warning(BaseModel):
"""Structured warning on a preview/apply response.
Replaces the legacy ``list[str]`` shape where the ``BLOCK:`` prefix
double-served as wire signal and UI cue. Three things are now
first-class: severity (how the consumer should respond), message
(human text), field_path (which input caused it, when applicable),
and code (stable programmatic identifier).
\"\"\"
severity: Literal[\"info\", \"advisory\", \"blocking\"] = Field(
description=(
\"How a consumer should respond:\\n\"
\"- 'blocking': apply path will refuse; consumer must fix \"
\"input or state before retrying.\\n\"
\"- 'advisory': operation will proceed but the user should know \"
\"(unusual state, soft warning).\\n\"
\"- 'info': pure context, no action implied.\"
),
)
message: str = Field(
description=(
\"Human-readable message. Must be self-contained — readable \"
\"without consulting other fields. Inline values directly \"
\"(e.g. 'expected 3, got 2.5'); don't reference other Warning \"
\"fields by name.\"
),
)
field_path: str | None = Field(
default=None,
description=(
\"Dotted-path to the offending input field, e.g. \"
\"'rows[0].sales_order_row_id' or 'actual_quantity'. Set when \"
\"the warning is caused by a specific input value (so the UI \"
\"can highlight the field and the LLM knows what to correct). \"
\"None for state-driven warnings (e.g. 'order is already \"
\"DONE') where no single input is at fault.\"
),
)
code: str | None = Field(
default=None,
description=(
\"Optional stable identifier for this warning kind \"
\"(e.g. 'serial_count_mismatch', 'unknown_row_id'). Consumers \"
\"match on this when they want to react programmatically to a \"
\"specific failure mode without parsing the message string. \"
\"Stable across releases; collected in \"
\"``katana_mcp/tools/warning_codes.py`` to prevent drift.\"
),
)
```
Severity decision table
| Case |
Severity |
| Katana will 4xx if we apply |
blocking |
| Operation is a no-op (already DONE, source=destination, zero rows) |
blocking |
| Input is internally inconsistent given current state (unknown row, duplicate keys) |
blocking |
| Input refers to a soft-state row (archived, deleted) and we'll still apply |
advisory |
| State is unusual but legal (MO is BLOCKED status, PO partially received) |
advisory |
| Default was auto-selected (warehouse, currency, etc.) and the user might want to override |
info |
| Operating-mode context (dry-run, sandbox) |
info |
Rule of thumb: set severity=\"blocking\" only when apply will fail or do nothing useful. If a sensible user might rationally proceed after reading the message, it's advisory or info.
Existing call sites → new shape
Every current BLOCK_WARNING_PREFIX and plain-string warning has clean material for all four fields:
| Existing string (paraphrased) |
severity |
field_path |
code |
BLOCK: MO {n} is already completed. No further action will mark it DONE again. |
blocking |
None (state) |
already_completed |
MO {n} is blocked - review before completing |
advisory |
None |
blocked_status |
BLOCK: MO {n} ({sku}) is serial-tracked but actual_quantity ({qty}) is not a whole number |
blocking |
actual_quantity |
serial_qty_not_integer |
BLOCK: MO {n} ({sku}) is serial-tracked. Pass serial_numbers ... |
blocking |
serial_numbers |
serial_numbers_required |
BLOCK: MO {n} ({sku}): serial_numbers count ({a}) must equal actual_quantity ({b}). |
blocking |
serial_numbers |
serial_count_mismatch |
BLOCK: Row override references unknown sales_order_row_id={x} ... |
blocking |
rows[i].sales_order_row_id |
unknown_row_id |
BLOCK: Multiple overrides for the same sales_order_row_id ({dups}); ... |
blocking |
rows |
duplicate_row_overrides |
BLOCK: Source and destination are the same location ... |
blocking |
destination_location_id |
same_source_destination |
BLOCK: Sales order {n} has no rows to fulfill. |
blocking |
None (state) |
empty_order |
Each row becomes a single Pydantic construction at the call site. No string concatenation, no sentinel prefix, no substring matching.
Design decisions worth pinning
Three severity levels, not binary. A boolean (blocked: bool) is one bit and we'd regret it. Three buckets cover real distinctions: refuse / let-them-decide / FYI. The info bucket has no callers today but it's free to reserve — and the first time someone wants a soft "we picked the default warehouse for you" surface, they won't have to misuse advisory.
field_path as a string, not a tuple. Pydantic's native error format is loc: tuple (e.g. ('rows', 0, 'sales_order_row_id')). We could mirror that. The rendered string form (rows[0].sales_order_row_id) wins because:
- Round-trips through JSON cleaner.
- LLM consumers parse it more reliably than a heterogeneous array.
- UIs splitting on
. / [ to highlight a form field have an easy job.
- A Pydantic validator can accept tuple input from internal callers and render to string on serialization.
code for programmatic matching. A consumer that wants to retry with corrected input shouldn't have to regex the message — they check code == \"serial_numbers_required\" and know to add the field. Messages can evolve for humans; codes stay stable for machines. Acceptance bullet below requires every existing BLOCK call site to pick a code and record it in a shared constants module.
No free-form details: dict[str, Any]. Considered and deliberately left out for v1:
- Consumers can't depend on bag keys without a per-warning schema, so it's back to free-text parsing.
- The values that matter for remediation are already in
message (we write "sales order has rows [101, 102, 103]" inline today).
- If a real pattern emerges (e.g. "every blocking warning should surface valid alternatives"), add a typed field then. Cheap to extend; expensive to remove.
Scope (~12 response models)
CreateStockTransferResponse
ModifyStockTransferResponse
CreatePurchaseOrderResponse / ModifyPurchaseOrderResponse
CreateSalesOrderResponse / ModifySalesOrderResponse
CreateManufacturingOrderResponse / ModifyManufacturingOrderResponse
CreateStockAdjustmentResponse / UpdateStockAdjustmentResponse / DeleteStockAdjustmentResponse
ReceivePurchaseOrderResponse
CorrectOrderResponse (correct_sales_order / correct_purchase_order / correct_manufacturing_order)
FulfillOrderResponse
Plus every Prefab UI builder in prefab_ui.py that substring-matches BLOCK: to gate the Confirm button — they switch to reading w.severity == \"blocking\" instead.
Plus every apply-path safety guard in orders.py (and friends) doing the same substring-match — same swap.
Migration
warnings: list[str] → warnings: list[Warning] is breaking on the wire. Three options considered:
- Hard swap. Just change the type.
- Parallel field for a release. Add
warnings_v2: list[Warning]; deprecate warnings; remove next major.
- Computed-field bridge. Keep
warnings: list[str] as a @computed_field that strips BLOCK: from new structured _warnings: list[Warning].
Going with (1). No external programmatic consumers of these responses live outside the MCP server itself, and the migration cost for any that exist is the same as today's: parse a different shape. PR will land as a feat!: with explicit notes.
Acceptance
Out of scope
- The Prefab UI's substring-match in
_modification.py's rendering helpers — separate from response models, will follow the same swap pattern but in a follow-up if not already covered.
- Cross-cutting refactor of
next_actions / other list[str] fields that don't carry sentinel prefixes — those are pure advisory text, no parallel signal.
- Localization / i18n of warning messages —
code is intentionally locale-free so a future i18n layer can map codes to localized strings without changing the wire format.
Discovered in
PR #719 review (2026-05-13). Original comment.
Design discussion
Full rationale (including alternatives considered for severity buckets, field_path format, and the case against details: dict) walked through in PR #719 review thread on 2026-05-13. Pasted into this issue body as the canonical reference; reviewers don't need to chase the thread.
Background
Many tool response models carry a
warnings: list[str]field where some entries are tagged with an internalBLOCK:sentinel prefix (BLOCK_WARNING_PREFIX). The string-level sentinel is double-purpose:Now that #567 dropped the markdown rendering layer, JSON
contentexposes the raw prefix:format=\"json\"was passed. Pre-existing condition; no regression from design(mcp): drop hand-written markdown formatters; content becomes structured_content JSON (umbrella) #567.create_stock_transfer— markdown was the only content path before design(mcp): drop hand-written markdown formatters; content becomes structured_content JSON (umbrella) #567 (the markdown formatter stripped the prefix). Post-design(mcp): drop hand-written markdown formatters; content becomes structured_content JSON (umbrella) #567,make_json_resultexposes the prefix for the first time. Net new exposure.Surfaced in PR #719 (Copilot review comment).
Beyond the leak, the sentinel-string protocol has bigger problems that have been latent:
advisory(proceed-with-warning) frominfo(purely contextual) — every non-block warning gets the same treatment.Proposal
Replace
warnings: list[str]withwarnings: list[Warning]across every response model that carries warnings (~12 models).Structured shape
```python
from typing import Any, Literal
from pydantic import BaseModel, Field
class Warning(BaseModel):
"""Structured warning on a preview/apply response.
```
Severity decision table
blockingblockingblockingadvisoryadvisoryinfoinfoRule of thumb: set
severity=\"blocking\"only when apply will fail or do nothing useful. If a sensible user might rationally proceed after reading the message, it'sadvisoryorinfo.Existing call sites → new shape
Every current
BLOCK_WARNING_PREFIXand plain-string warning has clean material for all four fields:severityfield_pathcodeBLOCK: MO {n} is already completed. No further action will mark it DONE again.blockingNone(state)already_completedMO {n} is blocked - review before completingadvisoryNoneblocked_statusBLOCK: MO {n} ({sku}) is serial-tracked but actual_quantity ({qty}) is not a whole numberblockingactual_quantityserial_qty_not_integerBLOCK: MO {n} ({sku}) is serial-tracked. Pass serial_numbers ...blockingserial_numbersserial_numbers_requiredBLOCK: MO {n} ({sku}): serial_numbers count ({a}) must equal actual_quantity ({b}).blockingserial_numbersserial_count_mismatchBLOCK: Row override references unknown sales_order_row_id={x} ...blockingrows[i].sales_order_row_idunknown_row_idBLOCK: Multiple overrides for the same sales_order_row_id ({dups}); ...blockingrowsduplicate_row_overridesBLOCK: Source and destination are the same location ...blockingdestination_location_idsame_source_destinationBLOCK: Sales order {n} has no rows to fulfill.blockingNone(state)empty_orderEach row becomes a single Pydantic construction at the call site. No string concatenation, no sentinel prefix, no substring matching.
Design decisions worth pinning
Three severity levels, not binary. A boolean (
blocked: bool) is one bit and we'd regret it. Three buckets cover real distinctions: refuse / let-them-decide / FYI. Theinfobucket has no callers today but it's free to reserve — and the first time someone wants a soft "we picked the default warehouse for you" surface, they won't have to misuseadvisory.field_pathas a string, not a tuple. Pydantic's native error format isloc: tuple(e.g.('rows', 0, 'sales_order_row_id')). We could mirror that. The rendered string form (rows[0].sales_order_row_id) wins because:./[to highlight a form field have an easy job.codefor programmatic matching. A consumer that wants to retry with corrected input shouldn't have to regex the message — they checkcode == \"serial_numbers_required\"and know to add the field. Messages can evolve for humans; codes stay stable for machines. Acceptance bullet below requires every existing BLOCK call site to pick a code and record it in a shared constants module.No free-form
details: dict[str, Any]. Considered and deliberately left out for v1:message(we write "sales order has rows [101, 102, 103]" inline today).Scope (~12 response models)
CreateStockTransferResponseModifyStockTransferResponseCreatePurchaseOrderResponse/ModifyPurchaseOrderResponseCreateSalesOrderResponse/ModifySalesOrderResponseCreateManufacturingOrderResponse/ModifyManufacturingOrderResponseCreateStockAdjustmentResponse/UpdateStockAdjustmentResponse/DeleteStockAdjustmentResponseReceivePurchaseOrderResponseCorrectOrderResponse(correct_sales_order / correct_purchase_order / correct_manufacturing_order)FulfillOrderResponsePlus every Prefab UI builder in
prefab_ui.pythat substring-matchesBLOCK:to gate the Confirm button — they switch to readingw.severity == \"blocking\"instead.Plus every apply-path safety guard in
orders.py(and friends) doing the same substring-match — same swap.Migration
warnings: list[str]→warnings: list[Warning]is breaking on the wire. Three options considered:warnings_v2: list[Warning]; deprecatewarnings; remove next major.warnings: list[str]as a@computed_fieldthat stripsBLOCK:from new structured_warnings: list[Warning].Going with (1). No external programmatic consumers of these responses live outside the MCP server itself, and the migration cost for any that exist is the same as today's: parse a different shape. PR will land as a
feat!:with explicit notes.Acceptance
Warningmodel defined inkatana_mcp.tools.tool_result_utils(or newwarnings.py) with the four fields above. Field docstrings match the proposal verbatim.katana_mcp.tools.warning_codesconstants module holds the canonicalcodevalues as string constants (e.g.ALREADY_COMPLETED = \"already_completed\"). Every existing BLOCK call site picks a code from this module.warnings: list[str]→warnings: list[Warning].w.startswith(BLOCK_WARNING_PREFIX)switches tow.severity == \"blocking\".BLOCK_WARNING_PREFIXconstant deleted fromtool_result_utils.py.json.loads(content)[\"warnings\"][i]returns a dict with{\"severity\", \"message\", \"field_path\", \"code\"}keys; no string content begins with\"BLOCK:\".Warningwithseverity=\"blocking\"and a non-emptycode. Pinned per-tool so a future contributor adding a BLOCK can't skip the code.Warningclass docstring or a sibling ADR so contributors aren't guessing.Out of scope
_modification.py's rendering helpers — separate from response models, will follow the same swap pattern but in a follow-up if not already covered.next_actions/ otherlist[str]fields that don't carry sentinel prefixes — those are pure advisory text, no parallel signal.codeis intentionally locale-free so a future i18n layer can map codes to localized strings without changing the wire format.Discovered in
PR #719 review (2026-05-13). Original comment.
Design discussion
Full rationale (including alternatives considered for severity buckets, field_path format, and the case against
details: dict) walked through in PR #719 review thread on 2026-05-13. Pasted into this issue body as the canonical reference; reviewers don't need to chase the thread.