Skip to content

Editor: Preserve empty object attributes through block parse/serialize#1

Draft
margolisj wants to merge 4 commits into
trunkfrom
preserve-empty-object-attributes
Draft

Editor: Preserve empty object attributes through block parse/serialize#1
margolisj wants to merge 4 commits into
trunkfrom
preserve-empty-object-attributes

Conversation

@margolisj

@margolisj margolisj commented May 30, 2026

Copy link
Copy Markdown
Owner

Summary

Block attribute JSON objects ({}) and arrays ([]) both decode to PHP arrays via json_decode( ..., true ), so an empty object is indistinguishable from an empty array after parsing and serializes back out as []. On round-trip paths that parse and then re-serialize existing post content — Block Hooks (apply_block_hooks_to_content()) and KSES filtering (filter_block_content()) — this silently rewrites {} to [], corrupting block markup whose attributes rely on the object shape.

This is the well-known PHP limitation that an empty associative array and an empty list are the same type, combined with json_encode() always emitting [] for an empty array. The canonical behavior is defined by the JavaScript parser/serializer, which preserves {} vs [] natively; serialize_block_attributes() is documented to remain in sync with the JS serializeAttributes function, so PHP was the divergent side. This change brings PHP in line with that contract rather than introducing new behavior.

Approach

An opt-in preserve_empty_object_attributes parse option:

  • WP_Block_Parser::parse() accepts an $options array. When the option is set, attribute JSON is decoded with json_decode( ..., false ) and each object-originated value is converted to an array tagged with an internal marker (WP_Block_Parser::OBJECT_ATTRIBUTE_MARKER). The top-level attribute container is intentionally not tagged, preserving the existing behavior of dropping empty top-level attributes.
  • serialize_block_attributes() calls the new wp_restore_block_attribute_object_types(), which strips the marker and re-casts tagged arrays back to objects so wp_json_encode() emits {}/{…} for objects and []/[…] for arrays.
  • The two internal round-trip callers (Block Hooks and KSES) opt in. The default parse_blocks() path is unchanged, so render output is byte-for-byte identical.

Design decisions / trade-offs

The intent here is to be explicit about why it is shaped this way, since each choice trades one risk for another.

Opt-in option vs. changing the default. Chosen: opt-in.

  • Pro: the default parse_blocks() return shape is byte-for-byte unchanged, so the entire ecosystem that relies on the documented "all attributes are arrays" contract (since 4.0.0) is unaffected. Only the two internal callers that actually round-trip stored markup opt in.
  • Con: there are now two behaviors to reason about, and the fix is scoped to the round-trip paths rather than being universally correct everywhere block attributes are parsed.
  • Rejected alternative — making preservation the default: would broadly change return semantics and risk breaking code that treats an empty object attribute as [].

In-band marker vs. returning stdClass vs. out-of-band tracking. Chosen: in-band sentinel key.

  • Returning real stdClass for JSON objects was rejected: it breaks the 4.0.0 "returns arrays not objects" contract; essentially every $block['attrs'][...] access in core and plugins assumes arrays, and would need ->prop access instead.
  • An out-of-band type map (a parallel structure describing which nodes were objects) was rejected as disproportionately complex to thread through parse → traverse → serialize for a flat nested-array return type.
  • The sentinel key keeps the all-arrays shape and recovers object-ness only at the serialize boundary. The cost is a leaky abstraction: while the option is in effect, the parsed structure is only "clean" after restoration, so code that inspects attributes mid-round-trip (e.g. a hooked_block callback) can observe the marker on nested object attributes. This is mitigated by the obscure key name, by tagging only nested values (never the attribute root), by stripping automatically inside serialize_block_attributes(), and by limiting opt-in to two internal callers.

Where the marker is stripped. Chosen: in serialize_block_attributes(), the single chokepoint all block-markup serialization flows through.

  • Pro: guarantees the marker can never reach persisted markup or output, and makes the round-trip idempotent — markers never reach markup, so re-parsing already-clean markup simply re-derives them.
  • Con: serialize_block_attributes() now performs an unconditional recursive walk of the attribute array on every call, including the default (no-marker) path. This is bounded to serialization contexts (save, REST, and the round-trip paths) and does not touch the front-end render_block() path, but it is added work — most visible for blocks with large, deeply nested attribute payloads.

Top-level empty objects still collapse. A block written as <!-- wp:x {} /--> still serializes to <!-- wp:x /-->, because the attribute root is deliberately not tagged. This preserves the historical "drop empty attributes" behavior and avoids churn; the trade-off is that the fix targets nested empty objects (the reported case) rather than a top-level empty-object attribute set, which is semantically equivalent to having no attributes anyway.

Marker key collision. wp_restore_block_attribute_object_types() runs on every serialize_block_attributes() call, not only on object-preserving parses, so the collision surface is the global serialize path. To keep that safe, the marker is only honored when it carries the exact boolean the parser sets (true === $value[ OBJECT_ATTRIBUTE_MARKER ]); a genuine attribute that merely shares the reserved key name with any other value is left intact. The only residual collision is an attribute literally named __wpBlockAttributeIsObject whose value is boolean true, which is treated as effectively reserved. This was preferred over escaping every key, which would add cost and complexity to a hot path.

Compatibility / ramifications

  • Consumers of the serialized markup get the correct {}. This is strictly more correct: it reduces editor block-validation invalidations (the editor's JS parser already expects the object) and fixes type: object attributes that previously arrived as [] and failed schema validation in render callbacks. The only code that could be negatively surprised is code that had adapted to the [] corruption — which was already inconsistent with the JS side.
  • Consumers of the parsed array within the round-trip window (third-party hooked_block / hooked_block_types / block-visitor callbacks, and KSES filtering) may observe the marker on nested object attributes. The contract is documented on parse_blocks() and wp_restore_block_attribute_object_types(): attributes parsed with the option must be passed through restoration before being read, iterated, compared, or re-encoded. It has been verified that core's own callbacks use targeted key access and are unaffected, and that the marker never leaks into _wp_ignored_hooked_blocks post meta or the REST response (those encode lists of block-type names, not attribute objects).
  • Custom block parsers. A plugin that replaces the parser via the block_parser_class filter and whose parse() accepts only $document will not error (PHP ignores the extra positional argument), but it silently ignores preserve_empty_object_attributes, so the Block Hooks / KSES paths fall back to the historical collapsing behavior under that parser. This is accepted graceful degradation.

Testing

  • serialize.php
    • test_serialize_identity_with_preserved_object_types: round-trip identity for empty objects, empty arrays, nested mixes, the reported attribute shape, arrays of empty objects, and numeric-string-keyed objects.
    • test_serialize_collapses_empty_object_without_option: back-compat lock — the default parse path still collapses {} to [].
    • test_restore_object_types_is_noop_without_marker / test_restore_object_types_recasts_tagged_arrays: unit coverage of wp_restore_block_attribute_object_types().
    • test_restore_object_types_ignores_marker_key_with_non_true_value: a genuine attribute sharing the marker key name (non-true value) survives the default serialize path.
    • test_serialize_drops_top_level_empty_object_even_with_option: locks in the intentional scoping — a top-level empty-object attribute set still serializes away even with the option enabled.
    • test_filter_block_content_preserves_empty_object_attributes: the KSES path preserves {} and never leaks the marker.
  • wpBlockParser.php
    • parser-level marker placement (nested objects tagged; arrays and the attribute root untagged), the default path adding no marker, and null attributes on invalid JSON.
    • test_parse_tolerates_non_array_options: a non-array $options is normalized to no options rather than erroring.
  • applyBlockHooksToContent.php
    • test_apply_block_hooks_to_content_preserves_empty_object_attributes: the Block Hooks path preserves {} while a hooked-block callback runs over the marked attributes, and never leaks the marker.

Full --group blocks suite: 1054 tests / 3180 assertions pass (no regressions). PHPCS clean (WordPress standard) on all changed files.

Trac ticket: https://core.trac.wordpress.org/ticket/63325

margolisj added 4 commits May 30, 2026 17:01
Block attribute JSON objects (`{}`) and arrays (`[]`) both decode to PHP
arrays via `json_decode( ..., true )`, so an empty object would serialize
back out as `[]`, corrupting markup on round-trip paths such as Block Hooks
and KSES filtering.

Add an opt-in `preserve_empty_object_attributes` parse option that tags
object-originated arrays with a hidden marker during parsing, then restore
them to objects in `serialize_block_attributes()` via the new
`wp_restore_block_attribute_object_types()`. The default parse path is
unchanged, so render output is byte-for-byte identical.

See #63325.
Document the contract of the `preserve_empty_object_attributes` parse option:
attributes tagged with WP_Block_Parser::OBJECT_ATTRIBUTE_MARKER are not plain
data and must be passed through serialize_block_attributes() or
wp_restore_block_attribute_object_types() before being read or re-encoded.
Tighten the DocBlocks on parse_blocks(), WP_Block_Parser::parse(),
serialize_block_attributes(), and wp_restore_block_attribute_object_types().

Add coverage for:
- the default parse path still collapsing `{}` to `[]` (back-compat lock);
- wp_restore_block_attribute_object_types() no-op vs. re-cast behavior;
- the marker not leaking through the KSES (filter_block_content) and Block
  Hooks (apply_block_hooks_to_content) round-trip paths;
- parser-level marker placement (nested objects tagged, arrays and the
  attribute root untagged) and null attributes on invalid JSON.

See #63325.
Correct the @SInCE tags from 6.10.0 to 7.1.0 to match the current release
cycle (trunk is 7.1-alpha).

Guard wp_restore_block_attribute_object_types() so it only treats the marker
key as its own when it carries the exact boolean the parser sets. Because the
restore runs on every serialize_block_attributes() call (not just on
object-preserving parses), this prevents a genuine attribute that merely shares
the reserved key name from being silently dropped on the default serialize
path. Add a test locking in that behavior.

See #63325.
Normalize a non-array $options argument to WP_Block_Parser::parse() to an
empty array, so a misuse falls back to default behavior instead of erroring
on offset access at a public entry point.

Add tests locking in two documented behaviors: a top-level empty object
attribute set still serializes away (the attribute root is never tagged) even
with object preservation enabled, and a non-array $options is tolerated.

See #63325.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant