Skip to content

fix: incorrect mask with additionalProperties:true and non-required properties#595

Open
ianliuy wants to merge 2 commits into
mlc-ai:mainfrom
ianliuy:fix/issue-208
Open

fix: incorrect mask with additionalProperties:true and non-required properties#595
ianliuy wants to merge 2 commits into
mlc-ai:mainfrom
ianliuy:fix/issue-208

Conversation

@ianliuy
Copy link
Copy Markdown
Contributor

@ianliuy ianliuy commented Apr 26, 2026

What's broken?

When a JSON schema has additionalProperties: true with properties that are not in the required array, xgrammar accepts values with the wrong type for those defined properties. For example, {"a": "wrong"} is accepted even though "a" is defined as integer.

Who is affected?

Anyone using additionalProperties: true (or unevaluatedProperties: true) with non-required typed properties. This includes the common pattern of optional typed fields with free-form extra data. Schemas with all properties required or additionalProperties: false are NOT affected.

When does it trigger?

Reproducible 100% of the time with:

{
    "type": "object",
    "properties": { "a": {"type": "integer"} },
    "additionalProperties": true,
    "required": []
}

Input {"a": "wrong"} is incorrectly accepted.

Root Cause

GetKeyPattern() returns basic_string, which matches any JSON string key -- including defined property names like "a". When the grammar adds additional_prop_pattern as an alternative, the parser tracks both paths simultaneously:

  1. Typed path: "a" : integer_rule -- correctly constrains value type
  2. Catch-all path: basic_string : any_rule -- accepts any value type

The token bitmask is the union of valid tokens across all active paths, so the catch-all's any_rule effectively overrides the typed constraint.

Fix

Replace GetKeyPattern() with GetKeyPatternExcluding(properties, rule_name) at 3 call sites in GetPartialRuleForProperties(). The new method generates a trie-based EBNF key pattern that matches any JSON string key except defined property names.

For excluded key "a", the generated pattern:

  • Accepts "", "b", "ab" (not the excluded key)
  • Rejects "a" (must use typed rule instead)

Testing

Added test_additional_properties_type_enforcement with 7 test cases:

  1. Wrong type + empty required -> rejected (core bug fix)
  2. Correct type -> accepted
  3. Unknown key (truly additional) -> accepted
  4. Defined + additional together -> accepted
  5. Partial required, wrong type on non-required -> rejected
  6. All correct types -> accepted
  7. Empty object -> accepted

Fixes #208

cc @Ubospica @Seven-Streams

Generate a trie-based EBNF key pattern that excludes defined property
names from the additional/unevaluated property matching. This prevents
additional properties from matching defined property names, which could
bypass type constraints.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yiyang Liu <yiyangliu+microsoft@microsoft.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a trie-based mechanism to exclude defined property names from the 'additional properties' pattern, ensuring that defined properties adhere to their type constraints even when additional properties are allowed. Review feedback identified critical compilation errors due to illegal access of protected class members from an anonymous namespace and suggested escaping non-printable characters in the generated EBNF. Furthermore, a significant logic issue was raised regarding the potential bypass of property exclusion through JSON escape sequences in keys, which the current implementation does not handle.

Comment on lines +1257 to +1302
std::string BuildTrieBody(const TrieNode& node) {
std::string result;
bool first = true;
auto add = [&](const std::string& s) {
if (!first) result += " | ";
first = false;
result += s;
};

// 1. Close quote - only if no excluded key ends here
if (!node.is_terminal) {
add("\"\\\"\"");
}

// 2. Negated char class - excludes edge chars + JSON specials
std::string neg = "[^";
for (const auto& [c, _] : node.children) {
if (c == ']' || c == '\\' || c == '^' || c == '-') {
neg += "\\";
}
neg += c;
}
neg += "\\0-\\x1f\\\"\\\\\\r\\n]";
add(neg + " " + JSONSchemaConverter::kBasicStringSub);

// 3. Escape sequence
add("\"\\\\\" " + JSONSchemaConverter::kBasicEscape + " " +
JSONSchemaConverter::kBasicStringSub);

// 4. Trie edges - recurse
for (const auto& [c, child] : node.children) {
std::string child_body = BuildTrieBody(child);
std::string char_lit = "\"";
if (c == '"') {
char_lit += "\\\"";
} else if (c == '\\') {
char_lit += "\\\\";
} else {
char_lit += c;
}
char_lit += "\"";
add(char_lit + " " + child_body);
}

return "(" + result + ")";
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The BuildTrieBody function in the anonymous namespace cannot access the protected static members kBasicStringSub and kBasicEscape of JSONSchemaConverter. This will cause a compilation error. Additionally, non-printable characters in property names (like \n, \r, \t) should be escaped when building EBNF literals to avoid breaking the generated grammar script.

std::string BuildTrieBody(
    const TrieNode& node,
    const std::string& basic_string_sub,
    const std::string& basic_escape
) {
  std::string result;
  bool first = true;
  auto add = [&](const std::string& s) {
    if (!first) result += " | ";
    first = false;
    result += s;
  };

  // 1. Close quote - only if no excluded key ends here
  if (!node.is_terminal) {
    add("\"\\\"\"");
  }

  // 2. Negated char class - excludes edge chars + JSON specials
  std::string neg = "[^";
  for (const auto& [c, _] : node.children) {
    if (c == ']' || c == '\\' || c == '^' || c == '-') {
      neg += "\\";
    }
    neg += c;
  }
  neg += "\\0-\\x1f\\\"\\\\\\r\\n]";
  add(neg + " " + basic_string_sub);

  // 3. Escape sequence
  add("\"\\\\\" " + basic_escape + " " + basic_string_sub);

  // 4. Trie edges - recurse
  for (const auto& [c, child] : node.children) {
    std::string child_body = BuildTrieBody(child, basic_string_sub, basic_escape);
    std::string char_lit = "\"";
    if (c == '"') {
      char_lit += "\\\"";
    } else if (c == '\\') {
      char_lit += "\\\\";
    } else if (c == '\n') {
      char_lit += "\\n";
    } else if (c == '\r') {
      char_lit += "\\r";
    } else if (c == '\t') {
      char_lit += "\\t";
    } else {
      char_lit += c;
    }
    char_lit += "\"";
    add(char_lit + " " + child_body);
  }

  return "(" + result + ")";
}

}

// Generate EBNF body
std::string inner = BuildTrieBody(root);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Update the call to BuildTrieBody to pass the required constants.

Suggested change
std::string inner = BuildTrieBody(root);
std::string inner = BuildTrieBody(root, kBasicStringSub, kBasicEscape);

Comment on lines +1306 to +1330
std::string JSONSchemaConverter::GetKeyPatternExcluding(
const std::vector<ObjectSpec::Property>& properties,
const std::string& rule_name
) {
if (properties.empty()) {
return GetKeyPattern();
}

// Build trie from property names
TrieNode root;
for (const auto& prop : properties) {
TrieNode* cur = &root;
for (char c : prop.name) {
cur = &cur->children[c];
}
cur->is_terminal = true;
}

// Generate EBNF body
std::string inner = BuildTrieBody(root);
std::string ws = GetWhitespacePattern();
std::string body = "[\"] (" + inner + ") (= " + ws + " [,}\\]:])";

return ebnf_script_creator_.AddRule(rule_name + "_addl_key", body);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The trie-based exclusion logic matches literal characters from the property names. However, JSON allows keys to be represented using escape sequences (e.g., "\u0061" instead of "a"). The current implementation transitions to the catch-all kBasicStringSub rule as soon as a backslash is encountered (Path 3 in BuildTrieBody), which means that any excluded property name can be bypassed by using escape sequences in the JSON input. This would re-introduce the original issue where the wrong type is accepted for a defined property.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an xgrammar JSON-schema conversion bug where additionalProperties: true (and unevaluatedProperties: true) could bypass type enforcement for defined but non-required properties by introducing an “additional key” pattern that excludes known property names.

Changes:

  • Add GetKeyPatternExcluding() to generate a trie-based key pattern that matches any JSON string key except the schema’s declared property names.
  • Use the excluding key pattern for additional/unevaluated property alternatives inside GetPartialRuleForProperties().
  • Add a Python regression test covering type enforcement with additionalProperties: true and optional typed properties.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tests/python/test_json_schema_converter.py Adds regression coverage for #208 around optional typed properties with additionalProperties: true.
cpp/json_schema_converter.h Declares a new helper for generating an excluding key pattern.
cpp/json_schema_converter.cc Implements trie-based excluding key rule generation and swaps it into additional-property handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +321 to +322
/*! \brief Get a key pattern that excludes specific property names. */
std::string GetKeyPatternExcluding(
Comment on lines 1492 to 1556
continue
if num_properties < lower_bound or num_properties > upper_bound:
is_accepted = False
else:
is_accepted = (
all(prop in instance for prop in required_properties)
and not have_additional
)
check_schema_with_instance(
schema, instance, is_accepted=is_accepted, any_whitespace=False
)

# Case 2.2. additional properties allowed
for required_properties in required_properties_instance:
schema = {
**base_schema,
"required": required_properties,
"additionalProperties": {"type": "string"},
}
for instance, num_properties, have_additional in instances:
is_accepted = all(prop in instance for prop in required_properties)
check_schema_with_instance(
schema, instance, is_accepted=is_accepted, any_whitespace=False
)

for lower_bound in range(8):
schema = {
**base_schema,
"minProperties": lower_bound,
"required": required_properties,
"additionalProperties": {"type": "string"},
}
for instance, num_properties, have_additional in instances:
if lower_bound > len(base_schema["properties"]):
continue
if num_properties < lower_bound:
is_accepted = False
else:
is_accepted = all(prop in instance for prop in required_properties)
check_schema_with_instance(
schema, instance, is_accepted=is_accepted, any_whitespace=False
)

for upper_bound in range(lower_bound, 8):
schema = {
**base_schema,
"minProperties": lower_bound,
"maxProperties": upper_bound,
"required": required_properties,
"additionalProperties": {"type": "string"},
}
for instance, num_properties, have_additional in instances:
if lower_bound > len(base_schema["properties"]) or upper_bound < len(
required_properties
):
if lower_bound > len(
base_schema["properties"]
) or upper_bound < len(required_properties):
continue
if num_properties < lower_bound or num_properties > upper_bound:
is_accepted = False
else:
is_accepted = all(prop in instance for prop in required_properties)
is_accepted = all(
prop in instance for prop in required_properties
)
check_schema_with_instance(
schema, instance, is_accepted=is_accepted, any_whitespace=False
)
Signed-off-by: Linzhang Li <yuchuan.7streams@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

incorrect mask with additionalProperties:true and non-required properties

3 participants