fix(json-schema): fix bugs, add missing conversions, and improve parity #3109

Open
AbdelrahmanHafez wants to merge 39 commits into hapijs:master from AbdelrahmanHafez:fix/json-schema-bugs-and-missing-conversions

@AbdelrahmanHafez AbdelrahmanHafez commented Apr 13, 2026

re #3108

This PR addresses the bugs and missing conversions reported in #3108, plus several additional gaps I found while working through the codebase.

  • alternatives.match('all') was mapped to anyOf instead of allOf
  • Rule-level jsonSchema handlers were called even when rule args contained refs, which could produce invalid schemas (e.g. string.min(Joi.ref('x')) throwing instead of being gracefully skipped)

Missing conversions added:

  • number.precision() -> multipleOf
  • string.alphanum() -> pattern: '^[a-zA-Z0-9]+$'
  • string.token(), string.hex(), string.base64(), and string.dataUri() -> validating pattern output instead of non-standard format values, with option-aware parity for hex() / base64() / dataUri()
  • string.ip() -> full regex pattern matching @hapi/address behavior, including CIDR/version options and IPvFuture literals
  • string.hostname() -> hostname-or-IP pattern matching Joi runtime instead of bare format: 'hostname'
  • string.domain() -> full regex pattern matching @hapi/address behavior, with support for all domain options (minDomainSegments, maxDomainSegments, allowUnderscore, allowFullyQualified, allowUnicode, tlds allow/deny). Uses pattern instead of format: 'hostname' since hostname accepts single-label names while Joi's domain() requires at least 2 segments. This now also covers astral Unicode labels and punycode-backed Unicode TLD display variants that round-trip to the same canonical ASCII TLD.
  • date.timestamp('javascript') and date.timestamp('unix') -> type: 'number' with ECMA-262 §21.4.1.1 range bounds (±100 million days)
  • binary encoding flag -> contentEncoding, with standard transfer-encoding mapping (hex -> base16, base64 / base64url preserved, charset-style Node encodings omitted)
  • invalid() values -> not: { enum: [...] }, composable via allOf when combined with other not-based constraints, including invalid(null) when the schema could otherwise accept null
  • .example() values -> examples array
  • .meta() -> supported JSON Schema annotation keywords (title, format, readOnly, writeOnly, deprecated, examples, $comment, contentEncoding, contentMediaType, contentSchema), with deduplication when meta examples overlap with .example() values
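
To make the numeric conversions above concrete, here is a minimal sketch of the timestamp bounds and the precision mapping. The helper names `timestampSchema` and `precisionSchema` are hypothetical, and the use of `minimum` / `maximum` to carry the range is an assumption; the PR only states that the bounds follow the ECMA-262 time range of 100 million days around the epoch.

```javascript
// ECMA-262 time range: 100,000,000 days on either side of the epoch.
const MS_PER_DAY = 24 * 60 * 60 * 1000;        // 86,400,000 ms
const MAX_TIME_MS = 100_000_000 * MS_PER_DAY;  // 8.64e15 ms

// Hypothetical helper: schema for date.timestamp('javascript' | 'unix').
// Unix timestamps are in seconds, so the bound is scaled down by 1000.
function timestampSchema(kind) {
    const max = kind === 'unix' ? MAX_TIME_MS / 1000 : MAX_TIME_MS;
    return { type: 'number', minimum: -max, maximum: max };
}

// Hypothetical helper: number.precision(n) expressed as multipleOf 10^-n.
function precisionSchema(digits) {
    return { type: 'number', multipleOf: Number(`1e-${digits}`) };
}

console.log(timestampSchema('javascript'));
console.log(precisionSchema(2));
```

The precision sketch only covers the schema shape; how a given validator handles floating-point `multipleOf` values is outside what it shows.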

Object dependencies (all 7 dependency types are now converted):

  • with() -> dependentRequired
  • without() -> dependentSchemas with properties: { peer: false }
  • and() -> bidirectional dependentRequired (each peer requires all others)
  • nand() -> not: { properties: { ...peers: true }, required: [...peers] }
  • or() -> anyOf with required per peer
  • xor() -> oneOf with required per peer
  • oxor() -> oneOf with a "none present" branch + one branch per peer

Multiple dependencies of the same type compose via allOf. All representations are AJV strict mode compatible (strictRequired, strictTypes).
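
The mappings above can be sketched as plain schema builders. This is an illustration of the shapes described in the list, not the code in this branch: the `dependencies` object and `requiredEach` are hypothetical names, and the "none present" branch for oxor() is assumed to be a `properties: { peer: false }` schema, since the description does not spell it out.

```javascript
// One { required: [peer] } branch per peer.
const requiredEach = (peers) => peers.map((p) => ({ required: [p] }));
// Map every peer to the same boolean subschema.
const eachPeer = (peers, value) => Object.fromEntries(peers.map((p) => [p, value]));

const dependencies = {
    // with(key, peers): if key is present, every peer must be present.
    with: (key, peers) => ({ dependentRequired: { [key]: peers } }),
    // without(key, peers): if key is present, every peer is forbidden.
    without: (key, peers) => ({
        dependentSchemas: { [key]: { properties: eachPeer(peers, false) } }
    }),
    // and(peers): each peer requires all the others.
    and: (peers) => ({
        dependentRequired: Object.fromEntries(
            peers.map((p) => [p, peers.filter((other) => other !== p)])
        )
    }),
    // nand(peers): not all peers may be present together.
    nand: (peers) => ({ not: { properties: eachPeer(peers, true), required: peers } }),
    // or(peers): at least one peer is present.
    or: (peers) => ({ anyOf: requiredEach(peers) }),
    // xor(peers): exactly one peer is present.
    xor: (peers) => ({ oneOf: requiredEach(peers) }),
    // oxor(peers): zero or one peer is present (none-present branch is assumed).
    oxor: (peers) => ({
        oneOf: [{ properties: eachPeer(peers, false) }, ...requiredEach(peers)]
    })
};

console.log(JSON.stringify(dependencies.and(['a', 'b', 'c']), null, 2));
```

With oxor(), an object containing both peers matches two `required` branches and therefore fails `oneOf`, while an empty object matches only the none-present branch.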

Preferences support (schema-level preferences set via .prefs() now propagate to JSON Schema output):

  • presence: 'required' -> marks all properties as required
  • presence: 'forbidden' -> root schema emits false; nested property presence uses the child schema's effective prefs so nested forbidden-presence is preserved correctly
  • allowUnknown: true -> omits additionalProperties: false
  • stripUnknown: true -> emits additionalProperties: false in output mode only (input mode accepts unknowns, output mode has them stripped). Correctly handles stripUnknown: { arrays: true } without affecting object properties.
  • noDefaults: true -> suppresses required marking for properties with defaults in output mode
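
As a rough sketch of the unknown-key rules above, the function below decides whether an object schema should carry `additionalProperties: false`. The name `closesObject` and the `mode` values `'input'` / `'output'` are hypothetical, and giving stripUnknown precedence over allowUnknown is an assumption rather than something stated in this PR.

```javascript
// Hypothetical helper: true when the object schema should emit
// additionalProperties: false for the given prefs and mode.
function closesObject(prefs, mode) {
    // stripUnknown: true, or { objects: true }, strips unknown object keys.
    // stripUnknown: { arrays: true } alone does not affect objects.
    const stripsObjects =
        prefs.stripUnknown === true ||
        Boolean(prefs.stripUnknown && prefs.stripUnknown.objects);

    if (stripsObjects) {
        // Input accepts unknown keys; output has had them removed.
        return mode === 'output';
    }

    // Otherwise unknown keys are rejected unless allowUnknown is set.
    return !prefs.allowUnknown;
}

console.log(closesObject({ stripUnknown: true }, 'input'));   // false
console.log(closesObject({ stripUnknown: true }, 'output'));  // true
```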

Other fixes:

  • Declared forbidden keys are represented as false property schemas, so they stay forbidden even when unknown keys are otherwise allowed
  • Non-exclusive allow() / valid() exceptions now emit enum branches whenever the base schema would otherwise reject an explicitly allowed value, including same-type conflicts like string().min(5).allow('abc') and object().min(1).allow({})
  • Pattern-emitting string rules now compose via allOf instead of overwriting each other, so combinations like .pattern(/foo/).hostname() preserve every active constraint
  • strip result flag -> output schema uses false property schemas for stripped declared keys, while still keeping $defs intact for linked child schemas
  • Joi.link('#id') without .shared() now correctly registers the linked schema in $defs (previously produced broken $ref pointing to nonexistent $defs entry)
  • Eligible child when() conditionals inside Joi.object({...}) now hoist to object-level if / then / else (or allOf for multiple conditionals). This preserves cross-field linkage for literal sibling refs, simple object paths like settings.mode, and literal switch branches on hoistable ref paths, instead of widening them to a child-local anyOf. More complex shapes (schema conditions, non-literal switch predicates, adjusted/mapped refs, and fixed array-index refs) still intentionally fall back to the lossy anyOf approximation for now.
  • Custom types created via Joi.extend() now inherit the base JSON Schema type, so renamed built-in types like base: Joi.string() / number() / array() / object() still emit the correct standard JSON Schema type, including when nested inside other schemas or combined with prefs
  • Exclusive valid() / only() output now preserves Joi semantics more closely: null stays in exclusive enums, conflicting base validators are only retained when every allowed value still satisfies them, and mixed enums that include objects fall back to enum-only output instead of emitting unsound type constraints
  • Joi.date() valid() / allow() / invalid() values now emit canonical JSON-native enum values (date-time strings, millisecond timestamps, or unix timestamps depending on format), which keeps the emitted schema portable and aligned with Joi's accepted date inputs
  • array().ordered(...) now emits minItems from the last explicitly required ordered position instead of the total ordered length, preserving Joi's optional ordered-slot semantics while still capping tuple-only arrays with maxItems
  • Follow-up cleanup removed the temporary coverage exclusions by adding regression tests for the reachable composite paths and simplifying dead defensive branches instead
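
The array().ordered(...) item above can be illustrated with a small sketch. The `orderedTuple` helper and its `{ schema, required }` slot shape are hypothetical, and the use of `prefixItems` for the tuple positions is an assumption based on the Draft 2020-12 target.

```javascript
// Hypothetical helper: build a tuple schema from ordered slots.
// Each slot is { schema, required }, in positional order.
function orderedTuple(slots) {
    // minItems is the 1-based position of the last required slot,
    // not the total number of ordered slots.
    let minItems = 0;
    slots.forEach((slot, index) => {
        if (slot.required) {
            minItems = index + 1;
        }
    });

    const schema = {
        type: 'array',
        prefixItems: slots.map((slot) => slot.schema),
        maxItems: slots.length  // tuple-only array: no extra items
    };

    if (minItems > 0) {
        schema.minItems = minItems;
    }

    return schema;
}

const tuple = orderedTuple([
    { schema: { type: 'string' }, required: true },
    { schema: { type: 'number' }, required: false },
    { schema: { type: 'boolean' }, required: true },
    { schema: { type: 'string' }, required: false }
]);

console.log(tuple.minItems, tuple.maxItems); // 3 4
```

Here the trailing optional slot may be omitted, but the optional slot in position two cannot be skipped, because position three is required.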

All JSON Schema output is covered by tests against AJV with Draft 2020-12. Standard JSON Schema formats are exercised through ajv-formats, while explicit custom test-only formats still go through the helper's custom format allowlist. Most schemas are validated with strict mode enabled; optional ordered() tuple positions intentionally use strictTuples: false, since they are valid JSON Schema but AJV's strict tuple lint expects fully required tuples. 100% code coverage maintained.

I realize this might be a lot of changes, but I think the conversions make sense and add support for more complex schemas. Feedback welcome.

One note on philosophy: I tried to keep the emitted JSON Schema as standard as possible while still matching Joi runtime behavior as closely as possible. There are a few places where JSON Schema is inherently a lossy target compared to Joi, so in those cases I preferred the smallest honest approximation over something overly clever. One concrete example is raw Joi.date(), which is more permissive than format: 'date-time' because its string acceptance follows JS date parsing semantics.

I also tried to keep the built-in output standard for the selected target. The one built-in OpenAPI-ism I found while working through this was format: 'binary' on Joi.binary(), which is now gone for the draft-2020-12 target. I did keep x-constraint for date comparisons. It's not standard vocabulary, but it is still valid JSON Schema as an extension keyword, and I think it's worth keeping because it preserves useful Joi semantics that the standard vocab can't express cleanly.

Edit: more follow-up parity work landed after the original description.

  • plain Joi.date() now emits type: ['string', 'number'] with ECMA-262 timestamp bounds, which matches Joi's default acceptance of both ISO-ish strings and JS millisecond timestamps more closely
  • Joi.date().iso() now emits a Joi-derived pattern instead of format: 'date-time', since Joi's ISO acceptance is not identical to RFC 3339 / JSON Schema date-time
  • date default() / example() / meta({ examples / contentSchema / ... }) annotations now canonicalize Date instances into JSON-native values instead of leaking live JS Date objects into the emitted schema
  • Joi.date().default('now') and date function defaults are omitted from JSON Schema output, because they do not have a faithful portable JSON Schema representation
  • non-date schemas no longer leak raw Date instances through valid() / invalid() JSON Schema output either; those values are either canonicalized under Joi.date() or dropped when there is no sound JSON representation
  • Joi.binary() no longer emits the OpenAPI format: 'binary' for the draft-2020-12 target; it stays type: 'string', maps hex to RFC 4648 base16, preserves base64 / base64url, and omits charset-style Node encodings that do not have an honest JSON Schema contentEncoding value
  • the JSON Schema test helper now uses ajv-formats for standard format keywords, so emitted email, uuid, uri, date-time, and duration formats are exercised as real format validators rather than compile-only passthroughs
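
The Joi.binary() encoding rule above amounts to a small lookup. The sketch below uses a hypothetical `binarySchema` helper to show the mapping described in the list: hex becomes base16, base64 and base64url pass through, and any other Node encoding produces no contentEncoding at all.

```javascript
// Node encodings that have a standard transfer-encoding name.
const CONTENT_ENCODINGS = new Map([
    ['hex', 'base16'],
    ['base64', 'base64'],
    ['base64url', 'base64url']
]);

// Hypothetical helper: schema for Joi.binary() with an optional encoding.
function binarySchema(encoding) {
    const schema = { type: 'string' };
    const contentEncoding = CONTENT_ENCODINGS.get(encoding);

    // Charset-style encodings (utf8, latin1, ...) are omitted on purpose.
    if (contentEncoding) {
        schema.contentEncoding = contentEncoding;
    }

    return schema;
}

console.log(binarySchema('hex'));   // { type: 'string', contentEncoding: 'base16' }
console.log(binarySchema('utf8'));  // { type: 'string' }
```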

Open question: @standard-schema/spec defines options.target as required on both jsonSchema.input and jsonSchema.output, but the current runtime silently defaults to 'draft-2020-12' when it is omitted. That splits the contract between TS callers (compile error) and JS callers (silent default). Strict is spec-faithful and guards against silent version pinning if more targets land later; permissive is ergonomic and preserves current behavior. Worth a conscious call either way, happy to tighten it if preferred.
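
To make the two options concrete, here is a minimal sketch of both behaviors behind one switch. The `resolveTarget` function and its `strict` flag are hypothetical and exist only to illustrate the trade-off; they are not part of this branch or of @standard-schema/spec.

```javascript
const DEFAULT_TARGET = 'draft-2020-12';

// Hypothetical helper contrasting the strict and permissive contracts.
function resolveTarget(options, { strict = false } = {}) {
    if (options && options.target !== undefined) {
        return options.target;
    }

    if (strict) {
        // Spec-faithful: JS callers fail loudly, like TS callers at compile time.
        throw new TypeError('options.target is required');
    }

    // Permissive: current behavior, silently pinned to the default target.
    return DEFAULT_TARGET;
}

console.log(resolveTarget({}));  // 'draft-2020-12'
```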

Edit: the stacked cleanup/refactor from #3110 is now folded into this branch as well.

  • JSON Schema conversion helpers were extracted into dedicated modules under lib/json-schema/
  • The refactor is intended to keep the core Joi type files thinner without changing the tested behavior
  • The full suite still passes after the move

…d preferences

- Fix alternatives.match('all') producing anyOf instead of allOf
- Skip rule jsonSchema handlers when args contain Refs
- Handle _invalids as not: { enum: [...] }
- Exclude forbidden keys and stripped keys (output mode) from properties
- Add all object dependency handlers (with, without, and, nand, or, xor, oxor)
- Add string.alphanum pattern and string.domain pattern with options
- Add number.precision as multipleOf
- Add date.timestamp with ECMA-262 range limits
- Register id'd child schemas in $defs for Joi.link() without .shared()
- Respect preferences: allowUnknown, stripUnknown, presence, noDefaults

re hapijs#3108
@AbdelrahmanHafez AbdelrahmanHafez marked this pull request as ready for review April 13, 2026 07:23
@AbdelrahmanHafez
Author

I'm stress testing this still to find any potential edge cases, will work on it in the upcoming few days. Marking as draft for now, will convert back to ready-for-review when done.

@AbdelrahmanHafez AbdelrahmanHafez marked this pull request as draft April 17, 2026 09:43
@Marsup
Collaborator

Marsup commented Apr 22, 2026

Just judging by the size of your PR, is your AI leading you astray? That's really a lot of additional code. I'm grateful, but I'm starting to wonder if all of it is vital for a proper json-schema.

@AbdelrahmanHafez AbdelrahmanHafez marked this pull request as ready for review April 22, 2026 17:05
@AbdelrahmanHafez
Author

Hi @Marsup, agreed the scope grew past what #3108 strictly asked for. I used Codex GPT 5.4 on this, mostly for filling in parity coverage once I started chasing edge cases.

Happy to split this into smaller PRs if that helps review, just let me know which subset you'd want to tackle first (the original #3108 fixes, the refactor into lib/json-schema/, the string pattern derivations, when() hoisting, etc.). Also happy to drop anything you don't want, or for you to push directly to the branch.

For context on the size: I tried to make the JSON Schema input/output behave as closely to Joi's runtime as possible, and documented the per-decision rationales in the PR body. The conversion code is isolated under lib/json-schema/ so it doesn't bloat the core type files.
