| 1 | +--- |
| 2 | +title: "Better Schema-Compatible Data Generation" |
| 3 | +date: 2026-02-08 09:00:00 +1200 |
| 4 | +tags: [project, support] |
| 5 | +toc: true |
| 6 | +pin: false |
| 7 | +--- |
| 8 | + |
| 9 | +_JsonSchema.Net.DataGeneration_ has received some significant upgrades. In this post, I'll go over what's changed, and how you can use this package to enhance your schema development workflow. |
| 10 | + |
| 11 | +## New and improved! |
| 12 | + |
| 13 | +<!-- Replaced Fare with internal regex value generation to significantly improve regex support |
| 14 | +Improved conditional support |
| 15 | +Added propertyNames support |
| 16 | +Increased test coverage to find and fix bugs |
| 17 | +Added generation failure error reporting --> |
| 18 | + |
| 19 | +First let's cover the small stuff. |
| 20 | + |
| 21 | +I added a bunch of tests that identified a few bugs, and added support for `propertyNames`. |
| 22 | + |
| 23 | +I also added support for the `allOf`/`if`/`then` pattern that [Jason Desrosiers](https://github.com/jdesrosiers) came up with to implement the OpenAPI `discriminator` keyword. (You can see this pattern in action in Jason's [excellent post](https://json-schema.org/blog/posts/validating-openapi-and-json-schema#validating) on the JSON Schema blog.) |
| 24 | + |
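As a rough illustration of that pattern, here is a hypothetical schema. The `petType` property and the `cat`/`dog` definitions are made up for this example and are not taken from Jason's post or from this library's tests. Each `allOf` branch uses `if` to match on the discriminating property and `then` to apply the corresponding subschema:

```json
{
  "type": "object",
  "properties": {
    "petType": { "enum": ["cat", "dog"] }
  },
  "required": ["petType"],
  "allOf": [
    {
      "if": {
        "properties": { "petType": { "const": "cat" } },
        "required": ["petType"]
      },
      "then": { "$ref": "#/$defs/cat" }
    },
    {
      "if": {
        "properties": { "petType": { "const": "dog" } },
        "required": ["petType"]
      },
      "then": { "$ref": "#/$defs/dog" }
    }
  ],
  "$defs": {
    "cat": {
      "properties": { "livesLeft": { "type": "integer", "minimum": 0, "maximum": 9 } },
      "required": ["livesLeft"]
    },
    "dog": {
      "properties": { "goodBoy": { "type": "boolean" } },
      "required": ["goodBoy"]
    }
  }
}
```

An instance such as `{ "petType": "cat", "livesLeft": 7 }` is valid against this schema, while `{ "petType": "cat" }` is not, because the matching `then` branch requires `livesLeft`.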
| 25 | +### Regex improvements |
| 26 | + |
| 27 | +In previous versions, generation of strings that matched regular expressions was performed by the [Fare](https://github.com/moodmosaic/Fare) library. While great, this library does lack some important features specific to this kind of generation. |
| 28 | + |
| 29 | +When building strings that match JSON Schema requirements, different branches of a schema could have different requirements of the same instance. This means that in order to get Fare to work right, the library has to create composite expressions, and often those composite expressions weren't supported by Fare. |
| 30 | + |
| 31 | +This led me to drop Fare and implement my own regular expression support that can handle these requirements. |
| 32 | + |
| 33 | +> While I have been impressed with the latest state of AI coding, I still don't fully trust it. That said, I will admit that a large part of this new regular expression support was AI-generated, but it is also heavily tested, so I'm confident that it works for the application. I'm not sure of the limits, though. If you find them, please open an issue. |
| 34 | +{: .prompt-info } |
| 35 | + |
| 36 | +The new implementation incorporates other keywords, like `minLength`, into the regular expression requirements, and even supports anti-requirements, like a `pattern` keyword inside of a `not` keyword. |
| 37 | + |
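To make that concrete, here is a small hypothetical schema (not from the library's documentation) in which several keywords constrain the same string. A matching value must consist only of lowercase letters, be at least five characters long, and must not start with `admin`:

```json
{
  "type": "string",
  "pattern": "^[a-z]+$",
  "minLength": 5,
  "not": { "pattern": "^admin" }
}
```

A value such as `"hello"` satisfies all three constraints, while `"admin"` (rejected by `not`) and `"abc"` (too short) do not.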
| 38 | +### Error reporting |
| 39 | + |
| 40 | +I think this is the coolest addition to this library. When data generation fails, it now tells you why! |
| 41 | + |
| 42 | +The generation result's error message now describes the error that occurred, and there are two properties that tell you where the problem occurred: |
| 43 | + |
| 44 | +- `Location` gives you where in the instance the generation failed. |
| 45 | +- `SchemaLocations` gives you where in the schema the error occurred. |
| 46 | + |
| 47 | +Generally, a failure to generate data is the result of either a conflict in the schema |
| 48 | + |
| 49 | +```json |
| 50 | +{ |
| 51 | + "allOf": [ |
| 52 | + { "type": "string" }, |
| 53 | + { "type": "number" } |
| 54 | + ] |
| 55 | +} |
| 56 | +``` |
| 57 | + |
| 58 | +or a feature that just isn't supported. |
| 59 | + |
| 60 | +The nice thing is that they're all reported now. |
| 61 | + |
| 62 | +## Why use data generation? |
| 63 | + |
| 64 | +While there are likely many use cases for data generation, the most helpful application in my mind is testing your schemas. Being able to see what kinds of data your schemas allow enables you to find gaps that can allow invalid data into your systems. |
| 65 | + |
| 66 | +### A very real failure mode |
| 67 | + |
| 68 | +Say you're building a user registration endpoint. You write a JSON Schema for the request body, wire it up with _JsonSchema.Api_ to support automatic request validation, and ship it. The schema looks like this: |
| 69 | + |
| 70 | +```json |
| 71 | +{ |
| 72 | + "type": "object", |
| 73 | + "properties": { |
| 74 | + "name": { "type": "string" }, |
| 75 | + "email": { "type": "string" }, |
| 76 | + "age": { "type": "integer" } |
| 77 | + }, |
| 78 | + "required": ["name", "email", "age"] |
| 79 | +} |
| 80 | +``` |
| 81 | + |
| 82 | +A client hits the endpoint and passes this: |
| 83 | + |
| 84 | +```json |
| 85 | +{ |
| 86 | + "name": "", |
| 87 | + "email": "x", |
| 88 | + "age": -5847, |
| 89 | + "password": "hunter2", |
| 90 | + "admin": true |
| 91 | +} |
| 92 | +``` |
| 93 | + |
| 94 | +Schema validation passes and the request comes through into your controller. But that payload has an empty name, an invalid email, a nonsensical age, and extra properties that your endpoint never asked for. If any of that data gets trusted downstream, you now have a production issue caused by a "valid" request. |
| 95 | + |
| 96 | +The schema is doing what it was told. The problem is that it doesn't yet express what you meant. |
| 97 | + |
| 98 | +So you go back and tighten things up: |
| 99 | + |
| 100 | +```json |
| 101 | +{ |
| 102 | + "type": "object", |
| 103 | + "properties": { |
| 104 | + "name": { "type": "string", "minLength": 1 }, |
| 105 | + "email": { "type": "string", "format": "email" }, |
| 106 | + "age": { "type": "integer", "minimum": 0, "maximum": 150 } |
| 107 | + }, |
| 108 | + "required": ["name", "email", "age"], |
| 109 | + "additionalProperties": false |
| 110 | +} |
| 111 | +``` |
| 112 | + |
| 113 | +Now that same request gets rejected immediately. |
| 114 | + |
| 115 | +This is where generation helps. Instead of trying to invent every weird edge case yourself, you generate samples that are valid for your schema and inspect them. If the samples include data your API can't safely handle, the schema needs more constraints. |
| 116 | + |
| 117 | +The new error reporting helps here, too. If you've created conflicting constraints (for example in an `allOf`) and generation can't produce data, it tells you where and why it failed, helping you to identify and resolve the problem. |
| 118 | + |
| 119 | +## Wrapping up |
| 120 | + |
| 121 | +Most of these updates came from real use: writing schemas, finding edge cases, adding tests, and fixing what those tests exposed. |
| 122 | + |
| 123 | +If you're already using this package, updating should give you better output and much better diagnostics when something goes wrong. If you haven't used it yet, this release is a solid place to start. |
| 124 | + |
| 125 | +_If you aren't generating revenue, you like the work I put out, and you would still like to support the project, please consider [becoming a sponsor](https://github.com/sponsors/gregsdennis)!_ |