Commit 47d419b

add post about datagen improvements
1 parent 755841d commit 47d419b

2 files changed

Lines changed: 126 additions & 1 deletion

_posts/2026/2026-02-14-schemagen-aot.md

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ The last thing to consider was configurability. There are two deciding factors
4949
- What can be defined at compile-time?
5050
- How does this interact with runtime configuration?
5151

52-
For the first, I could only include options that could be defined on the `[GenerateJsonSchema]` attribute. So it now has optional parameters that you can set to configure property naming, property order, and whether conditionals produce strict property requirements. I wouldn't be able to support custom generators or refiners, or set up a schema registry to support external references.
52+
For the first, I could only include options that could be defined on the `[GenerateJsonSchema]` attribute. So it now has optional parameters that you can set to configure property naming, property order, and whether conditionals produce strict property requirements. I wouldn't be able to support custom generators or refiners, or set up a schema registry to support external references. Maybe I'll figure these things out in a future release, but for now, they're unsupported.
5353

5454
For the second, I had to concede that since the configuration needs to be available at compile time (when the schemas are generated), any runtime configuration just wouldn't be applicable.
5555

Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
1+
---
2+
title: "Better Schema-Compatible Data Generation"
3+
date: 2026-02-08 09:00:00 +1200
4+
tags: [project, support]
5+
toc: true
6+
pin: false
7+
---
8+
9+
_JsonSchema.Net.DataGeneration_ has received some significant upgrades. In this post, I'll go over what's changed, and how you can use this package to enhance your schema development workflow.
10+
11+
## New and improved!
12+
13+
<!-- Replaced Fare with internal regex value generation to significantly improve regex support
14+
Improved conditional support
15+
Added propertyNames support
16+
Increased test coverage to find and fix bugs
17+
Added generation failure error reporting -->
18+
19+
First let's cover the small stuff.
20+
21+
I added a bunch of tests that identified a few bugs, and added support for `propertyNames`.
22+
23+
I also added support for the `allOf`/`if`/`then` pattern that [Jason Desrosiers](https://github.com/jdesrosiers) came up with to implement the OpenAPI `discriminator` keyword. (You can see this pattern in action in Jason's [excellent post](https://json-schema.org/blog/posts/validating-openapi-and-json-schema#validating) on the JSON Schema blog.)
24+
25+
### Regex improvements
26+
27+
In previous versions, generation of strings that matched regular expressions was performed by the [Fare](https://github.com/moodmosaic/Fare) library. While great, this library does lack some important features specific to this kind of generation.
28+
29+
When building strings that match JSON Schema requirements, different branches of a schema could have different requirements of the same instance. This means that in order to get Fare to work right, the library has to create composite expressions, and often those composite expressions weren't supported by Fare.
30+
31+
This led me to drop Fare and implement my own regular expression support that can handle these requirements.
32+
33+
> While I have been impressed with the latest state of AI coding, I still don't fully trust it. That said, I will admit that a large part of this new regular expression support was AI-generated, but it is also heavily tested, so I'm confident that it works for the application. I'm not sure of the limits, though. If you find them, please open an issue.
34+
{: .prompt-info }
35+
36+
The new implementation incorporates other keywords, like `minLength`, into the regular expression requirements, and even supports anti-requirements, like a `pattern` keyword inside of a `not` keyword.
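
To make the idea of combined requirements concrete, here is a minimal Python sketch that searches for a string satisfying a `pattern`, a `minLength`, and a negated `pattern` at the same time. This is not the library's algorithm (the post doesn't describe it). It simply brute-forces short strings over a small alphabet, and `first_match`, its parameters, and the alphabet are all hypothetical names chosen for illustration. Python's `re` dialect also differs from the ECMA-262 dialect JSON Schema recommends, so treat the patterns as examples only.

```python
import itertools
import re

def first_match(pattern, min_length=0, not_pattern=None, alphabet="ab1", max_length=6):
    """Return the shortest string over `alphabet` meeting every constraint, or None."""
    required = re.compile(pattern)
    forbidden = re.compile(not_pattern) if not_pattern else None
    for length in range(min_length, max_length + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            # JSON Schema `pattern` is unanchored, so use search rather than fullmatch.
            if not required.search(candidate):
                continue
            # A `pattern` under `not` is an anti-requirement: it must NOT match.
            if forbidden and forbidden.search(candidate):
                continue
            return candidate
    return None

# Letters followed by one digit, at least 3 characters, and not starting with "a".
sample = first_match(r"^[a-z]+[0-9]$", min_length=3, not_pattern=r"^a")

# Must start with "a" but may not contain "a": nothing can satisfy both.
conflict = first_match(r"^a", min_length=1, not_pattern=r"a")
```

In this sketch `sample` is `"ba1"` and `conflict` is `None`. A real generator can't enumerate like this for realistic alphabets and lengths, which is why the requirements have to be combined more cleverly.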
37+
38+
### Error reporting
39+
40+
I think this is the coolest addition to this library. When data generation fails, it now tells you why!
41+
42+
The generation result's error message now describes the error that occurred, and there are two properties that tell you where the problem occurred:
43+
44+
- `Location` gives you where in the instance the generation failed.
45+
- `SchemaLocations` gives you where in the schema the error occurred.
46+
47+
Generally a failure to generate data is the result of either a conflict in the schema
48+
49+
```json
50+
{
51+
"allOf": [
52+
{ "type": "string" },
53+
{ "type": "number" }
54+
]
55+
}
56+
```
57+
58+
or a feature that just isn't supported.
59+
60+
The nice thing is that they're all reported now.
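
To see why a schema that requires both `string` and `number` in an `allOf` can never produce data, here is a small Python sketch that intersects the types each subschema allows. It only looks at `type` and `allOf`, and `allowed_types` is a hypothetical helper written for this illustration, not part of the library.

```python
ALL_TYPES = {"null", "boolean", "object", "array", "number", "string", "integer"}

def allowed_types(schema):
    """Return the set of JSON types a schema can accept, using only `type` and `allOf`."""
    declared = schema.get("type")
    if declared is None:
        types = set(ALL_TYPES)      # no `type` keyword: anything goes
    elif isinstance(declared, str):
        types = {declared}
    else:
        types = set(declared)       # `type` may also be an array of names
    if "number" in types:
        types.add("integer")        # every integer is also a number
    # Every subschema in `allOf` must accept the instance, so intersect.
    for sub in schema.get("allOf", []):
        types &= allowed_types(sub)
    return types

conflicting = {"allOf": [{"type": "string"}, {"type": "number"}]}
compatible = {"allOf": [{"type": "number"}, {"type": "integer"}]}

conflicting_types = allowed_types(conflicting)  # empty set: nothing can be generated
compatible_types = allowed_types(compatible)    # {"integer"}
```

An empty result means no instance can ever be valid, and that is the kind of conflict a generation failure points to.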
61+
62+
## Why use data generation?
63+
64+
While there are likely many use cases for data generation, the most helpful application in my mind is testing your schemas. Being able to see what kinds of data your schemas allow enables you to find gaps that can allow invalid data into your systems.
65+
66+
### A very real failure mode
67+
68+
Say you're building a user registration endpoint. You write a JSON Schema for the request body, wire it up with _JsonSchema.Api_ to support automatic request validation, and ship it. The schema looks like this:
69+
70+
```json
71+
{
72+
"type": "object",
73+
"properties": {
74+
"name": { "type": "string" },
75+
"email": { "type": "string" },
76+
"age": { "type": "integer" }
77+
},
78+
"required": ["name", "email", "age"]
79+
}
80+
```
81+
82+
A client hits the endpoint and passes this:
83+
84+
```json
85+
{
86+
"name": "",
87+
"email": "x",
88+
"age": -5847,
89+
"password": "hunter2",
90+
"admin": true
91+
}
92+
```
93+
94+
Schema validation passes and the request comes through into your controller. But that payload has an empty name, an invalid email, a nonsensical age, and extra properties that your endpoint never asked for. If any of that data gets trusted downstream, you now have a production issue caused by a "valid" request.
95+
96+
The schema is doing what it was told. The problem is that it doesn't yet express what you meant.
97+
98+
So you go back and tighten things up:
99+
100+
```json
101+
{
102+
"type": "object",
103+
"properties": {
104+
"name": { "type": "string", "minLength": 1 },
105+
"email": { "type": "string", "format": "email" },
106+
"age": { "type": "integer", "minimum": 0, "maximum": 150 }
107+
},
108+
"required": ["name", "email", "age"],
109+
"additionalProperties": false
110+
}
111+
```
112+
113+
Now that same request gets rejected immediately.
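
To check that claim by hand, here is a minimal Python sketch of a validator that understands only the keywords used in the two schemas above. It is a toy written for this illustration, not _JsonSchema.Net_, and it assumes that `format` is treated as an assertion (many validators treat it as an annotation unless configured otherwise). The email check is deliberately crude.

```python
import re

def validate(schema, instance):
    """Return a list of error strings; an empty list means the instance is valid."""
    errors = []
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"missing required property '{name}'")
    properties = schema.get("properties", {})
    if schema.get("additionalProperties") is False:
        for name in instance:
            if name not in properties:
                errors.append(f"unexpected property '{name}'")
    for name, rules in properties.items():
        if name not in instance:
            continue
        value = instance[name]
        if rules.get("type") == "string":
            if not isinstance(value, str):
                errors.append(f"'{name}' must be a string")
                continue
            if len(value) < rules.get("minLength", 0):
                errors.append(f"'{name}' is too short")
            # Assumption: `format` is asserted, with a very rough email shape check.
            if rules.get("format") == "email" and not re.fullmatch(r"[^@\s]+@[^@\s]+", value):
                errors.append(f"'{name}' is not an email address")
        elif rules.get("type") == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                errors.append(f"'{name}' must be an integer")
                continue
            if "minimum" in rules and value < rules["minimum"]:
                errors.append(f"'{name}' is below the minimum")
            if "maximum" in rules and value > rules["maximum"]:
                errors.append(f"'{name}' is above the maximum")
    return errors

loose = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "email", "age"],
}
strict = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "email": {"type": "string", "format": "email"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
    },
    "required": ["name", "email", "age"],
    "additionalProperties": False,
}
request = {"name": "", "email": "x", "age": -5847, "password": "hunter2", "admin": True}

loose_errors = validate(loose, request)    # no errors: the loose schema accepts it
strict_errors = validate(strict, request)  # one error per problem in the payload
```

With the loose schema the error list is empty. With the tightened schema there are five errors: two unexpected properties, the empty name, the malformed email, and the negative age.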
114+
115+
This is where generation helps. Instead of trying to invent every weird edge case yourself, you generate samples that are valid for your schema and inspect them. If the samples include data your API can't safely handle, the schema needs more constraints.
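
As a rough illustration of that workflow, here is a Python sketch of a naive generator for a tiny subset of JSON Schema, run against a loose registration schema. It is not how _JsonSchema.Net.DataGeneration_ works. The `generate` function, the default integer range, and the string lengths are all assumptions made for the example.

```python
import random
import string

def generate(schema, rng):
    """Produce one random instance for a small, hypothetical subset of keywords."""
    kind = schema.get("type")
    if kind == "string":
        shortest = schema.get("minLength", 0)
        length = rng.randint(shortest, shortest + 5)
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    if kind == "integer":
        # Assumed default range when the schema gives no bounds.
        low = schema.get("minimum", -10_000)
        high = schema.get("maximum", 10_000)
        return rng.randint(low, high)
    if kind == "object":
        # Only required properties are generated in this sketch.
        props = schema.get("properties", {})
        return {name: generate(props.get(name, {}), rng) for name in schema.get("required", [])}
    raise ValueError(f"unsupported schema: {schema}")

loose = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "email", "age"],
}

rng = random.Random(0)  # seeded so runs are repeatable
samples = [generate(loose, rng) for _ in range(200)]

# Samples the schema allows but the API probably should not accept.
suspicious = [s for s in samples if s["name"] == "" or s["age"] < 0]
```

Every entry in `suspicious` is valid according to the schema, which is the signal that the schema still needs constraints such as `minLength` or `minimum`.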
116+
117+
The new error reporting helps here, too. If you've created conflicting constraints (for example in an `allOf`) and generation can't produce data, it tells you where and why it failed, helping you to identify and resolve the problem.
118+
119+
## Wrapping up
120+
121+
Most of these updates came from real use: writing schemas, finding edge cases, adding tests, and fixing what those tests exposed.
122+
123+
If you're already using this package, updating should give you better output and much better diagnostics when something goes wrong. If you haven't used it yet, this release is a solid place to start.
124+
125+
_If you aren't generating revenue, you like the work I put out, and you would still like to support the project, please consider [becoming a sponsor](https://github.com/sponsors/gregsdennis)!_
