Commit 275ae58

update schema datagen docs
1 parent 56bd1c9 commit 275ae58

1 file changed

Lines changed: 130 additions & 11 deletions

_docs/schema/datagen/schema-datagen.md

@@ -29,20 +29,94 @@ it can generate a JSON document like
2929

3030
Under the covers, the library uses the fabulous [Bogus](https://github.com/bchavez/Bogus) library, which is commonly used to generate random test data, and a few other tricks.
3131

32+
## Use Cases {#schema-datagen-use-cases}
33+
34+
### Schema Debugging {#schema-datagen-debugging}
35+
36+
One of the more practical uses of a data generator is checking whether a schema actually says what you think it says. The generator only follows the rules the schema states, so if a generated value looks wrong, the schema is more permissive than you intended.
37+
38+
#### Missing `required` {#schema-datagen-debug-required}
39+
40+
Suppose you want a user record that always has a `username`:
41+
42+
```json
43+
{
44+
  "type": "object",
45+
  "properties": {
46+
    "username": { "type": "string" },
47+
    "email": { "type": "string", "format": "email" }
48+
  }
49+
}
50+
```
51+
52+
`properties` only describes what a property looks like _if it shows up_. It doesn't make the property show up. So the generator is perfectly happy producing:
53+
54+
```json
55+
{}
56+
```
57+
58+
or
59+
60+
```json
61+
{ "email": "someone@example.com" }
62+
```
63+
64+
Both are valid. Adding `"required": ["username"]` is what actually makes `username` mandatory, and the generator will reflect that.
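
For reference, here is a sketch of the tightened schema. It is simply the example above with `required` added; nothing else is changed.

```json
{
  "type": "object",
  "properties": {
    "username": { "type": "string" },
    "email": { "type": "string", "format": "email" }
  },
  "required": ["username"]
}
```

With this version, `{}` and `{ "email": "someone@example.com" }` no longer validate, while `email` itself stays optional.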
65+
66+
#### Overly Permissive Types {#schema-datagen-debug-types}
67+
68+
A schema for an age field written as:
69+
70+
```json
71+
{ "type": "number" }
72+
```
73+
74+
will cheerfully produce `3.14` or `-7.9`. Those are valid numbers, just not valid ages. The schema should be:
75+
76+
```json
77+
{
78+
  "type": "integer",
79+
  "minimum": 0,
80+
  "maximum": 130
81+
}
82+
```
83+
84+
#### `additionalProperties` Surprises {#schema-datagen-debug-additional}
85+
86+
Without `"additionalProperties": false`, the generator can (and will) tack on extra properties beyond whatever is listed in `properties`:
87+
88+
```json
89+
{
90+
  "type": "object",
91+
  "properties": {
92+
    "id": { "type": "integer" }
93+
  }
94+
}
95+
```
96+
97+
might produce:
98+
99+
```json
100+
{
101+
  "id": 42,
102+
  "xQ7": true,
103+
  "lorem": "ipsum dolor"
104+
}
105+
```
106+
107+
If you only want `id`, say so with `"additionalProperties": false`.
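
As a sketch, the closed version of the schema above would look like this. The `required` entry is an extra assumption on my part (it is not part of the original example), added because `additionalProperties` alone still allows `{}`.

```json
{
  "type": "object",
  "properties": {
    "id": { "type": "integer" }
  },
  "required": ["id"],
  "additionalProperties": false
}
```

Against this schema, an instance such as `{ "id": 42, "xQ7": true }` fails validation, so extra keys should no longer appear in generated output.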
108+
32109
## Capabilities {#schema-datagen-capabilities}
33110

34111
This library is quite powerful. It supports most JSON Schema keywords, including `if`/`then`/`else` and aggregation keywords (`oneOf`, `allOf`, etc.).
35112

36113
It currently does not support:
37114

38-
- anything complex involving RegEx\*
39115
- `$dynamicRef`
40116
- annotation / metadata keywords (e.g. `title`, `description`)
41117
- `content*` keywords
42118
- `dependencies` / `dependent*` keywords
43119

44-
*\* There are some libraries which provide limited RegEx-based string generation, but these do not support look-aheads which are required to combine multiple RegEx's with boolean logic. This functionality is required to support them alongside the aggregation keywords. I opted to just not support them at all until I can find a sufficient library.*
45-
46120
Everything else _should_ be mostly supported. Feel free to [open an issue](https://github.com/gregsdennis/json-everything/issues/new/choose) if you find something isn't working as you expect.
47121

48122
> `$ref` support does not check for infinite loops such as occur with schemas like `{ "$ref": "#" }`. If your schema includes a reference like this, a stack overflow exception is likely.
@@ -60,14 +134,16 @@ If a format is specified, it will be used.
60134

61135
#### `pattern` {#schema-datagen-pattern}
62136

63-
Regular expressions specified via `pattern` are supported in a very limited capacity. Only simple subschemas with a single `pattern` is supported.
137+
Regular expressions specified via `pattern` are evaluated together with the other string constraints in the schema, including cases where several patterns must all be satisfied by the same value.
138+
139+
Supported scenarios include:
64140

65-
- Combining multiple regular expressions using `allOf`, `anyOf`, or `oneOf` is not supported.
66-
- Inverting regular expressions using `not` is not supported.
67-
- Any regular expression not supported by the [FARE library](https://github.com/moodmosaic/Fare) is not supported.
68-
- Combining `pattern` with `minLength`/`maxLength` is not supported. RegEx supports length requirements, so they should be specified within the expression.
141+
- multiple `pattern` constraints across composed schemas
142+
- forbidden patterns via `not`
143+
- interactions between `pattern` and `minLength`/`maxLength`
144+
- interactions between `pattern` and `format`
69145

70-
If the above scenarios are detected, a `NotSupportedException` will be thrown.
146+
Some highly complex or mutually incompatible regex combinations may still be impossible to satisfy. In those cases, generation fails with [detailed error information](#schema-datagen-error-reporting).
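
To make the scenarios above concrete, here is an illustrative schema that combines several of them. It is a hand-written sketch, not taken from the library's tests: two patterns joined by `allOf`, one pattern forbidden via `not`, and length bounds on top. A value such as `"abcde"` satisfies all of these constraints.

```json
{
  "type": "string",
  "allOf": [
    { "pattern": "^[a-z]+$" },
    { "pattern": "^abc" }
  ],
  "not": { "pattern": "xyz$" },
  "minLength": 5,
  "maxLength": 10
}
```

By contrast, the following combination is mutually incompatible, since no string can consist only of lowercase letters and only of digits at the same time. A schema like this would presumably fall into the failure case described above.

```json
{
  "type": "string",
  "allOf": [
    { "pattern": "^[a-z]+$" },
    { "pattern": "^[0-9]+$" }
  ]
}
```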
71147

72148
### Numerics {#schema-datagen-numbers}
73149

@@ -140,11 +216,54 @@ The result object has several properties:
140216
- `Result` holds the value as a `JsonElement`, if successful
141217
- `ErrorMessage` holds any error message, if unsuccessful
142218
- `InnerResults` holds result objects from nested generations. This can be useful for debugging.
219+
- `Location` (if available) identifies where generation failed in the target instance, as a `JsonPointer`
220+
- `SchemaLocations` (if available) identifies one or more schema locations related to the failure, also as `JsonPointer`s
221+
222+
## Error Reporting {#schema-datagen-error-reporting}
223+
224+
When generation fails, start with the top-level `GenerationResult` returned by `.GenerateData()`:
225+
226+
- If `IsSuccess` is `false`, inspect `ErrorMessage` and `InnerResults`.
227+
- `InnerResults` contains nested failures from branches, properties, array items, or composed schemas.
228+
- Leaf failures can provide:
229+
  - `Location` for the relative instance path that failed
230+
  - `SchemaLocations` for the schema path(s) involved in that failure
231+
232+
In practice, a single generation failure can contain multiple nested reasons. Walking the `InnerResults` tree is the best way to produce a full error report.
233+
234+
```c#
235+
void PrintFailures(GenerationResult result, string indent = "")
236+
{
237+
    if (result.IsSuccess) return;
238+
239+
    if (!string.IsNullOrWhiteSpace(result.ErrorMessage))
240+
    {
241+
        Console.WriteLine($"{indent}Reason: {result.ErrorMessage}");
242+
        if (result.Location != null)
243+
            Console.WriteLine($"{indent}At: {result.Location}");
244+
245+
        if (result.SchemaLocations is { Count: > 0 })
246+
        {
247+
            Console.WriteLine($"{indent}Schema path(s):");
248+
            foreach (var schemaLocation in result.SchemaLocations)
249+
                Console.WriteLine($"{indent}- {schemaLocation}");
250+
        }
251+
    }
252+
253+
    if (result.InnerResults == null) return;
254+
    foreach (var inner in result.InnerResults)
255+
        PrintFailures(inner, indent + " ");
256+
}
257+
258+
var schema = JsonSchema.FromFile("myFile.json");
259+
var generationResult = schema.GenerateData();
260+
261+
if (!generationResult.IsSuccess)
262+
    PrintFailures(generationResult);
263+
```
143264

144265
# Summary {#schema-datagen-summary}
145266

146267
So, uh, yeah. I guess that's it really.
147268

148-
The generation isn't 100%, but most of the time it will succeed in producing a value for schemas that can have one. You may want to validate the value against the schema as a sanity check.
149-
150269
Happy generating.
