index.qmd
+64 -61 (64 additions & 61 deletions)
Original file line number
Diff line number
Diff line change
@@ -8,17 +8,19 @@ authors:
8
8
affiliations:
9
9
- name: Verbundzentrale des GBV (VZG)
10
10
abstract: |
11
-
This document specifies a data format to report validation errors of digital objects.
11
+
This document specifies a data format to report validation errors of digital objects with error positions independent from specific document models.
12
12
---
13
13
14
14
# Introduction
15
15
16
+
> _All data is wrong, but some data is wrong on multiple levels._
17
+
16
18
Data validation is a crucial part of the management of data quality and interoperability. Validation is applied in many ways and contexts, for instance input forms and editors with visual feedback or schema languages with formal error reports. The diversity of use cases implies a variety of error results. Existing standards for error reporting such as [JUnit XML](https://github.com/testmoapp/junitxml) and [Test Anything Protocol](https://testanything.org/) have narrow use cases in software development.
17
19
18
20
The specification of **Data Validation Error Format** has two goals:
19
21
20
-
- unify how validation errors are reported by different validators
21
-
- address positions of errors in validated documents, independent from document formats
22
+
- unify how validation errors are reported by different applications
23
+
- reference positions of errors in validated documents, independent from document models
22
24
23
25
Last but not least the format should help to better separate validation and presentation of validation results, so both can be solved by different applications.
24
26
@@ -49,9 +51,9 @@ Every document conforms to a **document model**. For instance JSON documents con
49
51
Eventually all documents are given as digital objects, encoded as sequence of bytes. Encodings using a sequence of characters are also called textual data formats, in contrast to binary data formats.
50
52
:::
51
53
52
-
An [error position](#sec-positions) is given in form of one or more **locators**, each having a [**dimension**](#sec-dimensions) and an **address**. Each dimension refers to a **locator format** for a set of document models. For instance [JSON Pointer] refers to JSON, character and line numbers refer to character strings with defined line breaks, and offsets refer to sequences of elements (@fig-encodings-and-locators). Other examples of locator formats include [XPath] for XML, and row/column for tabular data.
54
+
An [error position](#positions) is given in form of one or more **locators**, each having a [**dimension**](#dimensions) and an **address**. Each dimension refers to a **locator format** for a set of document models. For instance [JSON Pointer] refers to JSON, character and line numbers refer to character strings with defined line breaks, and offsets refer to sequences of elements (@fig-encodings-and-locators). Other examples of locator formats include [XPath] for XML, and row/column for [tabular data](#tabular-document-models).
53
55
54
-
Locators can also contain **nested errors** to address a more specific position within another position and to support error positions in nested documents such as archive files.
56
+
Locators can also contain **nested errors** to reference a more specific position within another position and to support error positions in nested documents such as archive files.
55
57
56
58
::: {#fig-encodings-and-locators}
57
59
@@ -153,70 +155,90 @@ A similar document could be invalid on byte level. The following table illustrat
153
155
154
156
The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 ([RFC 2119] and [RFC 8174]) when, and only when, they appear in all capitals, as shown here.
155
157
156
-
Only section @sec-errors to @sec-dimensions, excluding examples and notes, and the [list of normative references](#normative-references) are normative parts of this specification.
158
+
Only sections [2](#errors) to [4](#dimensions), excluding examples and notes, and the [list of normative references](#normative-references) are normative parts of this specification.
157
159
160
+
<!--
158
161
Specific support of Data Validation Error Format by an application depends on:
159
162
160
-
1. the set of supported [**dimensions**](#sec-dimensions), and
161
-
2. whether [**positions**](#positions) are supported in both full ([**locators**](#locators)) and condense form ([**locator maps**](#locator-map)), or only the latter.
163
+
1. the set of supported [**dimensions**](#dimensions), and
164
+
2. whether [**positions**](#positions) are supported only in condense form ([**locator maps**](#locator-map)), or also in full form (array of [**locators**](#locators))
162
165
163
166
Both MUST be documented by applications.
167
+
-->
164
168
165
-
# Errors {#sec-errors}
169
+
# Errors
166
170
167
-
An **Error** is a JSON object with:
171
+
An **Error** is a JSON object with the following constraints:
168
172
169
-
- optional (but RECOMMENDED) field `message` with an **error message**, being a non-empty string.
170
-
Applications MAY use a default value for error messages.
173
+
- an error SHOULD have a field `message` with an **error message**, being a non-empty string.
174
+
Applications MAY use a default value for error messages. Language and localization of error
175
+
messages are out of the scope of this specification.
171
176
172
-
- optional field `types` with an array of **error types**, each being a non-empty string.
173
-
Error types can be used for grouping errors or to reference a cause or constraint being violated
174
-
with the error. Error types SHOULD be either URIs ([RFC 3986]) or local identifiers
175
-
with same syntax as the name of a [dimension](#dimensions) or
177
+
- an error MAY have field `types` with an array of **error types**, each being a non-empty string.
178
+
Error types can be used for grouping errors and to reference a cause or constraint being violated
179
+
by the error. Error types SHOULD be either URIs ([RFC 3986]) or local identifiers
180
+
with same syntax as the name of a [dimension](#dimensions).
176
181
177
-
- optional field `level` with an **error level**, being one of the strings `error`, `warning`, or `info`.
178
-
Application MUST use default value `error` if this field is not given.
182
+
- an error MAY have field `level` with an **error level**, being one of the strings `error`, `warning`, or `info`. Applications MUST use default value `error` if this field is not given.
179
183
180
-
- optional field `position` with a [**position**](#positions).
184
+
- an error MAY have field `position` with a [**position**](#positions).
181
185
Applications MUST NOT differentiate between no position and an empty position (an empty array or an empty JSON object).
182
186
187
+
Applications MUST use individual errors for individual positions of the kind of observation represented by the error. For instance a malformed character occurring twice in a document results in two errors.
188
+
183
189
::: {.callout-note}
184
-
Language and localization of error messages is out of the scope of this specification.
190
+
By this definition the error `{}` is allowed and equivalent to `{"level":"error"}`.
185
191
:::
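
The constraints on an Error object can be illustrated with a minimal Python sketch. The function name `normalize_error` and the choice to raise exceptions are assumptions made for illustration and are not part of the specification. The sketch applies the default level `error` and treats an empty position like a missing one.

```python
ALLOWED_LEVELS = ("error", "warning", "info")


def normalize_error(error):
    """Check an Error object and apply defaults (hypothetical helper)."""
    if not isinstance(error, dict):
        raise TypeError("an Error must be a JSON object")
    result = dict(error)

    # `message` is recommended but not required; if given, it must be a non-empty string.
    if "message" in result:
        message = result["message"]
        if not (isinstance(message, str) and message):
            raise ValueError("message must be a non-empty string")

    # `types` is an optional array of non-empty strings.
    types = result.get("types", [])
    if not isinstance(types, list) or not all(isinstance(t, str) and t for t in types):
        raise ValueError("types must be an array of non-empty strings")

    # A missing `level` defaults to "error".
    level = result.setdefault("level", "error")
    if level not in ALLOWED_LEVELS:
        raise ValueError("level must be one of error, warning, info")

    # An empty position (empty array or empty object) equals no position.
    if "position" in result and result["position"] in ([], {}):
        del result["position"]

    return result
```

With this sketch, `normalize_error({})` and `normalize_error({"level": "error"})` return the same object, matching the note above.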
186
192
187
193
# Positions
188
194
189
-
An error can have a **position**. A position is given
195
+
The position of an error is given
190
196
191
-
- either in **full form** as JSON array of [**locators**](#locators),
197
+
- either in **condense form** with a [**locator map**](#locator-maps),
192
198
193
-
- or in **condense form** with a [**locator map**](#locator-maps).
199
+
- or in **full form** as JSON array of [**locators**](#locators).
194
200
195
-
Every locator map can be transformed to an equivalent array of locators. The reverse transformation is only possible if there is at most one locator per dimension and no locator has nested errors.
201
+
Every locator map can be transformed to an equivalent array of locators. The reverse transformation is only possible if no locator has [nested errors](#locators) and there is not more than one locator per dimension.
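
The condition for the reverse transformation could be checked as in the following Python sketch. The function name `locators_to_locator_map` and the convention of returning `None` when no equivalent locator map exists are assumptions for illustration. The sketch also assumes that an empty `errors` array counts as having no nested errors.

```python
def locators_to_locator_map(locators):
    """Return an equivalent locator map, or None if none exists (hypothetical helper)."""
    locator_map = {}
    for locator in locators:
        # Nested errors cannot be expressed in a locator map.
        if locator.get("errors"):
            return None
        dimension = locator["dimension"]
        # A locator map holds at most one address per dimension.
        if dimension in locator_map:
            return None
        locator_map[dimension] = locator["address"]
    return locator_map
```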
196
202
197
-
::: {.callout-note}
198
-
Locators of the same positions should refer to roughly the "same" part of a document or at least have a common intersection. This requirement is difficult to formalize because locators refer to different document models, so it is no normative part of this specification.
199
-
:::
203
+
A position with multiple locators of the same dimension does not imply multiple errors but it references multiple elements involved in the same error (for instance a mismatch between two elements). Locators of different dimensions in the same position SHOULD refer to the same elements or have a common intersection.
200
204
201
205
[locator format]: #locator-formats
202
206
[locator map]: #locator-maps
203
207
204
-
## Locators
208
+
## Locator maps
209
+
210
+
A **locator map** is a JSON object that maps names of [**dimensions**](#dimensions) to [**addresses**](#dimensions).
211
+
212
+
```{#lst-locator-map .json lst-cap="A simple locator map indicating the position line 7, character 42"}
213
+
{ "line": "7", "char": "42" }
214
+
```
205
215
206
-
A **Locator** is a JSON object with
216
+
A locator map can be transformed to an equivalent array of [locators](#locators) with key and value of the JSON object entries mapped to field `dimension` and `address` of each locator.
207
217
208
-
- mandatory field `dimension` with the name of a [**dimension**](#dimensions)
218
+
```{#lst-locator-map .json lst-cap="Equivalent array of locators"}
219
+
[
220
+
{ "dimension": "line", "address": "7" },
221
+
{ "dimension": "char", "address": "42" }
222
+
]
223
+
```
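
The transformation from a locator map to an array of locators is a one-to-one mapping of entries, as this short Python sketch shows. The function name `locator_map_to_locators` is an assumption for illustration.

```python
def locator_map_to_locators(locator_map):
    """Expand a locator map into an equivalent array of locators (hypothetical helper)."""
    # Each key becomes the `dimension` and each value the `address` of one locator.
    return [
        {"dimension": dimension, "address": address}
        for dimension, address in locator_map.items()
    ]


locators = locator_map_to_locators({"line": "7", "char": "42"})
```

Python dictionaries keep insertion order, so the locators appear in the order of the keys in the locator map.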
209
224
210
-
- mandatory field `address` with the **address**, being a string conforming to the **locator format** identified by the name of the **dimension**.
225
+
Applications MAY restrict their support of Data Validation Error Format to positions in condense form being locator maps.
211
226
212
-
- optional field `errors` with an array of nested [**errors**](#sec-errors) within the located part of a document.
227
+
## Locators
213
228
229
+
A locator references an element of a document. A **Locator** is a JSON object with the following constraints:
230
+
231
+
- the locator MUST have a field `dimension` with the name of a [**dimension**](#dimensions). Some dimensions imply a document model on elements referenced by locators of this dimension.
232
+
233
+
- the locator MUST have a field `address` with the **address**, being a string conforming to the **locator format** identified by the name of the **dimension**.
234
+
235
+
- the locator MAY have a field `errors` with an array of [**errors**](#errors) within the located element (**nested errors**).
Nested errors allow to reference locations within nested documents (@lst-nested-example and @lst-nested-example-2):
241
+
Nested errors allow referencing locations within elements of a document. Positions of nested errors MUST be relative to the element referenced by their parent locator (@lst-nested-example and @lst-nested-example-2):
220
242
221
243
```{#lst-nested-example .json lst-cap="An error in line 2 of file `example.txt` in archive `archive.zip`"}
222
244
{
@@ -256,36 +278,17 @@ Nested errors allow to reference locations within nested documents (@lst-nested-
256
278
}
257
279
```
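
A consumer of nested errors might walk the structure recursively and keep track of the chain of parent locators, since nested positions are relative to their parent. The following Python sketch is one way to do this. The function name `iter_errors`, the sample data, and the dimension name `file` used in the sample are assumptions for illustration and are not taken from the listings above.

```python
def iter_errors(errors, path=()):
    """Yield (path, error) pairs for all errors, including nested ones.

    `path` is a tuple of (dimension, address) pairs of the parent locators,
    from outermost to innermost (hypothetical helper).
    """
    for error in errors:
        yield path, error
        position = error.get("position") or []
        # A position in condense form (locator map) cannot contain nested errors.
        if isinstance(position, dict):
            continue
        for locator in position:
            step = (locator["dimension"], locator["address"])
            yield from iter_errors(locator.get("errors", []), path + (step,))


# Hypothetical sample: an error in a file of an archive, with a nested error in line 2.
sample = [{
    "message": "Invalid archive member",
    "position": [{
        "dimension": "file",
        "address": "example.txt",
        "errors": [{"message": "Invalid line", "position": {"line": "2"}}],
    }],
}]
flattened = list(iter_errors(sample))
```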
258
280
259
-
## Locator maps
260
-
261
-
A **locator map** is a JSON object that maps names of [**dimensions**](#sec-dimensions) to [**addresses**](#sec-dimensions).
262
-
263
-
```{#lst-locator-map .json lst-cap="A simple locator map indicating the position line 7, character 42"}
264
-
{ "line": "7", "char": "42" }
265
-
```
266
-
267
-
A locator map can be transformed to an equivalent array of [locators](#locators) with key and value of the JSON object entries mapped to field `dimension` and `address` of each locator. An array of locators can be reduced to a locator map by dropping all nested errors and selecting only the first locator of each locator format.
268
-
269
-
```{#lst-locator-map .json lst-cap="Equivalent array of locators"}
270
-
[
271
-
{ "dimension": "line", "address": "7" },
272
-
{ "dimension": "char", "address": "42" }
273
-
]
274
-
```
275
-
276
-
Applications MAY restrict their support of Data Validation Error Format to positions with locator maps. In this case nested errors and positions with multiple locators per dimension are not supported.
277
-
278
-
# Dimensions {#sec-dimensions}
281
+
# Dimensions
279
282
280
-
A **dimension** is a defined method to address elements of a document. Each dimension has:
283
+
A **dimension** is a defined method to reference elements of a document. Each dimension has:
281
284
282
285
- a unique **name**, being a string that starts with a lowercase letter `a` to `z`, optionally followed by a sequence of lowercase letters and digits `0` to `9`.
283
286
284
-
- a **locator format**, being a formal language of Unicode strings to encode references to parts of a document. The sets of strings of the language are called **addresses**.
287
+
- a **locator format**, being a formal language of Unicode strings to encode references to elements of a document. The strings of the language are called **addresses**.
285
288
286
289
- a **document model** matching the **locator format**.
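
The syntax of dimension names given above can be checked with a regular expression. This Python sketch is illustrative only, and the names `DIMENSION_NAME` and `is_dimension_name` are assumptions.

```python
import re

# A lowercase letter a-z, optionally followed by lowercase letters and digits 0-9.
DIMENSION_NAME = re.compile(r"[a-z][a-z0-9]*")


def is_dimension_name(name):
    """Return True if `name` has the syntax of a dimension name (hypothetical helper)."""
    # fullmatch avoids accepting a trailing newline, which `$` would allow.
    return isinstance(name, str) and DIMENSION_NAME.fullmatch(name) is not None
```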
287
290
288
-
Some dimensions imply a document model on addressed elements. For instance a [line number] addresses a character string and a [JSON Pointer] addresses a JSON value.
291
+
Some dimensions imply a document model on referenced elements (element model). For instance a [line number] references a character string and a [JSON Pointer] references a JSON value.
289
292
290
293
Applications SHOULD support the following dimensions. The [appendix](#sec-additional-dimensions) contains a non-normative note on additional dimensions not fully specified yet.
291
294
@@ -303,10 +306,10 @@ name | locator format | document model
303
306
`jsonpointer` | [JSON Pointer] | JSON value | JSON value
304
307
`xpath` | [XML Element Locator] | XML or compatible hierarchies | XML
305
308
306
-
The **identifier** locator format with name `id` and locator values being arbitrary Unicode strings subsumes every other locator format because locators of same value refer to the same element. It can be used for any kind of formalized reference to elements of a document, but its main use case are record identifiers, unique names and similar identifier systems.
309
+
The **identifier** locator format with name `id` and locator values being arbitrary Unicode strings subsumes every other locator format because locators of the same value reference the same element. It can be used for any kind of formalized reference to elements of a document, but its main use cases are record identifiers, unique names and similar identifier systems.
307
310
308
311
:::{.callout-note}
309
-
Dimensions are a subset of query languages. A dimension value locates to *one* element from a document. A query language (e.g. JSONPath, full XPath...) can locate a set of elements.
312
+
Dimensions are a subset of query languages. A dimension value references one element from a document. A query language (e.g. JSONPath, full XPath...) can locate a set of elements.
310
313
:::
311
314
312
315
## Sequential document models
@@ -375,7 +378,7 @@ DIGIT = %x30-39
375
378
```
376
379
377
380
:::{.callout-note}
378
-
Tabular selection locator is a proper subset of [RFC 7111] URI Fragment Identifier.
381
+
Tabular selection locator is a proper subset of [RFC 7111] URI Fragment Identifier, excluding [multi-selections].
379
382
:::
380
383
381
384
## Hierarchical document models
@@ -401,10 +404,10 @@ Position = %31-%39 *DIGIT
401
404
DIGIT = %x30-39 ; 0...9
402
405
```
403
406
404
-
Applications MAY append the string `[1]` to an XML Element Locator if it does not end with a `Position` to ensure that a single element is referenced.
407
+
Applications MUST NOT use one XML Element Locator to reference multiple XML elements. For this reason applications MAY always append the string `[1]` to an XML Element Locator if it does not end with a `Position`.
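
Appending `[1]` could look like the following Python sketch. It assumes that a trailing `Position` is written in square brackets, as the string `[1]` suggests, because the full grammar is not shown in this excerpt. The function name `ensure_single_element` is hypothetical, and the sketch does not special-case attribute steps.

```python
import re

# Assumption: a trailing Position appears as a positive integer without
# leading zero, enclosed in square brackets at the end of the locator.
TRAILING_POSITION = re.compile(r"\[[1-9][0-9]*\]\Z")


def ensure_single_element(locator):
    """Append `[1]` unless the locator already ends with a Position (hypothetical helper)."""
    if TRAILING_POSITION.search(locator):
        return locator
    return locator + "[1]"
```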
405
408
406
409
::: {.callout-note}
407
-
XML Element Locator is a proper subset of (X)Path Expressions from [XPath] specifications.
410
+
XML Element Locator is a proper subset of (X)Path Expressions from [XPath] specifications, limited to referencing individual XML elements or attributes.