Commit 787f462

committed
More proofreading
1 parent 24a79a7 commit 787f462

4 files changed

Lines changed: 94 additions & 70 deletions

README.md

Lines changed: 4 additions & 0 deletions
@@ -75,6 +75,10 @@ Locators can contain **nested errors** within the addressed part of a document:
7575
}
7676
~~~
7777

78+
## Building the specification document
79+
80+
Requires `quarto`.
81+
7882
## Implementations
7983

8084
See [directory `examples`](examples) for an example implementation of an XML validator supporting this error format.

_quarto.yml

Lines changed: 19 additions & 7 deletions
@@ -14,14 +14,26 @@ repo-url: https://github.com/gbv/validation-error-format
1414
resources:
1515
- schema.json
1616

17+
callout-appearance: simple
18+
number-sections: true
19+
number-depth: 2
20+
notebook-links: false
21+
crossref:
22+
lst-prefix: "Example"
23+
lst-title: "Example"
24+
lst-cap-location: bottom
25+
1726
format:
1827
html:
1928
highlight-style: kate
20-
notebook-links: false
21-
number-sections: true
22-
number-depth: 2
2329
css: style.css
24-
crossref:
25-
lst-prefix: "Example"
26-
lst-title: "Example"
27-
lst-cap-location: bottom
30+
# pdf:
31+
# toc: false
32+
# code-tools:
33+
# source: false
34+
# sansfont: Linux Biolinum O
35+
# mainfont: Linux Biolinum O
36+
# keeptex: true
37+
38+
execute:
39+
freeze: true

index.qmd

Lines changed: 64 additions & 61 deletions
@@ -8,17 +8,19 @@ authors:
88
affiliations:
99
- name: Verbundzentrale des GBV (VZG)
1010
abstract: |
11-
This document specifies a data format to report validation errors of digital objects.
11+
This document specifies a data format to report validation errors of digital objects with error positions independent of specific document models.
1212
---
1313

1414
# Introduction
1515

16+
> _All data is wrong, but some data is wrong on multiple levels._
17+
1618
Data validation is a crucial part of managing data quality and interoperability. Validation is applied in many ways and contexts, for instance input forms and editors with visual feedback or schema languages with formal error reports. The diversity of use cases implies a variety of error results. Existing standards for error reporting, such as [JUnit XML](https://github.com/testmoapp/junitxml) and [Test Anything Protocol](https://testanything.org/), have narrow use cases in software development.
1719

1820
The specification of **Data Validation Error Format** has two goals:
1921

20-
- unify how validation errors are reported by different validators
21-
- address positions of errors in validated documents, independent from document formats
22+
- unify how validation errors are reported by different applications
23+
- reference positions of errors in validated documents, independent from document models
2224

2325
Last but not least the format should help to better separate validation and presentation of validation results, so both can be solved by different applications.
2426

@@ -49,9 +51,9 @@ Every document conforms to a **document model**. For instance JSON documents con
4951
Eventually all documents are given as digital objects, encoded as sequence of bytes. Encodings using a sequence of characters are also called textual data formats, in contrast to binary data formats.
5052
:::
5153

52-
An [error position](#sec-positions) is given in form of one or more **locators**, each having a [**dimension**](#sec-dimensions) and an **address**. Each dimension refers to a **locator format** for a set of document models. For instance [JSON Pointer] refers to JSON, character and line numbers refer to character strings with defined line breaks, and offsets refer to sequences of elements (@fig-encodings-and-locators). Other examples of locator formats include [XPath] for XML, and row/column for tabular data.
54+
An [error position](#positions) is given in form of one or more **locators**, each having a [**dimension**](#dimensions) and an **address**. Each dimension refers to a **locator format** for a set of document models. For instance [JSON Pointer] refers to JSON, character and line numbers refer to character strings with defined line breaks, and offsets refer to sequences of elements (@fig-encodings-and-locators). Other examples of locator formats include [XPath] for XML, and row/column for [tabular data](#tabular-document-models).
5355

54-
Locators can also contain **nested errors** to address a more specific position within another position and to support error positions in nested documents such as archive files.
56+
Locators can also contain **nested errors** to reference a more specific position within another position and to support error positions in nested documents such as archive files.
5557

5658
::: {#fig-encodings-and-locators}
5759

@@ -153,70 +155,90 @@ A similar document could be invalid on byte level. The following table illustrat
153155

154156
The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 ([RFC 2119] and [RFC 8174]) when, and only when, they appear in all capitals, as shown here.
155157

156-
Only section @sec-errors to @sec-dimensions, excluding examples and notes, and the [list of normative references](#normative-references) are normative parts of this specification.
158+
Only sections [2](#errors) to [4](#dimensions), excluding examples and notes, and the [list of normative references](#normative-references) are normative parts of this specification.
157159

160+
<!--
158161
Specific support of Data Validation Error Format by an application depends on:
159162
160-
1. the set of supported [**dimensions**](#sec-dimensions), and
161-
2. whether [**positions**](#positions) are supported in both full ([**locators**](#locators)) and condense form ([**locator maps**](#locator-map)), or only the latter.
163+
1. the set of supported [**dimensions**](#dimensions), and
164+
2. whether [**positions**](#positions) are supported only in condense form ([**locator maps**](#locator-map)), or also in full form (array of [**locators**](#locators))
162165
163166
Both MUST be documented by applications.
167+
-->
164168

165-
# Errors {#sec-errors}
169+
# Errors
166170

167-
An **Error** is a JSON object with:
171+
An **Error** is a JSON object with the following constraints:
168172

169-
- optional (but RECOMMENDED) field `message` with an **error message**, being a non-empty string.
170-
Applications MAY use a default value for error messages.
173+
- an error SHOULD have a field `message` with an **error message**, being a non-empty string.
174+
Applications MAY use a default value for error messages. Language and localization of error
175+
messages are out of the scope of this specification.
171176

172-
- optional field `types` with an array of **error types**, each being a non-empty string.
173-
Error types can be used for grouping errors or to reference a cause or constraint being violated
174-
with the error. Error types SHOULD be either URIs ([RFC 3986]) or local identifiers
175-
with same syntax as the name of a [dimension](#dimensions) or
177+
- an error MAY have field `types` with an array of **error types**, each being a non-empty string.
178+
Error types can be used for grouping errors and to reference a cause or constraint being violated
179+
by the error. Error types SHOULD be either URIs ([RFC 3986]) or local identifiers
180+
with the same syntax as the name of a [dimension](#dimensions).
176181

177-
- optional field `level` with an **error level**, being one of the strings `error`, `warning`, or `info`.
178-
Application MUST use default value `error` if this field is not given.
182+
- an error MAY have field `level` with an **error level**, being one of the strings `error`, `warning`, or `info`. Applications MUST use the default value `error` if this field is not given.
179183

180-
- optional field `position` with a [**position**](#positions).
184+
- an error MAY have field `position` with a [**position**](#positions).
181185
Applications MUST NOT differentiate between no position and an empty position (an empty array or an empty JSON object).
182186

187+
Applications MUST use individual errors for individual positions of the kind of observation represented by the error. For instance a malformed character occurring twice in a document results in two errors.
188+
183189
::: {.callout-note}
184-
Language and localization of error messages is out of the scope of this specification.
190+
By this definition the error `{}` is allowed and equivalent to `{"level":"error"}`.
185191
:::
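
The constraints on error objects listed above can be sketched in Python. This is a minimal illustration and not part of the specification: the function name `normalize_error` is hypothetical, and it assumes the error has already been parsed from JSON into a `dict`.

```python
# Hypothetical helper, not defined by the specification.
ALLOWED_LEVELS = ("error", "warning", "info")


def normalize_error(error):
    """Check an error object and return a copy with defaults applied."""
    if not isinstance(error, dict):
        raise TypeError("an error must be a JSON object")
    result = dict(error)

    # The level defaults to "error" when the field is not given.
    level = result.setdefault("level", "error")
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"invalid error level: {level!r}")

    # If present, the message must be a non-empty string.
    message = result.get("message")
    if "message" in result and not (isinstance(message, str) and message):
        raise ValueError("message must be a non-empty string")

    # If present, types must be an array of non-empty strings.
    if "types" in result:
        types = result["types"]
        if not isinstance(types, list) or not all(
            isinstance(t, str) and t for t in types
        ):
            raise ValueError("types must be an array of non-empty strings")

    # An empty position (empty array or empty object) is treated
    # exactly like a missing position.
    if "position" in result and result["position"] in ([], {}):
        del result["position"]

    return result
```

With this sketch, `normalize_error({})` and `normalize_error({"level": "error", "position": []})` yield the same object, which mirrors the equivalence stated in the note.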
186192

187193
# Positions
188194

189-
An error can have a **position**. A position is given
195+
The position of an error is given
190196

191-
- either in **full form** as JSON array of [**locators**](#locators),
197+
- either in **condense form** with a [**locator map**](#locator-maps),
192198

193-
- or in **condense form** with a [**locator map**](#locator-maps).
199+
- or in **full form** as JSON array of [**locators**](#locators).
194200

195-
Every locator map can be transformed to an equivalent array of locators. The reverse transformation is only possible if there is at most one locator per dimension and no locator has nested errors.
201+
Every locator map can be transformed to an equivalent array of locators. The reverse transformation is only possible if no locator has [nested errors](#locators) and there is no more than one locator per dimension.
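
The condition for the reverse transformation can be sketched as follows. The function name `locators_to_map` is hypothetical, and treating an empty `errors` array like an absent one is an assumption of this sketch, not a statement of the specification.

```python
def locators_to_map(locators):
    """Reduce an array of locators to a locator map.

    Returns None when the reduction is not possible, that is when a
    locator has nested errors or a dimension occurs more than once.
    """
    locator_map = {}
    for locator in locators:
        # Assumption: an empty "errors" array counts as no nested errors.
        if locator.get("errors"):
            return None
        dimension = locator["dimension"]
        if dimension in locator_map:
            return None  # more than one locator for this dimension
        locator_map[dimension] = locator["address"]
    return locator_map
```

Returning `None` instead of silently dropping locators keeps the lossy case visible to the caller.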
196202

197-
::: {.callout-note}
198-
Locators of the same positions should refer to roughly the "same" part of a document or at least have a common intersection. This requirement is difficult to formalize because locators refer to different document models, so it is no normative part of this specification.
199-
:::
203+
A position with multiple locators of the same dimension does not imply multiple errors; instead it references multiple elements involved in the same error (for instance a mismatch between two elements). Locators of different dimensions in the same position SHOULD refer to the same elements or have a common intersection.
200204

201205
[locator format]: #locator-formats
202206
[locator map]: #locator-maps
203207

204-
## Locators
208+
## Locator maps
209+
210+
A **locator map** is a JSON object that maps names of [**dimensions**](#dimensions) to [**addresses**](#dimensions).
211+
212+
```{#lst-locator-map .json lst-cap="A simple locator map indicating the position line 7, character 42"}
213+
{ "line": "7", "char": "42" }
214+
```
205215

206-
A **Locator** is a JSON object with
216+
A locator map can be transformed to an equivalent array of [locators](#locators) with key and value of the JSON object entries mapped to field `dimension` and `address` of each locator.
207217

208-
- mandatory field `dimension` with the name of a [**dimension**](#dimensions)
218+
```{#lst-locator-map .json lst-cap="Equivalent array of locators"}
219+
[
220+
{ "dimension": "line", "address": "7" },
221+
{ "dimension": "char", "address": "42" }
222+
]
223+
```
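
The transformation from a locator map to an array of locators is mechanical. A minimal Python sketch, with the hypothetical function name `map_to_locators`, could look like this:

```python
def map_to_locators(locator_map):
    """Turn a locator map into the equivalent array of locators.

    Each key becomes the `dimension` and each value the `address`
    of one locator, in the order of the map entries.
    """
    return [
        {"dimension": dimension, "address": address}
        for dimension, address in locator_map.items()
    ]
```

Applied to the locator map `{ "line": "7", "char": "42" }`, the sketch returns the two locators shown in the listing above.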
209224

210-
- mandatory field `address` with the **address**, being a string conforming to the **locator format** identified by the name of the **dimension**.
225+
Applications MAY restrict their support of Data Validation Error Format to positions in condense form being locator maps.
211226

212-
- optional field `errors` with an array of nested [**errors**](#sec-errors) within the located part of a document.
227+
## Locators
213228

229+
A locator references an element of a document. A **Locator** is a JSON object with the following constraints:
230+
231+
- the locator MUST have a field `dimension` with the name of a [**dimension**](#dimensions). Some dimensions imply a document model on elements referenced by locators of this dimension.
232+
233+
- the locator MUST have a field `address` with the **address**, being a string conforming to the **locator format** identified by the name of the **dimension**.
234+
235+
- the locator MAY have a field `errors` with an array of [**errors**](#errors) within the located element (**nested errors**).
214236

215237
```{#lst-locator .json lst-cap="A simple locator"}
216238
{ "dimension": "line", "address": "7" }
217239
```
218240

219-
Nested errors allow to reference locations within nested documents (@lst-nested-example and @lst-nested-example-2):
241+
Nested errors allow referencing locations within elements of a document. Positions of nested errors MUST be relative to the element referenced by their parent locator (@lst-nested-example and @lst-nested-example-2):
220242

221243
```{#lst-nested-example .json lst-cap="An error in line 2 of file `example.txt` in archive `archive.zip`"}
222244
{
@@ -256,36 +278,17 @@ Nested errors allow to reference locations within nested documents (@lst-nested-
256278
}
257279
```
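
Because nested positions are relative to their parent locator, a consumer usually has to keep track of the chain of parent locators while walking the errors. The following sketch shows one way to do that. The function name `iter_errors` and the dimension name `file` in the usage example are assumptions made for illustration only.

```python
def iter_errors(errors, path=()):
    """Yield (path, error) pairs for errors and all their nested errors.

    `path` is a tuple of (dimension, address) pairs of the parent
    locators, outermost first. It is empty for top-level errors.
    """
    for error in errors:
        yield path, error
        position = error.get("position", [])
        if isinstance(position, dict):
            # A locator map (condense form) cannot carry nested errors.
            continue
        for locator in position:
            step = (locator["dimension"], locator["address"])
            yield from iter_errors(locator.get("errors", []), path + (step,))


# Usage with hypothetical data: an error inside a file of an archive.
example = [{
    "message": "Invalid file in archive",
    "position": [{
        "dimension": "file",  # assumed dimension name
        "address": "example.txt",
        "errors": [{"message": "Invalid line", "position": {"line": "2"}}],
    }],
}]
for parents, error in iter_errors(example):
    print(parents, error["message"])
```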
258280

259-
## Locator maps
260-
261-
A **locator map** is a JSON object that maps names of [**dimensions**](#sec-dimensions) to [**addresses**](#sec-dimensions).
262-
263-
```{#lst-locator-map .json lst-cap="A simple locator map indicating the position line 7, character 42"}
264-
{ "line": "7", "char": "42" }
265-
```
266-
267-
A locator map can be transformed to an equivalent array of [locators](#locators) with key and value of the JSON object entries mapped to field `dimension` and `address` of each locator. An array of locators can be reduced to a locator map by dropping all nested errors and selecting only the first locator of each locator format.
268-
269-
```{#lst-locator-map .json lst-cap="Equivalent array of locators"}
270-
[
271-
{ "dimension": "line", "address": "7" },
272-
{ "dimension": "char", "address": "42" }
273-
]
274-
```
275-
276-
Applications MAY restrict their support of Data Validation Error Format to positions with locator maps. In this case nested errors and positions with multiple locators per dimension are not supported.
277-
278-
# Dimensions {#sec-dimensions}
281+
# Dimensions
279282

280-
A **dimension** is a defined method to address elements of a document. Each dimension has:
283+
A **dimension** is a defined method to reference elements of a document. Each dimension has:
281284

282285
- a unique **name**, being a string that starts with a lowercase letter `a` to `z`, optionally followed by a sequence of lowercase letters and digits `0` to `9`.
283286

284-
- a **locator format**, being a formal language of Unicode strings to encode references to parts of a document. The sets of strings of the language are called **addresses**.
287+
- a **locator format**, being a formal language of Unicode strings to encode references to elements of a document. The strings of the language are called **addresses**.
285288

286289
- a **document model** matching the **locator format**.
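
The syntax of dimension names can be checked with a regular expression. This sketch assumes that "lowercase letters" means the ASCII letters `a` to `z`. The function name `is_dimension_name` is hypothetical.

```python
import re

# One lowercase letter a-z, optionally followed by lowercase letters and digits.
DIMENSION_NAME = re.compile(r"[a-z][a-z0-9]*")


def is_dimension_name(name):
    """Return True if `name` has the syntax of a dimension name."""
    return isinstance(name, str) and DIMENSION_NAME.fullmatch(name) is not None
```

For instance `line` and `jsonpointer` are accepted, while `Line`, `1st` and `json-pointer` are rejected.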
287290

288-
Some dimensions imply a document model on addressed elements. For instance a [line number] addresses a character string and a [JSON Pointer] addresses a JSON value.
291+
Some dimensions imply a document model on referenced elements (element model). For instance a [line number] references a character string and a [JSON Pointer] references a JSON value.
289292

290293
Applications SHOULD support the following dimensions. The [appendix](#sec-additional-dimensions) contains a non-normative note on additional dimensions not fully specified yet.
291294

@@ -303,10 +306,10 @@ name | locator format | document model
303306
`jsonpointer` | [JSON Pointer] | JSON value | JSON value
304307
`xpath` | [XML Element Locator] | XML or compatible hierarchies | XML
305308

306-
The **identifier** locator format with name `id` and locator values being arbitrary Unicode strings subsumes every other locator format because locators of same value refer to the same element. It can be used for any kind of formalized reference to elements of a document, but its main use case are record identifiers, unique names and similar identifier systems.
309+
The **identifier** locator format with name `id` and locator values being arbitrary Unicode strings subsumes every other locator format because locators of same value reference the same element. It can be used for any kind of formalized reference to elements of a document, but its main use cases are record identifiers, unique names and similar identifier systems.
307310

308311
:::{.callout-note}
309-
Dimensions are a subset of query languages. A dimension value locates to *one* element from a document. A query language (e.g. JSONPath, full XPath...) can locate a set of elements.
312+
Dimensions are a subset of query languages. A dimension value references one element from a document. A query language (e.g. JSONPath, full XPath...) can locate a set of elements.
310313
:::
311314

312315
## Sequential document models
@@ -375,7 +378,7 @@ DIGIT = %x30-39
375378
```
376379

377380
:::{.callout-note}
378-
Tabular selection locator is a proper subset of [RFC 7111] URI Fragment Identifier.
381+
Tabular selection locator is a proper subset of [RFC 7111] URI Fragment Identifier, excluding [multi-selections].
379382
:::
380383

381384
## Hierarchical document models
@@ -401,10 +404,10 @@ Position = %31-%39 *DIGIT
401404
DIGIT = %x30-39 ; 0...9
402405
```
403406

404-
Applications MAY append the string `[1]` to an XML Element Locator if it does not end with a `Position` to ensure that a single element is referenced.
407+
Applications MUST NOT use one XML Element Locator to reference multiple XML elements. For this reason applications MAY always append the string `[1]` to an XML Element Locator if it does not end with a `Position`.
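
Appending `[1]` can be sketched as a small string operation. The sketch assumes that a `Position` appears in square brackets at the end of the locator, as in XPath, and that the locator addresses an element rather than an attribute. Both assumptions go beyond the grammar excerpt shown here, and the function name `ensure_position` is hypothetical.

```python
import re

# Assumption: a trailing Position is written as "[" 1-9 followed by digits "]".
ENDS_WITH_POSITION = re.compile(r"\[[1-9][0-9]*\]\Z")


def ensure_position(locator):
    """Append "[1]" unless the locator already ends with a Position."""
    if ENDS_WITH_POSITION.search(locator):
        return locator
    return locator + "[1]"
```

Under these assumptions `ensure_position("/root/item")` gives `/root/item[1]`, and a locator such as `/root/item[2]` is returned unchanged.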
405408

406409
::: {.callout-note}
407-
XML Element Locator is a proper subset of (X)Path Expressions from [XPath] specifications.
410+
XML Element Locator is a proper subset of (X)Path Expressions from [XPath] specifications, limited to reference individual XML elements or attributes.
408411
:::
409412

410413
[QName]: https://www.w3.org/TR/REC-xml-names/#NT-QName

style.css

Lines changed: 7 additions & 2 deletions
@@ -9,7 +9,12 @@ figcaption {
99
.listing > figure > figcaption {
1010
text-align: left;
1111
}
12-
.callout.callout-style-default .callout-body,
13-
.callout.callout-style-default > div.callout-header {
12+
.callout.callout-style-simple .callout-body {
1413
font-size: 1rem;
1514
}
15+
a {
16+
text-decoration: none;
17+
}
18+
a:hover {
19+
text-decoration: underline;
20+
}
