Commit 914c9f6

committed
add a new UAPI.16 File Manifest spec
1 parent cd215e9 commit 914c9f6

1 file changed

Lines changed: 270 additions & 0 deletions

File tree

specs/file-manifest.md

Lines changed: 270 additions & 0 deletions
@@ -0,0 +1,270 @@
1+
---
2+
title: UAPI.16 File Manifest
3+
layout: posts
4+
version: 1.0
5+
weight: 6
6+
aliases:
7+
- /UAPI.16
8+
- /16
9+
---
10+
11+
# UAPI.16 File Manifest
12+
13+
| Version | Changes |
14+
|---------|-----------------|
15+
| 1.0 | Initial Release |
16+
17+
This format stores information about static, immutable data resources that can be acquired over the network
18+
and stored on a disk.
19+
20+
This manifest file format is inspired by the traditional UNIX `SHA256SUMS` and BSD `mtree(5)` file formats,
21+
but has a larger set of features:
22+
23+
1. It allows declaring the encoding of data files (i.e. `gzip` compression and suchlike).
24+
2. It allows referencing remote URLs as source for acquiring file data.
25+
3. It allows inline specification of file data (suitable for short data only).
26+
4. It allows referencing arbitrary local files as source for acquiring file data.
27+
5. It allows extracting data "slices" from data sources, via offset and range.
28+
6. It may store both encoded and decoded data sizes.
29+
7. Various fields of additional per-file metadata may be defined, including various fields for GPT partition table metadata.
30+
8. It's an extensible format permitting vendors and projects to add their own per-file and per-manifest fields.
31+
32+
This format is designed with tools such as `mkosi` and `systemd-sysupdate` in mind, but it's intended to be
33+
generally useful as a way to describe collections of files that may be acquired over the network or from
34+
local storage, and verified cryptographically.
35+
36+
## Manifest File Name and Media Type
37+
38+
When stored in a file system directory – alongside the data files it references – the manifest file should be
39+
named `Uapi16ManifestFile`.
40+
41+
Similarly, when placed on an HTTP server it's a good idea to name the last component of the URL
42+
`/Uapi16ManifestFile`; however, this is just a suggestion in order to minimize surprises and maximize
43+
discoverability.
44+
45+
When served via an HTTP server it's recommended to use the media type `application/vnd.uapi.16.file.manifest`.
46+
47+
## Manifest Object
48+
49+
This is the top-level object for a manifest file. It's designed to be
50+
extensible and self-identifies as a manifest.
51+
52+
```json
53+
{
54+
"mediaType" : "application/vnd.uapi.16.file.manifest",
55+
"files" : [ ]
56+
}
57+
```
58+
59+
### Notes
60+
61+
The `files` field is an array of file objects, see below.
62+
63+
Files may appear in any order in the manifest, but it's recommended to sort them alphabetically by their name.
64+
65+
The `name` fields of file objects are supposed to be uniquely assigned within each manifest file.
66+
67+
## File Object
68+
69+
This is an object that encodes information about an individual file.
70+
71+
```json
72+
{
73+
"name" : "",
74+
"dataEncoding" : "gzip",
75+
"encodedDataSize" : 2011,
76+
"dataSize" : 4711,
77+
"dataFile" : "",
78+
"dataUrl" : "http://…",
79+
"dataLiteral" : "TmV2ZXIgZ29ubmEgZ2l2ZSB5b3UgdXAsIG5ldmVyIGdvbm5hIGxldCB5b3UgZG93bg==",
80+
"sliceOffset" : 22,
81+
"sliceSize" : 33,
82+
"sha256" : "",
83+
"gptLabel" : "",
84+
"gptTypeUuid" : "",
85+
"gptFlagNoAuto" : true,
86+
"gptFlagGrowFileSystem" : true,
87+
"readOnly" : true,
88+
"validAfterUSec": 4711,
89+
"validBeforeUSec" : 4711,
90+
"tags" : ["tag1", "tag2"],
91+
"revoked" : false,
92+
"steppingStone": false
93+
}
94+
```
95+
96+
### Notes
97+
98+
All fields but `name` are optional. If a field shall not be set it can either be set to `null` or simply
99+
omitted in the JSON object; both ways shall be considered equivalent.
100+
101+
If not otherwise indicated all fields are supposed to be of type string.
102+
103+
The `name` must be a valid UTF-8 POSIX filename, may not contain control characters (ASCII 0…31, 127), may
104+
not contain a slash, and may not be identical to "`.`", "`..`", "``" (but may contain these strings). It may
105+
have a maximum length of 255 bytes. Files whose names do not match these rules cannot be encoded in manifest
106+
files defined by this specification.
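The naming rules above can be checked mechanically. The following is a minimal sketch in Python, not part of the specification; the function name `is_valid_manifest_name` is hypothetical, and the 255-byte limit is assumed to apply to the UTF-8 encoding of the name.

```python
def is_valid_manifest_name(name: str) -> bool:
    """Return True if `name` satisfies the file name rules described above."""
    # The empty string, "." and ".." are rejected outright,
    # but names merely containing them (e.g. "..foo") are fine.
    if name in ("", ".", ".."):
        return False
    # Slashes are not permitted.
    if "/" in name:
        return False
    # Control characters: ASCII 0..31 and 127.
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in name):
        return False
    try:
        encoded = name.encode("utf-8")
    except UnicodeEncodeError:
        # Lone surrogates cannot be represented in valid UTF-8.
        return False
    # Assumption: the 255 byte limit is measured on the UTF-8 encoding.
    return len(encoded) <= 255
```

For example, `is_valid_manifest_name("FooOS.raw")` is accepted, while `".."` and `"a/b"` are rejected.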
107+
108+
At most one of `dataFile`, `dataUrl`, `dataLiteral` should be set. `dataFile` references a file in the same
109+
place as the manifest file itself, `dataUrl` a full URL, and `dataLiteral` inline Base64. `dataUrl` may use
110+
`http://…` and `https://…` schemes only.
111+
112+
`dataSize` is the size of the decoded blob, `encodedDataSize` of the encoded blob. If `dataEncoding` is not
113+
specified, `encodedDataSize` should not be specified. Both are unsigned integer type.
114+
115+
`dataEncoding` should use the same encoding specifiers as HTTP.
116+
117+
`sliceOffset`, `sliceSize` (unsigned integers) are applied to the *decoded* data blob (i.e. after
118+
decompression). If not specified explicitly, they should default to a zero offset, and a slice size equal to
119+
the full decoded size.
120+
121+
`sha256` is the SHA256 hash of the specified slice, formatted in 64 hexadecimal characters. Parsers should
122+
parse this case-insensitively.
123+
124+
The `gpt*` fields encode fields that we need when placing these resources in a GPT partition table entry. The
125+
`gptFlag*` fields are of boolean type. The `gptTypeUuid` field takes a UUID in traditional string formatting.
126+
127+
The `gptLabel` field may not be longer than 36 UTF-16 code units (72 bytes). If `gptLabel` is unspecified, and the data is
128+
written to a GPT partition it should use the value of `name` as label instead. Note that this might require
129+
mangling, as GPT partition labels have smaller size limits than file names.
130+
131+
`readOnly` (a boolean) is relevant for initializing the read-only bit of GPT partition table entries. It
132+
should also be reflected in the POSIX 'w' access mode bit when storing the data in a regular file on disk. If
133+
unspecified, defaults to false.
134+
135+
`valid*USec` is in µs since UNIX epoch UTC, and shall be an unsigned integer. It defines a validity time
136+
window of the resource. The data should not be accessed or consumed outside of the specified time window. If
137+
`validAfterUSec` is not specified it should default to 0; if `validBeforeUSec` is not specified it should
138+
default to the largest supported unsigned integer value.
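As an illustration of the validity window, here is a minimal Python sketch. The helper name `is_within_validity_window` is hypothetical, and it assumes that the "largest supported unsigned integer value" is the 64-bit maximum. The boundaries are treated as inclusive, which matches the "greater than" and "lower than" wording used in the acquisition steps of this specification.

```python
import time
from typing import Optional

# Assumption: timestamps are handled as unsigned 64-bit integers.
UINT64_MAX = 2**64 - 1


def is_within_validity_window(file_obj: dict, now_usec: Optional[int] = None) -> bool:
    """Check the current time (µs since the UNIX epoch) against valid*USec."""
    if now_usec is None:
        now_usec = time.time_ns() // 1000
    after = file_obj.get("validAfterUSec")
    before = file_obj.get("validBeforeUSec")
    # Missing or null fields fall back to the documented defaults.
    after = 0 if after is None else after
    before = UINT64_MAX if before is None else before
    return after <= now_usec <= before
```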
139+
140+
If neither `dataFile`, nor `dataUrl`, nor `dataLiteral` are specified, the `name` field should be used as
141+
fallback equivalent to it being repeated in `dataFile`.
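A possible way to pick the data source, including the `name` fallback, is sketched below in Python. The function `resolve_data_source` and the `example.com` URL in the usage note are hypothetical. The order of precedence follows the acquisition steps of this specification, and percent-encoding the file name before resolving it is an assumption of this sketch.

```python
from urllib.parse import quote, urljoin


def resolve_data_source(file_obj: dict, manifest_url: str):
    """Return ("literal", base64_text) or ("url", absolute_url) for a file object."""
    literal = file_obj.get("dataLiteral")
    if literal is not None:
        return ("literal", literal)
    data_url = file_obj.get("dataUrl")
    if data_url is not None:
        return ("url", data_url)
    # Fall back to `name` when `dataFile` is unset or null.
    data_file = file_obj.get("dataFile")
    if data_file is None:
        data_file = file_obj["name"]
    # Assumption: the file name is percent-encoded, then resolved like an
    # HTML relative link, replacing the last component of the manifest URL.
    return ("url", urljoin(manifest_url, quote(data_file)))
```

With a manifest at `https://example.com/images/Uapi16ManifestFile`, a file object that only sets `"name": "FooOS.raw"` resolves to `https://example.com/images/FooOS.raw`.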
142+
143+
`dataLiteral` may be encoded either in regular Base64 or in URL-safe Base64, consumers must be able to deal
144+
with either.
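One way for a consumer to accept both Base64 alphabets is to normalize the URL-safe characters before decoding. The Python sketch below is illustrative only; the helper name `decode_data_literal` is hypothetical, and tolerating missing `=` padding is an assumption that the specification does not state.

```python
import base64


def decode_data_literal(literal: str) -> bytes:
    """Decode a `dataLiteral` value given in regular or URL-safe Base64."""
    # Map the URL-safe alphabet ("-", "_") onto the regular one ("+", "/").
    normalized = literal.translate(str.maketrans("-_", "+/"))
    # Assumption: padding may have been omitted, so restore it if needed.
    normalized += "=" * (-len(normalized) % 4)
    # validate=True rejects characters outside the Base64 alphabet.
    return base64.b64decode(normalized, validate=True)
```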
145+
146+
`tags` is an array of arbitrary usecase-specific strings. This may be used for filtering, for example to
147+
categorize files in various ways (for example, files could be tagged "unstable", "stable", "lto", to indicate
148+
stability levels, or similar).
149+
150+
`revoked` is a boolean. If true, the file should not be requested anymore, and if it has been acquired
151+
before, appropriate steps should be taken to invalidate/remove it from use, if possible (details are
152+
implementation-specific). If unspecified, defaults to `false`.
153+
154+
`steppingStone` is a boolean, defaulting to `false`. When this manifest file is processed by a software
155+
update tool, and lists multiple versions of the same resource in individual files, then the update tool
156+
should ensure that files where this property is set to `true` are never skipped for version updates, and
157+
installed and activated before considering any later versions of the same resource.
158+
159+
Additional hash algorithms may be defined in future.
160+
161+
Sizes, offsets, timestamps shall use non-negative integer JSON numbers, and may use values above 2⁵³. (This means
162+
– strictly speaking – the format cannot be processed losslessly by JSON parsers that exclusively use 64bit
163+
floating point for JSON numbers, if there are any files of such excessive sizes. Since it's unlikely that
164+
files included in a UAPI.16 manifest file are this large this should not be a practical limitation.)
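To illustrate the precision issue, the short Python sketch below parses a value just above 2⁵³. Python's `json` module keeps JSON integers as arbitrary-precision `int`, so the value survives, whereas a round trip through a 64bit float does not. The file name `huge.raw` is made up for the example.

```python
import json

# 9007199254740993 is 2**53 + 1, the smallest positive integer that a
# 64bit float cannot represent exactly.
doc = '{"name": "huge.raw", "dataSize": 9007199254740993}'

parsed = json.loads(doc)
exact = parsed["dataSize"]       # stays an exact Python int
via_float = int(float(exact))    # what a float-only parser would end up with
```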
165+
166+
## Relationship to SHA256SUMS
167+
168+
A UAPI.16 manifest file that only uses the fields `name` and `sha256` can be mapped 1:1 to a SHA256SUMS file
169+
and back.
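The mapping could look roughly like the following Python sketch. Both function names are hypothetical. The sketch assumes the common `sha256sum` line layout (hex digest, a space, a mode character that is either a space or `*`, then the file name) and does not handle the backslash-escaped lines that `sha256sum` emits for unusual file names.

```python
MEDIA_TYPE = "application/vnd.uapi.16.file.manifest"


def manifest_to_sha256sums(manifest: dict) -> str:
    """Render a manifest that only uses `name` and `sha256` as SHA256SUMS text."""
    return "".join(
        f"{f['sha256'].lower()}  {f['name']}\n" for f in manifest["files"]
    )


def sha256sums_to_manifest(text: str) -> dict:
    """Parse SHA256SUMS text into a manifest object."""
    files = []
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, rest = line.partition(" ")
        # rest[0] is the mode character: " " (text) or "*" (binary).
        files.append({"name": rest[1:], "sha256": digest.lower()})
    return {"mediaType": MEDIA_TYPE, "files": files}
```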
170+
171+
## Relationship to `mtree(5)`
172+
173+
Manifests following this specification should be mappable 1:1 to
174+
[`mtree(5)`](https://man.freebsd.org/cgi/man.cgi?mtree(5)) as long as:
175+
176+
1. Only the `name`, `dataSize`, `sha256` fields are used in the files objects, as per this specification.
177+
178+
2. Only the `size`, `sha256` fields are used in the `mtree(5)` files, and `type` is set to
178+
`file`. Only regular filenames may be specified as path, i.e. no `/` may be included.
180+
181+
## Acquiring a File Remotely
182+
183+
When acquiring a file listed in a UAPI.16 File Manifest from a web service, the following logic should be
184+
implemented.
185+
186+
1. If `revoked` is set to true, the download should immediately fail.
187+
2. If `validAfterUSec` is set and greater than the current time, the download should immediately fail.
188+
3. If `validBeforeUSec` is set and lower than the current time, the download should immediately fail.
189+
4. If `dataLiteral` is set, it should be Base64 decoded, continue in step 8.
190+
5. Otherwise, if `dataUrl` is set, the data from the URL should be acquired, continue in step 8.
191+
6. Otherwise, if `dataFile` is set, the data should be acquired from the same URL as the manifest itself, however with the last component of the URL replaced by the file name encoded in `dataFile`, following the same semantics as HTML relative links. Continue in step 8.
192+
7. Otherwise, the data should be acquired as in step 6, but the `name` string should be used as file name to suffix the URL with.
193+
8. If `encodedDataSize` or `dataSize` are set: the size of the downloaded data shall be checked against `encodedDataSize` (if set) or `dataSize` (otherwise).
194+
9. If `dataEncoding` is set: the downloaded data shall be decoded according to the algorithm indicated in `dataEncoding`.
195+
10. If `dataSize` is set: the resulting decoded data shall be checked against `dataSize`.
196+
11. The byte range indicated by `sliceOffset` (if not set: 0) and `sliceSize` (if not set, to the end of the file) shall be extracted from the decoded data.
197+
12. The hash of the extracted data shall be matched against the hash encoded in `sha256` (if set).
198+
199+
If all these steps succeed the extracted data from step 11 is the result of the operation.
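Steps 8 to 12 can be sketched for data that has already been downloaded into memory. This Python example is a simplified illustration, not a reference implementation: the names `ManifestError` and `extract_verified_slice` are hypothetical, only `gzip` and unencoded data are handled, and in step 8 the fallback to `dataSize` is applied only when no `dataEncoding` is set, which is this sketch's reading of the step.

```python
import gzip
import hashlib


class ManifestError(Exception):
    """Raised when downloaded data does not match the file object."""


def extract_verified_slice(file_obj: dict, raw: bytes) -> bytes:
    encoding = file_obj.get("dataEncoding")
    encoded_size = file_obj.get("encodedDataSize")
    data_size = file_obj.get("dataSize")

    # Step 8: check the size of the downloaded data.
    if encoded_size is not None:
        if len(raw) != encoded_size:
            raise ManifestError("encoded size mismatch")
    elif encoding is None and data_size is not None and len(raw) != data_size:
        raise ManifestError("size mismatch")

    # Step 9: decode (only gzip is supported in this sketch).
    if encoding is None:
        decoded = raw
    elif encoding == "gzip":
        decoded = gzip.decompress(raw)
    else:
        raise ManifestError(f"unsupported dataEncoding: {encoding}")

    # Step 10: check the decoded size.
    if data_size is not None and len(decoded) != data_size:
        raise ManifestError("decoded size mismatch")

    # Step 11: extract the slice, defaulting to the whole decoded blob.
    offset = file_obj.get("sliceOffset") or 0
    size = file_obj.get("sliceSize")
    end = len(decoded) if size is None else offset + size
    if offset > len(decoded) or end > len(decoded):
        raise ManifestError("slice exceeds decoded data")
    sliced = decoded[offset:end]

    # Step 12: compare the hash case-insensitively.
    expected = file_obj.get("sha256")
    if expected is not None and hashlib.sha256(sliced).hexdigest() != expected.lower():
        raise ManifestError("sha256 mismatch")
    return sliced
```

Holding everything in memory keeps the sketch short; it is not suitable for large images.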
200+
201+
Of course, many of the steps described above should typically be done together rather than serially for
202+
robustness, efficiency and security reasons. For example, if `dataEncoding` is not used, it is recommended to
203+
include the `sliceOffset` and `sliceSize` fields in HTTP range request fields already (in order to avoid
204+
downloading redundant data). Moreover, downloads should fail immediately once the encoded or decoded data
205+
goes beyond the indicated encoded or decoded data sizes. Then, the decoding should be done on-the-fly while
206+
the data is downloaded, and the checksum should be calculated on-the-fly too.
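The on-the-fly variant could be sketched as follows in Python. The function `stream_decode_and_hash` is hypothetical; it takes an iterable of downloaded chunks, assumes `gzip` encoding, ignores slicing for brevity, and hashes the whole decoded stream. A hardened implementation would additionally bound the output of each `decompress()` call (for example via its `max_length` argument), which this sketch omits.

```python
import hashlib
import zlib


def stream_decode_and_hash(chunks, encoded_limit: int, decoded_limit: int):
    """Decode gzip chunks incrementally, enforcing size limits and hashing on the fly.

    Returns (hex_digest, decoded_size).
    """
    # Adding 16 to the window bits makes zlib expect gzip framing.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    hasher = hashlib.sha256()
    encoded_seen = 0
    decoded_seen = 0

    def consume(out: bytes) -> None:
        nonlocal decoded_seen
        decoded_seen += len(out)
        if decoded_seen > decoded_limit:
            raise ValueError("decoded data exceeds dataSize")
        hasher.update(out)

    for chunk in chunks:
        encoded_seen += len(chunk)
        # Fail as soon as more data arrives than the manifest announced.
        if encoded_seen > encoded_limit:
            raise ValueError("encoded data exceeds encodedDataSize")
        consume(decompressor.decompress(chunk))

    consume(decompressor.flush())
    if not decompressor.eof:
        raise ValueError("truncated gzip stream")
    return hasher.hexdigest(), decoded_seen
```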
207+
208+
## Extensibility
209+
210+
Additional fields can be defined freely by implementors. Each such extension field should be named in the
211+
style of `x<Vendor>Foobar` to minimize risk of conflicts. Examples: `xAmutableFrobnicator` or
212+
`xMyProjectWeight`.
213+
214+
## Future Revisions
215+
216+
This specification is intended to be incrementally improved. Additional fields may be defined at any time,
217+
with additional, optional metadata. Should a breaking change be necessary the media type will be changed.
218+
219+
A number of future extensions are envisioned:
220+
221+
* Inline cryptographic signatures
222+
223+
## Example
224+
225+
```json
226+
{
227+
"mediaType" : "application/vnd.uapi.16.file.manifest",
228+
"files" : [
229+
{
230+
"name" : "FooOS.raw",
231+
"dataEncoding": "gzip",
232+
"encodedDataSize": 5642649603,
233+
"dataSize" : 7523532800,
234+
"sha256" : "922a9bae0e02b4ffac3e5ed5054230d0689b9c2e25b0178ba82b925f2a0c3e48",
235+
"validBeforeUSec" : 1776856773123234
236+
},
237+
{
238+
"name" : "FooOS_esp.raw",
239+
"dataFile" : "FooOS.raw",
240+
"dataEncoding": "gzip",
241+
"encodedDataSize": 5642649603,
242+
"dataSize" : 7523532800,
243+
"sliceOffset" : 2097152,
244+
"sliceSize" : 149175808,
245+
"gptLabel" : "EFI System Partition",
246+
"gptTypeUuid" : "c12a7328-f81f-11d2-ba4b-00a0c93ec93b",
247+
"sha256" : "5dcfd837a4868550cc61c256d9567a974e32a20985afa9e100b8b96755a20cae",
248+
"validBeforeUSec" : 1776856773123234
249+
},
250+
{
251+
"name" : "FooOS_root.raw",
252+
"dataFile" : "FooOS.raw",
253+
"dataEncoding": "gzip",
254+
"encodedDataSize": 5642649603,
255+
"dataSize" : 7523532800,
256+
"sliceOffset" : 351272960,
257+
"sliceSize" : 234003200,
258+
"sha256" : "00b70a0813e15f309828d3a36156283cba87576c26755e0b2d4cf0951eff8163",
259+
"gptLabel" : "FooOS_root",
260+
"gptTypeUuid": "4f68bce3-e8cd-4db1-96e7-fbcaf984b709",
261+
"readOnly": true,
262+
"validBeforeUSec" : 1776856773123234
263+
}
264+
]
265+
}
266+
```
267+
268+
The above provides three separate files `FooOS.raw`, `FooOS_esp.raw`, `FooOS_root.raw`. The latter two are
269+
defined as slices of the former, each encapsulating an individual partition. The data file is encoded via
270+
`gzip`.
