Commit 525b142

Merge pull request package-url#589 from package-url/581-rename-rst-to-md
Rename PURL-SPECIFICATION.rst to md package-url#581
2 parents c53ba0e + 27bf322 commit 525b142

17 files changed

Lines changed: 1744 additions & 0 deletions

docs/how-to-build.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
## How to build a `purl` string from its components

Building a `purl` ASCII string works from left to right, from `type` to
`subpath`.

Note: some extra type-specific normalizations are required.
See the "Registered types" section for details.

To build a `purl` string from its components:

- Start a `purl` string with the "pkg:" `scheme` as a lowercase ASCII string

- Append the `type` string to the `purl` as an unencoded lowercase ASCII string

- Append '/' to the `purl`

- If the `namespace` is not empty:

  - Strip leading and trailing '/' from the `namespace`
  - Split on '/' as segments
  - Apply type-specific normalization to each segment if needed
  - UTF-8-encode each segment if needed in your programming language
  - Percent-encode each segment
  - Join the segments with '/'
  - Append this to the `purl`
  - Append '/' to the `purl`
  - Strip leading and trailing '/' from the `name`
  - Apply type-specific normalization to the `name` if needed
  - UTF-8-encode the `name` if needed in your programming language
  - Append the percent-encoded `name` to the `purl`

- If the `namespace` is empty:

  - Apply type-specific normalization to the `name` if needed
  - UTF-8-encode the `name` if needed in your programming language
  - Append the percent-encoded `name` to the `purl`

- If the `version` is not empty:

  - Append '@' to the `purl`
  - UTF-8-encode the `version` if needed in your programming language
  - Append the percent-encoded `version` to the `purl`

- If the `qualifiers` are not empty and not composed only of key/value pairs
  where the `value` is empty:

  - Append '?' to the `purl`
  - Build a list from all key/value pairs:

    - Discard any pair where the `value` is empty.
    - UTF-8-encode each `value` if needed in your programming language
    - If the `key` is `checksum` and this is a list of checksums, join this
      list with a ',' to create this qualifier `value`
    - Create a string by joining the lowercased `key`, the equal '=' sign and
      the percent-encoded `value` to create a qualifier

  - Sort this list of qualifier strings lexicographically
  - Join this list of qualifier strings with a '&' ampersand
  - Append this string to the `purl`

- If the `subpath` is not empty and not composed only of empty, '.' and '..'
  segments:

  - Append '#' to the `purl`
  - Strip leading and trailing '/' from the `subpath`
  - Split this on '/' as segments
  - Discard empty, '.' and '..' segments
  - UTF-8-encode each segment if needed in your programming language
  - Percent-encode each segment
  - Join the segments with '/'
  - Append this to the `purl`
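
The build steps can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: `build_purl` and `_encode` are hypothetical names, type-specific normalization is left out, and `urllib.parse.quote` with `safe=":"` is assumed to be a close enough stand-in for the percent-encoding rules of the specification.

```python
from urllib.parse import quote


def _encode(text):
    # quote() UTF-8-encodes the string, then percent-encodes every byte
    # outside letters, digits and '_.-~'. The colon is kept as is.
    return quote(text, safe=":")


def build_purl(type, namespace, name, version=None, qualifiers=None, subpath=None):
    """Build a purl string from left to right (hypothetical helper)."""
    purl = "pkg:" + type.lower() + "/"

    if namespace:
        segments = namespace.strip("/").split("/")
        purl += "/".join(_encode(s) for s in segments) + "/"
        name = name.strip("/")
    purl += _encode(name)

    if version:
        purl += "@" + _encode(version)

    pairs = []
    for key, value in (qualifiers or {}).items():
        # A list of checksums is joined with ',' into a single value.
        if key.lower() == "checksum" and isinstance(value, (list, tuple)):
            value = ",".join(value)
        if not value:
            continue  # discard pairs with an empty value
        pairs.append(key.lower() + "=" + _encode(value))
    if pairs:
        purl += "?" + "&".join(sorted(pairs))

    if subpath:
        segments = [s for s in subpath.strip("/").split("/")
                    if s not in ("", ".", "..")]
        if segments:
            purl += "#" + "/".join(_encode(s) for s in segments)

    return purl
```

For instance, `build_purl("npm", "@angular", "animation", "12.3.1")` percent-encodes the '@' in the namespace and leaves the '@' before the version as a separator.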

docs/how-to-parse.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
## How to parse a `purl` string into its components

Parsing a `purl` ASCII string into its components works from right to left,
from `subpath` to `type`.

Note: some extra type-specific normalizations are required.
See the "Registered types" section for details.

To parse a `purl` string into its components:

- Split the `purl` string once from right on '#'

  - The left side is the `remainder`
  - Strip leading and trailing '/' from the right side
  - Split this on '/'
  - Discard any empty string segment from that split
  - Percent-decode each segment
  - Discard any '.' or '..' segment from that split
  - UTF-8-decode each segment if needed in your programming language
  - Join segments back with a '/'
  - This is the `subpath`

- Split the `remainder` once from right on '?'

  - The left side is the `remainder`
  - The right side is the `qualifiers` string
  - Split the `qualifiers` on '&'. Each part is a `key=value` pair
  - For each pair, split the `key=value` once from left on '=':

    - The `key` is the lowercase left side
    - The `value` is the percent-decoded right side
    - UTF-8-decode the `value` if needed in your programming language
    - Discard any key/value pairs where the value is empty
    - If the `key` is `checksum`, split the `value` on ',' to create
      a list of checksums

  - This list of key/value pairs is the `qualifiers` object

- Split the `remainder` once from left on ':'

  - The left side lowercased is the `scheme`
  - The right side is the `remainder`

- Strip all leading and trailing '/' characters (e.g., '/', '//', '///' and
  so on) from the `remainder`

  - Split this once from left on '/'
  - The left side lowercased is the `type`
  - The right side is the `remainder`

- Split the `remainder` once from right on '@'

  - The left side is the `remainder`
  - Percent-decode the right side
  - UTF-8-decode the result if needed in your programming language
  - This is the `version`

- Split the `remainder` once from right on '/'

  - The left side is the `remainder`
  - Strip all leading '/' characters (e.g., '/', '//' and so on)
    from the right side
  - Percent-decode the right side
  - UTF-8-decode the result if needed in your programming language
  - Apply type-specific normalization to the result if needed
  - This is the `name`

- Split the `remainder` on '/'

  - Strip all leading '/' characters (e.g., '/', '//' and so on)
    from that split
  - Discard any empty segment from that split
  - Percent-decode each segment
  - UTF-8-decode each segment if needed in your programming language
  - Apply type-specific normalization to each segment if needed
  - Join segments back with a '/'
  - This is the `namespace`
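
The parsing steps can be sketched in Python along these lines. The function name `parse_purl` and the returned dictionary layout are assumptions made for illustration, type-specific normalization is omitted, and `urllib.parse.unquote` is assumed to cover both percent-decoding and UTF-8 decoding.

```python
from urllib.parse import unquote


def parse_purl(purl):
    """Parse a purl string from right to left (hypothetical helper)."""
    remainder = purl

    subpath = None
    if "#" in remainder:
        remainder, _, raw = remainder.rpartition("#")
        segments = [unquote(s) for s in raw.strip("/").split("/") if s]
        segments = [s for s in segments if s not in (".", "..")]
        subpath = "/".join(segments) or None

    qualifiers = {}
    if "?" in remainder:
        remainder, _, raw = remainder.rpartition("?")
        for pair in raw.split("&"):
            key, _, value = pair.partition("=")
            key, value = key.lower(), unquote(value)
            if not value:
                continue  # discard pairs with an empty value
            qualifiers[key] = value.split(",") if key == "checksum" else value

    scheme, _, remainder = remainder.partition(":")
    type_, _, remainder = remainder.strip("/").partition("/")

    version = None
    if "@" in remainder:
        remainder, _, raw = remainder.rpartition("@")
        version = unquote(raw)

    namespace = None
    if "/" in remainder:
        remainder, _, raw_name = remainder.rpartition("/")
        segments = [unquote(s) for s in remainder.split("/") if s]
        namespace = "/".join(segments) or None
    else:
        raw_name = remainder
    name = unquote(raw_name.lstrip("/"))

    return {
        "scheme": scheme.lower(),
        "type": type_.lower(),
        "namespace": namespace,
        "name": name,
        "version": version,
        "qualifiers": qualifiers,
        "subpath": subpath,
    }
```

The sketch does no validation: a missing `name` or an unexpected `scheme` is returned as is, so a real parser would add checks after these steps.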

docs/known-qualifiers.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
## Known `purl` `qualifiers` key/value pairs

Note: Do not abuse `qualifiers`: it can be tempting to use many qualifier
keys but their usage should be limited to the bare minimum for proper package
identification to ensure that a `purl` stays compact and readable in most cases.

Additional, separate external attributes stored outside of a `purl` are the
preferred mechanism to convey extra long and optional information such as a
download URL, VCS URL or checksums in an API, database or web form.

With this warning, the known `key` and `value` pairs defined here are valid for
use in all package types:

- `vers` allows the specification of a version range.
  The value MUST adhere to the `Version Range Specification`.
  This qualifier is mutually exclusive with the `version` component.
  For example:

      pkg:pypi/django?vers=vers:pypi%2F%3E%3D1.11.0%7C%21%3D1.11.1%7C%3C2.0.0

- `repository_url` is an extra URL for an alternative, non-default package
  repository or registry. When a package does not come from the default public
  package repository for its `type`, a `purl` may be qualified with this extra
  URL. The default repository or registry of a `type` is documented in the
  "Registered `purl` types" section.

- `download_url` is an extra URL for a direct package web download URL to
  optionally qualify a `purl`.

- `vcs_url` is an extra URL for a package version control system URL to
  optionally qualify a `purl`. The syntax for this URL should be as defined in
  Python pip or the SPDX specification. See
  https://github.com/spdx/spdx-spec/blob/cfa1b9d08903/chapters/3-package-information.md#37-package-download-location

  - TODO: incorporate the details from SPDX here.

- `file_name` is an extra file name of a package archive.

- `checksum` is a qualifier for one or more checksums stored as a
  comma-separated list. Each item in the `value` is in the form
  `lowercase_algorithm:hex_encoded_lowercase_value` such as
  `sha1:ad9503c3e994a4f611a4892f2e67ac82df727086`.
  For example (with checksums truncated for brevity):

      checksum=sha1:ad9503c3e994a4f,sha256:41bf9088b3a1e6c1ef1d
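
As a rough illustration of the `vers` and `checksum` examples, the Python sketch below encodes a version range into a qualifier value and splits a checksum value into its parts. The helper names `vers_qualifier` and `parse_checksums` are hypothetical, and `urllib.parse.quote` with `safe=":"` is an assumed approximation of the percent-encoding rules.

```python
from urllib.parse import quote


def vers_qualifier(version_range):
    # Percent-encode the range and keep ':' literal, as in the example purl.
    return "vers=" + quote(version_range, safe=":")


def parse_checksums(value):
    # Split "algo:hex,algo:hex" into a mapping of algorithm to hex digest.
    return dict(item.split(":", 1) for item in value.split(","))


purl = "pkg:pypi/django?" + vers_qualifier("vers:pypi/>=1.11.0|!=1.11.1|<2.0.0")
checksums = parse_checksums("sha1:ad9503c3e994a4f,sha256:41bf9088b3a1e6c1ef1d")
```

Here `purl` reproduces the `pkg:pypi/django` example from the `vers` entry, and `checksums` maps `sha1` and `sha256` to their (truncated) hex values.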

docs/purl-spec-toc.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# Package-URL Specification

The Package URL core specification defines a versioned and formalized format,
syntax, and rules used to represent and validate `purl`.

A `purl` or package URL is an attempt to standardize existing approaches to
reliably identify and locate software packages.

A `purl` is a URL string used to identify and locate a software package in a
mostly universal and uniform way across programming languages, package managers,
packaging conventions, tools, APIs and databases.

A `purl` is useful to reliably reference the same software package
using a simple and expressive syntax and conventions based on familiar URLs.


The Package-URL specification is organized in these documents:

- What is `purl` aka. package URL? -- https://github.com/package-url/purl-spec/blob/docs/standard/summary.md
- Rules for each `purl` component -- https://github.com/package-url/purl-spec/blob/docs/standard/components.md
- Character encoding -- https://github.com/package-url/purl-spec/blob/docs/standard/characters-and-encoding.md
- How to build a `purl` string from its components -- https://github.com/package-url/purl-spec/blob/docs/how-to-build.md
- How to parse a `purl` string into its components -- https://github.com/package-url/purl-spec/blob/docs/how-to-parse.md
- Known `purl` `qualifiers` key/value pairs -- https://github.com/package-url/purl-spec/blob/docs/known-qualifiers.md
- Tests -- https://github.com/package-url/purl-spec/blob/docs/tests.md

docs/standard/about.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
# About this Specification

The document at [https://tc54.org/ecmaXXX/](https://tc54.org/ecmaXXX/) is the most accurate and up-to-date Package-URL specification.

This document is available as [a single page](https://ecma-tc54.github.io/ECMA-xxx-PURL/) and as [multiple pages](https://ecma-tc54.github.io/ECMA-xxx-PURL/multipage/).

# Contributing to this Specification

This specification is developed on GitHub with the help of the Package-URL community. There are a number of ways to contribute to the development of this specification:

* GitHub Repository: [https://github.com/Ecma-TC54/ECMA-xxx-PURL](https://github.com/Ecma-TC54/ECMA-xxx-PURL)
* Issues: [All Issues](https://github.com/Ecma-TC54/ECMA-xxx-PURL/issues), [File a New Issue](https://github.com/Ecma-TC54/ECMA-xxx-PURL/issues/new)
* Pull Requests: [All Pull Requests](https://github.com/Ecma-TC54/ECMA-xxx-PURL/pulls), [Create a New Pull Request](https://github.com/Ecma-TC54/ECMA-xxx-PURL/pulls/new)
* Editors:
  * [John Horan](mailto:jmhoran@aboutcode.org)
  * [Michael Herzog](mailto:mjherzog@aboutcode.org)
  * [Philippe Ombredanne](mailto:pombredanne@aboutcode.org)
  * [Steve Springett](mailto:steve.springett@owasp.org)
* Community:
  * Chat: [Slack Channel](https://cyclonedx.slack.com/archives/C06KTE3BWEB)

Refer to the [colophon](https://ecma-tc54.github.io/ECMA-xxx-PURL/#sec-colophon) for more information on how this document is created.

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
## Permitted characters

A canonical `purl` is composed of these permitted ASCII characters:

- the Alphanumeric Characters: `A to Z`, `a to z`, `0 to 9`,
- the Punctuation Characters: `.-_~` (period '.',
  dash '-', underscore '_' and tilde '~'),
- the Percent Character: `%` (percent sign '%'), and
- the Separator Characters: `:/@?=&#` (colon ':', slash '/', at sign '@',
  question mark '?', equal sign '=', ampersand '&' and number sign '#').


## Separators

This is how each of the Separator Characters is used:

- ':' (colon) is the separator between `scheme` and `type`
- '/' (slash) is the separator between `type`, `namespace` and `name`
- '/' (slash) is the separator between `subpath` segments
- '@' (at sign) is the separator between `name` and `version`
- '?' (question mark) is the separator before `qualifiers`
- '=' (equal sign) is the separator between a `key` and a `value` of a
  `qualifier`
- '&' (ampersand) is the separator between `qualifiers` (each being a
  `key=value` pair)
- '#' (number sign) is the separator before `subpath`

## Character encoding

- In the "Rules for each `purl` component" section, each component
  defines when and how to apply percent-encoding and decoding to its content.
- When percent-encoding is required by a component definition, the component
  string MUST first be encoded as UTF-8.
- In the component string, each "data octet" MUST be replaced by the
  percent-encoded "character triplet" applying the percent-encoding mechanism
  defined in [RFC 3986 section 2.1](https://datatracker.ietf.org/doc/html/rfc3986#section-2.1),
  including the RFC definition of "data octet" and "character triplet",
  and using these definitions for RFC's "allowed set" and "delimiters":

  - "allowed set" is composed of the Alphanumeric Characters and the
    Punctuation Characters
  - "delimiters" is composed of the Separator Characters

- The following characters MUST NOT be percent-encoded:

  - the Alphanumeric Characters,
  - the Punctuation Characters,
  - the Separator Characters when being used as `purl` separators,
  - the colon ':', whether used as a Separator Character or otherwise, and
  - the percent sign '%' when used to represent a percent-encoded character.

- Where the space ' ' is permitted, it MUST be percent-encoded as '%20'.
- With the exception of the percent-encoding mechanism, the rules regarding
  percent-encoding are defined by this specification alone.
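
A minimal Python sketch of these encoding rules, applied to a single component value, might look like the following. The names `ALLOWED` and `percent_encode` are hypothetical, and the sketch assumes the input is a raw (not yet encoded) component string, so any literal '%' in it is itself encoded.

```python
import string

# "allowed set": the Alphanumeric Characters and the Punctuation Characters.
ALLOWED = set(string.ascii_letters + string.digits + ".-_~")


def percent_encode(text):
    """Percent-encode one raw component string (hypothetical helper)."""
    out = []
    # The string is UTF-8-encoded first, then each octet is examined.
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in ALLOWED or ch == ":":
            out.append(ch)  # allowed characters and ':' stay literal
        else:
            out.append("%{:02X}".format(byte))  # "character triplet"
    return "".join(out)
```

Under these assumptions a space becomes `%20`, an '@' inside a component becomes `%40`, and a colon such as the one in `sha1:abc` is left untouched.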

## Case folding

References to "lowercase" in this specification refer to the **culture-invariant**
full case mapping defined in
[Section 3.13.2 of the Unicode Standard](https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G34078).

When applied to the ASCII character set, this operation converts uppercase
Latin letters (`A to Z`) to their corresponding lowercase forms (`a to z`).
All other ASCII characters remain unchanged.
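
In Python, for example, `str.lower()` does not depend on the process locale, so it can serve as a sketch of this operation for ASCII input. The wrapper name `lowercase` is hypothetical, and treating `str.lower()` as equivalent to the Unicode full case mapping for non-ASCII input is an assumption to verify against the Python documentation.

```python
def lowercase(text):
    # str.lower() ignores the current locale; for ASCII input it maps
    # 'A'..'Z' to 'a'..'z' and leaves every other character unchanged.
    return text.lower()


type_name = lowercase("PyPI")
punctuation = lowercase("A-Z_0.9~")
```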
