Commit 40ade46

Add draft tags design
Draft design for replacing the hardcoded namespace concept with extensible tags declared by package authors and derived by tag providers.
1 parent 5ac1834 commit 40ade46

1 file changed

Lines changed: 378 additions & 0 deletions

docs/designs/tags.md

@@ -0,0 +1,378 @@
1+
# Tags: Extensible Model Grouping via Entry Points
2+
3+
Replace the hardcoded `namespace` concept (`"overture"`, `"annex"`) with tags --
4+
string labels declared by package authors and derived by tag providers. Tags
5+
become the filtering and grouping mechanism for model discovery, driven by
6+
package authors rather than central coordination.
7+
8+
Theme remains a first-class structural field. It maps to data partitioning and
9+
entry point naming, distinct from tags which are descriptive metadata.
10+
11+
## Phase 1: Basic Tagging
12+
13+
Replace namespace with tags, move discovery to `system`, update CLI.
14+
15+
### Data Model
16+
17+
`ModelKey` drops `namespace`, gains `tags`:
18+
```python
@dataclass(frozen=True, slots=True)
class ModelKey:
    theme: str | None
    type: str
    tags: frozenset[str]
    class_name: str
```
27+
28+
### Entry Point Format
29+
30+
Names change from `namespace:theme:type` to `theme:type` (or just `type` for
31+
non-themed models), with optional `#tag1,tag2` suffix. The group remains
32+
`overture.models`.
33+
```toml
[project]
keywords = ["some-org-tag"]

[project.entry-points."overture.models"]
"buildings:building" = "overture.schema.buildings:Building"
"buildings:building_part#draft" = "overture.schema.buildings:BuildingPart"
```
42+
43+
A model's tags are the union of:
44+
45+
- `[project].keywords` from the distribution metadata (read via
46+
`entry_point.dist.metadata["Keywords"]`)
47+
- Per-model `#` tags from the entry point name
48+
49+
Note: `[project].keywords` is also used for PyPI search, so schema tags and
50+
PyPI keywords share a namespace. This was considered and accepted -- in
51+
practice, schema-relevant keywords and PyPI discoverability keywords overlap
52+
naturally, and the `#` per-model mechanism covers cases where finer control is
53+
needed.
54+
55+
### Tag Prefix Reservation
56+
57+
The `system:` prefix is reserved for tag providers. Discovery rejects
58+
author-declared tags (keywords and `#`) that start with `system:`. Static
59+
sources containing `system:*` tags produce an error (or warning + discard).
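
The reservation check could look roughly like the sketch below. The function
name and the `strict` switch are assumptions for illustration; the design
leaves open whether a violation is an error or a warning plus discard, so the
sketch supports both.

```python
import warnings

RESERVED_PREFIX = "system:"


def check_author_tags(tags: set[str], source: str, strict: bool = True) -> set[str]:
    """Return author-declared tags with reserved ones rejected.

    strict=True raises; strict=False warns and discards the reserved tags.
    """
    reserved = {t for t in tags if t.startswith(RESERVED_PREFIX)}
    if not reserved:
        return set(tags)
    message = f"{source}: author-declared tags use reserved prefix: {sorted(reserved)}"
    if strict:
        raise ValueError(message)
    warnings.warn(message, stacklevel=2)
    return tags - reserved
```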
60+
61+
### Discovery
62+
63+
`discover_models` and `ModelKey` move to `overture.schema.system.discovery`:
64+
```python
def discover_models(tags: set[str] | None = None) -> dict[ModelKey, type[BaseModel]]:
    ...
```
68+
69+
When `tags` is provided, any-match semantics apply: models whose effective tags
70+
intersect the filter set are included. When `None`, all models are returned.
71+
72+
Implementation:
73+
74+
1. Iterate `overture.models` entry points
75+
2. Split name on `#` to separate `theme:type` from per-model tags
76+
3. Split the prefix on `:` -- one part = type only (theme is `None`), two
77+
parts = theme + type
78+
4. Load the model class via `entry_point.load()`
79+
5. Read `entry_point.dist.metadata["Keywords"]` for package-level tags
80+
6. Union package tags and per-model tags, rejecting any `system:*` tags
81+
7. Build `ModelKey` with `frozenset` tags
82+
83+
### CLI
84+
85+
`--namespace` and `--overture-types` are removed. `--tag` (repeatable) replaces
86+
them. `--theme` and `--type` remain.
87+
88+
Repeated `--tag` flags use OR: `--tag foo --tag bar` matches models with
89+
either tag. Repeated `--theme` flags also use OR. Filters compose as AND
90+
across dimensions: `--tag foo --theme buildings` takes the OR result from tags
91+
and intersects it with the OR result from themes.
92+
93+
`list-types` groups by theme and displays tags per model:
94+
```
buildings
  building         overture
  building_part    overture, draft
places
  place            overture
transportation
  connector        overture
  segment          overture
(unthemed)
  sources          overture
```
107+
108+
### Migration
109+
110+
Existing `namespace:theme:type` entry points become `theme:type` (or `type`).
111+
The `namespace` field is removed from `ModelKey`. The `overture` namespace
112+
does not need replacement as a keyword or `#` tag -- Phase 2 introduces a tag
113+
provider in `core` that derives the `overture` tag from `OvertureFeature`
114+
inheritance. The `annex` namespace has no equivalent; annex packages either
115+
declare their own keywords or rely on tag providers.
116+
117+
### What Moves
118+
119+
- `discover_models`, `ModelKey` move to `overture.schema.system.discovery`
120+
- CLI imports from `system` directly
121+
122+
---
123+
124+
## Phase 2: Tag Providers
125+
126+
Derived tags from model introspection. The `overture` tag (currently a hardcoded
127+
namespace) becomes a derived tag produced by a provider in `core`.
128+
129+
### Provider Mechanism
130+
131+
A new entry point group `overture.tag_providers` registers callables. The entry
132+
point key is informational (describes the provider's purpose) and not
133+
interpreted by discovery.
134+
135+
Provider signature:
136+
137+
```python
138+
(model_class: type[BaseModel], key: ModelKey, tags: set[str]) -> set[str]
139+
```
140+
Providers receive the accumulated tag set and return a tag set (either the
same object mutated or a new set). Tags accumulate across providers: each
provider sees the static tags plus the result of all prior providers. Before
each call, the caller snapshots the current set and passes the provider its
own copy, then diffs the snapshot against the return value to detect additions
and removals. Diffing against the snapshot rather than the passed object
matters because a provider may mutate its argument in place. Providers can
remove static tags (from keywords and `#`) and tags added by earlier
providers.
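
The chaining and diffing could be sketched like this. `run_providers` and the
shape of the returned change log are assumptions made for illustration.

```python
from typing import Any, Callable

Provider = Callable[[type, Any, set[str]], set[str]]


def run_providers(
    model_class: type,
    key: Any,
    static_tags: set[str],
    providers: list[Provider],
) -> tuple[set[str], list[tuple[set[str], set[str]]]]:
    """Chain providers; return final tags and per-provider (added, removed)."""
    current = set(static_tags)
    changes = []
    for provider in providers:
        before = set(current)  # snapshot kept by the caller for the diff
        # The provider gets its own copy, so in-place mutation cannot
        # touch the snapshot.
        result = set(provider(model_class, key, set(current)))
        changes.append((result - before, before - result))
        current = result
    return current, changes
```

Because the diff is taken per provider, the caller knows which provider added
or removed each tag, which is what later prefix checks need.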
148+
149+
**Unresolved: provider ordering.** Since providers chain, execution order
150+
affects results. Leading candidate: alphabetical by entry point name with
151+
rc.d-style numeric prefixes (`10_extensions`, `50_overture`, `90_approved`).
152+
Since the entry point key is informational, providers self-register their
153+
ordering. Published guidance would define what the priority ranges mean (e.g.,
154+
0-19 for structural tags, 20-49 for derived tags, 50-79 for classification,
155+
80-99 for approval/policy). Ranges are illustrative -- actual boundaries TBD.
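
A small illustration of the leading candidate, assuming plain lexicographic
sorting of entry point names (`execution_order` is a hypothetical helper). It
also shows a caveat: the numeric prefixes only sort numerically if they are
all the same width.

```python
def execution_order(entry_point_names: list[str]) -> list[str]:
    """Order providers alphabetically by entry point name."""
    return sorted(entry_point_names)


print(execution_order(["90_approved", "10_extensions", "50_overture"]))
# Unpadded prefixes sort as text, not numbers: "10_..." comes before "9_...".
print(execution_order(["9_late", "10_early"]))
```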
156+
157+
Providers run after static tag resolution, before filtering. Only tag
158+
providers registered by `overture-schema-system` may add `system:`-prefixed
159+
tags. Discovery enforces this by checking `entry_point.dist.name` before
160+
accepting `system:` additions. This is a convention boundary, not a security
161+
mechanism -- any package could name itself `overture-schema-system`. It
162+
prevents accidental `system:` claims from third-party providers and
163+
ecosystem-specific packages (like `core`).
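
A sketch of the prefix check, applied to one provider's input and output tag
sets. `accept_additions` is a hypothetical name, and silently dropping the
offending tags is an assumption; the design does not say whether a violation
raises or is discarded.

```python
SYSTEM_DIST = "overture-schema-system"
RESERVED_PREFIX = "system:"


def accept_additions(dist_name: str, before: set[str], after: set[str]) -> set[str]:
    """Filter one provider's output according to the `system:` rule.

    dist_name is the provider's distribution name (entry_point.dist.name).
    """
    if dist_name == SYSTEM_DIST:
        return set(after)
    added = after - before
    # Only newly added reserved tags are dropped; reserved tags that were
    # already present before this provider ran are left alone.
    illegal = {t for t in added if t.startswith(RESERVED_PREFIX)}
    return after - illegal
```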
164+
165+
### Two-Phase Discovery
166+
167+
With tag providers, `discover_models` becomes:
168+
169+
1. **Load phase**: load models, resolve static tags (keywords + `#`)
170+
2. **Enrich phase**: load all `overture.tag_providers` entry points, run each
171+
provider (passing a copy of the current tags), diff input/output to detect
172+
changes, enforce `system:` prefix rules
173+
3. Freeze final `set[str]` tags into `frozenset[str]` on `ModelKey`, apply
174+
filter, return
175+
176+
### The `overture` Tag Provider
177+
178+
Lives in `core`, registered as an entry point:
179+
```toml
# packages/overture-schema-core/pyproject.toml
[project.entry-points."overture.tag_providers"]
overture = "overture.schema.core.tags:overture_provider"
```
185+
186+
```python
187+
from overture.schema.core import OvertureFeature
188+
189+
def overture_provider(
190+
model_class: type[BaseModel], key: ModelKey, tags: set[str]
191+
) -> set[str]:
192+
if isinstance(model_class, type) and issubclass(model_class, OvertureFeature):
193+
tags.add("overture")
194+
return tags
195+
```
196+
197+
Theme packages do not need `"overture"` in `[project].keywords`. The tag is
198+
derived from `OvertureFeature` inheritance by this provider.
199+
200+
---
201+
202+
## Phase 3: Extension Support
203+
204+
Layer extension registration and discovery onto the tagging system. The
205+
extension mechanism itself (how extensions modify or compose with base models)
206+
is TBD and designed separately. This phase builds on Roel's proposal but that
207+
design is still subject to change. What follows covers registration, discovery,
208+
and CLI display only.
209+
210+
### Extension Primitives
211+
212+
`@extends` decorator and `Extends()` annotation move from `core` to
213+
`overture.schema.system`. These declare that a model augments one or more base
214+
models:
215+
```python
@extends(Building, Place)
class OperatingHours(BaseModel):
    hours: ...
    timezone: ...
```
222+
223+
Or as an `Annotated` type alias:
224+
225+
```python
226+
OperatingHours = Annotated[OperatingHoursModel, Extends(Building, Place)]
227+
```
228+
229+
### Extension Tag Provider
230+
231+
Lives in `system`, registered as an entry point:
232+
```toml
# packages/overture-schema-system/pyproject.toml
[project.entry-points."overture.tag_providers"]
extensions = "overture.schema.system.tags:extension_provider"
```
238+
```python
def extension_provider(
    model_class: type[BaseModel], key: ModelKey, tags: set[str]
) -> set[str]:
    if extends_classes(model_class):
        tags.add("system:extension")
    return tags
```
247+
248+
Package authors register extensions as regular `overture.models` entry points.
249+
Extensions follow the same naming rules as other models: `theme:type` for
250+
themed extensions, or just `type` for unthemed ones. The `system:extension`
251+
tag is derived automatically; authors never declare it.
252+
253+
### Extension Discovery Helper
254+
255+
Extensions declare which base models they augment (via `@extends` /
256+
`Extends()`), but callers typically need the reverse: given a base model, which
257+
extensions apply to it? A convenience function in `system` builds this reverse
258+
mapping:
259+
```python
def extension_graph(
    models: dict[ModelKey, type[BaseModel]],
) -> dict[type[BaseModel], list[type[BaseModel]]]:
    """Map base models to the extensions that apply to them."""
```
266+
267+
Filters `models` for the `system:extension` tag, calls `extends_classes()` on
268+
each to get its declared base models, and inverts the relationship. The CLI
269+
uses this to show registered extensions alongside the models they extend
270+
without reimplementing the traversal.
271+
272+
### CLI Display
273+
274+
The CLI treats `system:extension` as a presentation hint:
275+
276+
**`list-types`**: extensions appear in a separate section listing which models
277+
they extend. Core models show a compact cross-reference to available extensions.
278+
```
buildings
  building         overture
    + capacity, operating-hours
  building_part    overture, draft

places
  place            overture
    + operating-hours

extensions
  capacity         system:extension
    extends: building
  operating-hours  system:extension
    extends: building, place
```
295+
296+
**`describe-type` on a core model**: includes a "Registered extensions" section
297+
after the model's own fields.
298+
```
building (buildings)    overture

Fields:
  geometry    Geometry
  subtype     BuildingSubtype | None
  ...

Registered extensions:
  capacity           max_occupancy, floor_area, ...
  operating-hours    hours, timezone, ...
```
311+
312+
**`describe-type` on an extension**: shows which models it extends and full
313+
field definitions.
314+
315+
---
316+
317+
## Phase 4: Manifest-Driven Approval (Sketch)
318+
319+
A tag provider that checks models against a curated manifest, enabling
320+
organizations to certify which models (and extensions) they endorse.
321+
322+
### Concept
323+
324+
A tag provider reads a manifest of approved entries and adds an approval tag
325+
to matching models. This lets an organization certify models without requiring
326+
changes to the model packages themselves.
327+
328+
### Shape
329+
330+
The manifest lives in a dedicated package (e.g.,
331+
`overture-schema-official-extensions`) that depends on nothing but `system`.
332+
It registers a tag provider:
333+
```toml
# packages/overture-schema-official-extensions/pyproject.toml
[project.entry-points."overture.tag_providers"]
approved = "overture.schema.official_extensions:approved_provider"
```
339+
340+
The manifest can match against class names (entry point values), entry point
341+
names, and/or package names with optional version specifiers. Package names
342+
are harder to collide than class names, making them a stronger identifier for
343+
third-party extensions.
344+
```python
APPROVED = {
    # by class name
    "overture.schema.buildings:Building",
    "overture.schema.buildings:BuildingPart",
    # by package name (with optional version specifier)
    "acme-schema-parcels>=1.0",
}

def approved_provider(
    model_class: type[BaseModel], key: ModelKey, tags: set[str]
) -> set[str]:
    if _matches_manifest(key):
        tags.add("approved")
    return tags
```
361+
362+
Note: `approved` is an unprefixed tag (not `system:approved`) since this
363+
provider lives outside `system`.
364+
365+
Note: `_matches_manifest` needs more than `ModelKey` to match against package
366+
names. `ModelKey` carries `class_name` (the entry point value) but not the
367+
distribution/package name. The provider will need access to additional context
368+
(e.g., the entry point object itself, or an enriched key). The right mechanism
369+
is TBD.
370+
371+
### Use Cases
372+
373+
- **Overture release certification**: only models in the manifest are tagged
374+
`approved` for a given release. The manifest package is versioned alongside
375+
the release.
376+
- **Organizational policy**: a company publishes their own manifest package
377+
with their own approved set (including approved extensions).
378+
- **CLI filtering**: `--tag approved` shows only certified models.
