
Commit 877b7c0

Merge pull request #4 from knowledgefutures/tr/record-level-schemas
2 parents bd9d2e0 + 3b99fee commit 877b7c0

29 files changed

Lines changed: 2066 additions & 3192 deletions

README.md

Lines changed: 55 additions & 2 deletions
@@ -68,6 +68,7 @@ src/
 │ │ ├── collections.ts # Collection CRUD
 │ │ ├── versions.ts # Version push/pull/diff + privacy filtering
 │ │ ├── files.ts # Content-addressed file storage
+│ │ ├── schemas.ts # Schema discovery, search, labeling
 │ │ └── health.ts # Health check
 │ └── server.ts # Fastify entry point
 ├── db/
@@ -185,6 +186,56 @@ npm run secrets:decrypt # Decrypt .env.enc → .env
 npm run secrets:decrypt:dev # Decrypt .env.dev.enc → .env.dev
 ```

+## Schema System
+
+Underlay uses **globally deduplicated, content-addressed schemas** for record validation and interoperability.
+
+### How it works
+
+- Each record type in a collection has its own JSON Schema, stored as an immutable, content-addressed row in the global `schemas` table.
+- A version declares its full set of type→schema bindings via the `version_schemas` join table.
+- If two collections define the same fields and types for a record type, they produce the same schema hash — alignment is automatic.
+- Schemas are never modified. Evolving a type produces a new hash and a new row.
+
+### Push payload
+
+```json
+{
+  "schemas": {
+    "Author": { "type": "object", "properties": { "name": { "type": "string" } } },
+    "Pub": { "type": "object", "properties": { "title": { "type": "string" }, "authorId": { "type": "string", "x-ref-type": "Author" } } }
+  },
+  "changes": { "added": [...] }
+}
+```
+
+### Relationship annotations
+
+Fields that hold record IDs of another type use `"x-ref-type": "TypeName"` to document the relationship. This enables linked-record navigation in the UI and helps LLMs understand the relational graph.
+
+### Schema labeling
+
+Schemas can be labeled post-hoc with human-readable names or URIs (e.g. `schema.org/Person`, `dc.author.v1`). Labels enable discovery across collections without upfront coordination.
+
+- `POST /api/schemas/:id/labels` — Add a label
+- `DELETE /api/schemas/:id/labels/:label` — Remove a label
+- `GET /api/schemas?label=...` — Search by label
+- Labels are injected as `x-underlay-labels` in schema exports (opt-out via `?raw=true`)
+
+### Schema discovery API
+
+| Endpoint | Purpose |
+|----------|--------|
+| `GET /api/schemas` | Global search (filter by `q`, `slug`, `label`, `schema_hash`) |
+| `GET /api/schemas/:id` | Single schema with labels + usage info |
+| `GET /api/collections/:owner/:slug/schemas` | Collection's schemas (with label enrichment) |
+
+### Versioning semantics
+
+- **Major bump**: Schema set changed (type added, removed, or schema modified)
+- **Minor bump**: Records changed, schema set identical
+- **Patch bump**: Only metadata changed (readme, message)
+
 ## Maintenance Checklist

 When adding or changing features, update these locations:
@@ -198,13 +249,15 @@ When adding or changing features, update these locations:
 | Quick start | `src/pages/docs/quickstart.astro` | Getting started tutorial |
 | Self-hosting | `src/pages/docs/self-host.astro` | Deployment instructions |
 | DB schema | `src/db/schema.ts` `npm run db:generate` | Schema changes need a migration |
+| Schema discovery | `src/api/routes/schemas.ts` | Schema search, labeling, cross-referencing |
 | Encrypted secrets | `.env.enc` / `.env.dev.enc` | Re-encrypt after changing .env files |

 ### Privacy features

-The system supports three levels of privacy (type-level, field-level, record-level) via `"private": true` annotations. When changing how privacy works, update:
-- `src/api/routes/versions.ts` — filtering logic
+The system supports three levels of privacy (type-level, field-level, record-level) via `"private": true` annotations in per-type schemas. When changing how privacy works, update:
+- `src/api/routes/versions.ts` — filtering logic (reads from `version_schemas` JOIN `schemas`)
 - `src/api/routes/files.ts` — file access checks
+- `src/api/routes/schemas.ts` — public schema filtering
 - `public/.well-known/ai.txt` — Privacy section
 - `src/pages/docs/concepts.astro` — Privacy section
 - `src/pages/docs/api/versions.astro` — Push endpoint docs
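
The "same fields and types produce the same schema hash" claim in the README hunk can be illustrated with a minimal sketch. It assumes the hash is a SHA-256 over the schema serialized with sorted object keys; the diff excerpt does not show the real canonicalization, so `canonicalize` and `schemaHash` are hypothetical names. The `sha256:` prefix follows the notation used in ai.txt.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted, so key order cannot change the hash.
// ASSUMPTION: the project's actual canonical form is not shown in this diff.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  const keys = Object.keys(value).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`).join(",")}}`;
}

// Hypothetical helper: content address of a single per-type schema.
function schemaHash(schema: Json): string {
  return "sha256:" + createHash("sha256").update(canonicalize(schema)).digest("hex");
}

// Two collections declaring the same shape with different key order.
const authorA: Json = { type: "object", properties: { name: { type: "string" } } };
const authorB: Json = { properties: { name: { type: "string" } }, type: "object" };
// Evolving the type (string to number) yields a different schema.
const authorC: Json = { type: "object", properties: { name: { type: "number" } } };

console.log(schemaHash(authorA) === schemaHash(authorB)); // same shape, same hash
console.log(schemaHash(authorA) === schemaHash(authorC)); // changed field type, new hash
```

Under this assumption a global `schemas` table keyed by hash stores `authorA` and `authorB` as one row, while `authorC` gets a new row rather than modifying the old one.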

public/.well-known/ai.txt

Lines changed: 49 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,8 @@ Delete endpoints require "admin" scope.
 - Version: an immutable snapshot containing a JSON Schema, records, and file references. Sequential integer numbers, auto-derived semver.
 - Record: a flat JSON object with { id, type, data }. Records reference other records by id and files by hash.
 - File: a binary blob stored by SHA-256 hash, referenced in record data as {"$file": "sha256:<hex>"}.
-- Schema: a JSON Schema document describing the record types in a version. Schema changes trigger a major version bump.
+- Schema: a JSON Schema document for a single record type, stored as a global, immutable, content-addressed entity. Each type gets its own schema. Schema changes trigger a major version bump.
+- Schema labeling: schemas can be labeled post-hoc with URIs or names (e.g. "schema.org/Person") for cross-collection discovery.

 ---

@@ -109,33 +110,39 @@ Authorization: Bearer ul_<key>
   "message": "Daily archive 2026-04-27",
   "app_id": "my-app",
   "actor_id": "my-app:cron-job",
-  "schema": {
-    "type": "object",
-    "properties": {
-      "Article": {
-        "type": "object",
-        "properties": {
-          "title": {"type": "string"},
-          "body": {"type": "string"},
-          "publishedAt": {"type": "string", "format": "date-time"}
-        }
+  "schemas": {
+    "Article": {
+      "type": "object",
+      "properties": {
+        "title": {"type": "string"},
+        "body": {"type": "string"},
+        "publishedAt": {"type": "string", "format": "date-time"},
+        "authorId": {"type": "string", "x-ref-type": "Author"}
+      }
+    },
+    "Author": {
+      "type": "object",
+      "properties": {
+        "name": {"type": "string"},
+        "email": {"type": "string", "private": true}
       }
     }
   },
   "changes": {
     "added": [
-      {"id": "article-42", "type": "Article", "data": {"title": "New Paper", "body": "...", "publishedAt": "2026-04-27T00:00:00Z"}}
+      {"id": "article-42", "type": "Article", "data": {"title": "New Paper", "body": "...", "publishedAt": "2026-04-27T00:00:00Z", "authorId": "author-1"}}
     ],
     "updated": [
-      {"id": "article-10", "type": "Article", "data": {"title": "Updated Title", "body": "...", "publishedAt": "2025-01-01T00:00:00Z"}}
+      {"id": "article-10", "type": "Article", "data": {"title": "Updated Title", "body": "...", "publishedAt": "2025-01-01T00:00:00Z", "authorId": "author-2"}}
     ],
     "removed": ["article-5"]
   }
 }

 Field reference:
 - base_version: the version number you diffed against. null for first push. Used for optimistic locking.
-- schema: required on first push and whenever the schema changes. Omit if unchanged.
+- schemas: per-type JSON Schema map. Required on first push. If omitted on subsequent pushes and records validate against carried-forward schemas, they are carried forward automatically.
+- schemas[TypeName]: JSON Schema for that record type. Use "x-ref-type" to annotate foreign key fields.
 - message: human-readable commit message (optional).
 - app_id: identifier for the pushing application (optional).
 - actor_id: identifier for the user or process that triggered the push (optional).
@@ -157,7 +164,7 @@ Missing files (422 — records reference files not yet uploaded):
 → Upload the listed files, then retry the push.

 ### First push (no existing versions)
-Set base_version to null. Put all records in changes.added. Include the schema.
+Set base_version to null. Put all records in changes.added. Include schemas for all types.

 ---

@@ -182,11 +189,26 @@ The schema declares which fields are references, so tools can resolve them at re

 ---

+## Schema Discovery
+
+Schemas are globally deduplicated by content hash. If two collections use the same type shape, they share the same schema row. This enables cross-collection discovery.
+
+GET /api/schemas → search schemas (?q=text&slug=TypeName&label=uri&schema_hash=sha256:...&limit=50&offset=0)
+GET /api/schemas/:id → single schema with labels and usage info
+GET /api/collections/:owner/:slug/schemas → schemas for latest version (?version=N for specific, ?raw=true to skip label enrichment)
+POST /api/schemas/:id/labels → add label {"label": "schema.org/Person"} (requires write scope)
+DELETE /api/schemas/:id/labels/:label → remove label (requires admin scope)
+
+When schemas are returned via the collection schemas endpoint, known labels are injected as "x-underlay-labels" on the schema body (opt-out with ?raw=true).
+
+---
+
 ## Versioning

 - Versions are sequential integers (1, 2, 3, ...).
 - Semver is derived automatically: schema change → major, record change → minor, metadata-only → patch.
-- Each version has a content-addressed hash computed from schema + sorted records + sorted file hashes.
+- Schema change = any type's schema_id changed, or types added/removed between versions.
+- Each version has a content-addressed hash computed from sorted type→schema_hash map + sorted records + sorted file hashes.
 - Versions are immutable once created.

 ---
@@ -206,27 +228,24 @@ Underlay supports fine-grained privacy at three levels: types, fields, and indiv
 Private data is stored alongside public data in the same version but is only visible to the collection owner.

 ### Private Types
-Mark an entire type as private in the schema. All records of that type are hidden from public readers.
+Mark an entire type as private in its schema. All records of that type are hidden from public readers.

-"schema": {
-  "type": "object",
-  "properties": {
-    "Article": {
-      "type": "object",
-      "properties": { "title": {"type": "string"} }
-    },
-    "InternalNote": {
-      "type": "object",
-      "private": true,
-      "properties": { "note": {"type": "string"}, "articleId": {"type": "string"} }
-    }
+"schemas": {
+  "Article": {
+    "type": "object",
+    "properties": { "title": {"type": "string"} }
+  },
+  "InternalNote": {
+    "type": "object",
+    "private": true,
+    "properties": { "note": {"type": "string"}, "articleId": {"type": "string"} }
   }
 }

 Public readers see only the Article type. InternalNote is completely hidden (including from the schema response).

 ### Private Fields
-Mark individual fields as private within a type. The type itself is visible, but those fields are stripped for public readers.
+Mark individual fields as private within a type's schema. The type itself is visible, but those fields are stripped for public readers.

 "Author": {
   "type": "object",
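
The versioning rule stated in ai.txt (schema change gives major, record change gives minor, metadata-only gives patch) can be sketched as a small pure function. This is an illustration under assumptions, not the project's code: `SchemaBindings`, `schemaSetChanged`, and `deriveBump` are hypothetical names, and the bindings are modeled as a plain map from type name to schema id.

```typescript
type Bump = "major" | "minor" | "patch";

// Hypothetical shape: record type name -> schema_id bound in a version.
type SchemaBindings = Record<string, string>;

// "Schema change = any type's schema_id changed, or types added/removed."
function schemaSetChanged(prev: SchemaBindings, next: SchemaBindings): boolean {
  const prevTypes = Object.keys(prev);
  if (prevTypes.length !== Object.keys(next).length) return true;
  // Equal sizes: a removed or renamed type shows up as an undefined lookup.
  return prevTypes.some((typeName) => prev[typeName] !== next[typeName]);
}

function deriveBump(prev: SchemaBindings, next: SchemaBindings, recordsChanged: boolean): Bump {
  if (schemaSetChanged(prev, next)) return "major";
  return recordsChanged ? "minor" : "patch";
}

const v1: SchemaBindings = { Article: "schema-a1" };
const v2: SchemaBindings = { Article: "schema-a1", Author: "schema-b1" };

console.log(deriveBump(v1, v2, true));  // type added
console.log(deriveBump(v1, v1, true));  // only records changed
console.log(deriveBump(v1, v1, false)); // only metadata changed
```

Because schemas are immutable and content-addressed, comparing ids is enough here; no deep comparison of schema bodies is needed.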

src/api/routes/collections.ts

Lines changed: 15 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -188,7 +188,6 @@ export async function collectionsRoutes(app: FastifyInstance) {
 id: schema.versions.id,
 number: schema.versions.number,
 semver: schema.versions.semver,
-schema: schema.versions.schema,
 recordCount: schema.versions.recordCount,
 fileCount: schema.versions.fileCount,
 totalBytes: schema.versions.totalBytes,
@@ -417,6 +416,20 @@ export async function collectionsRoutes(app: FastifyInstance) {
 // Pipe tar → gzip → response
 const outputStream = pack.pipe(gzip);

+// Load schemas for this version
+const versionSchemaEntries = await db
+  .select({
+    slug: schema.versionSchemas.slug,
+    schemaBody: schema.schemas.schema,
+  })
+  .from(schema.versionSchemas)
+  .innerJoin(schema.schemas, eq(schema.versionSchemas.schemaId, schema.schemas.id))
+  .where(eq(schema.versionSchemas.versionId, version.id));
+
+const schemasMap = Object.fromEntries(
+  versionSchemaEntries.map((e) => [e.slug, e.schemaBody]),
+);
+
 // Add manifest.json
 const manifest = {
   collection: { owner, slug, name: collection.name, description: collection.description },
@@ -430,7 +443,7 @@ export async function collectionsRoutes(app: FastifyInstance) {
     totalBytes: version.totalBytes,
     createdAt: version.createdAt,
   },
-  schema: version.schema,
+  schemas: schemasMap,
 };
 const manifestBuf = Buffer.from(JSON.stringify(manifest, null, 2));
 pack.entry({ name: "manifest.json", size: manifestBuf.length }, manifestBuf);
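
With this change the exported manifest carries a per-type `schemas` map, so a consumer could recover the relational graph from the `x-ref-type` annotations. The following is a hedged sketch of such a consumer; `ManifestTypeSchema`, `RefEdge`, and `collectRefEdges` are hypothetical names and are not part of the commit.

```typescript
// Minimal shapes for the parts of a per-type schema this sketch reads.
interface ManifestFieldSchema {
  type?: string;
  "x-ref-type"?: string;
  [key: string]: unknown;
}
interface ManifestTypeSchema {
  type?: string;
  properties?: Record<string, ManifestFieldSchema>;
  [key: string]: unknown;
}

// One edge per field that points at records of another type.
interface RefEdge {
  from: string;
  field: string;
  to: string;
}

function collectRefEdges(schemas: Record<string, ManifestTypeSchema>): RefEdge[] {
  const edges: RefEdge[] = [];
  for (const [typeName, typeSchema] of Object.entries(schemas)) {
    for (const [field, def] of Object.entries(typeSchema.properties ?? {})) {
      const target = def["x-ref-type"];
      if (typeof target === "string") edges.push({ from: typeName, field, to: target });
    }
  }
  return edges;
}

// The schemas map from the README push payload example.
const manifestSchemas: Record<string, ManifestTypeSchema> = {
  Author: { type: "object", properties: { name: { type: "string" } } },
  Pub: {
    type: "object",
    properties: {
      title: { type: "string" },
      authorId: { type: "string", "x-ref-type": "Author" },
    },
  },
};

const refEdges = collectRefEdges(manifestSchemas);
console.log(refEdges);
```

For the README payload this yields a single edge from `Pub.authorId` to `Author`.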

src/api/routes/files.ts

Lines changed: 18 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,7 @@ async function isFilePubliclyAccessible(

 // Get the latest version
 const [latest] = await db
-  .select({ id: schema.versions.id, schema: schema.versions.schema })
+  .select({ id: schema.versions.id })
   .from(schema.versions)
   .where(eq(schema.versions.collectionId, collection.id))
   .orderBy(sql`${schema.versions.number} desc`)
@@ -54,14 +54,22 @@ async function isFilePubliclyAccessible(

 if (!vf) return false;

-// Get schema to determine private types and fields
-const schemaDoc = (latest.schema ?? {}) as Record<string, any>;
+// Load version schemas to determine private types and fields
+const schemaEntries = await db
+  .select({
+    slug: schema.versionSchemas.slug,
+    schemaBody: schema.schemas.schema,
+  })
+  .from(schema.versionSchemas)
+  .innerJoin(schema.schemas, eq(schema.versionSchemas.schemaId, schema.schemas.id))
+  .where(eq(schema.versionSchemas.versionId, latest.id));
+
 const privateTypes = new Set<string>();
-const props = schemaDoc?.properties as Record<string, any> | undefined;
-if (props) {
-  for (const [typeName, typeDef] of Object.entries(props)) {
-    if (typeDef?.private === true) privateTypes.add(typeName);
-  }
+const typeSchemaMap = new Map<string, Record<string, any>>();
+for (const entry of schemaEntries) {
+  const body = entry.schemaBody as Record<string, any>;
+  typeSchemaMap.set(entry.slug, body);
+  if (body?.private === true) privateTypes.add(entry.slug);
 }

 // Find public records that reference this file hash
@@ -83,7 +91,8 @@ async function isFilePubliclyAccessible(
 if (privateTypes.has(rec.type)) continue;

 // Get private fields for this type
-const typeProps = props?.[rec.type]?.properties as Record<string, any> | undefined;
+const typeSchema = typeSchemaMap.get(rec.type);
+const typeProps = typeSchema?.properties as Record<string, any> | undefined;
 if (!typeProps) return true; // no schema constraints, allow

 const privateFields = new Set<string>();
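
The type-level and field-level privacy rules that this file now reads from per-type schemas can be sketched as a pure function over in-memory data. This is an illustration only: the diff is cut off before `privateFields` is populated, so the field check below is an assumption based on the `"private": true` annotation shown in ai.txt. `PrivacySchema`, `PublicRecord`, and `publicView` are hypothetical names, and record-level privacy is left out because the excerpt does not show how it is expressed.

```typescript
// Only the parts of a per-type schema that matter for privacy.
interface PrivacySchema {
  private?: boolean;
  properties?: Record<string, { private?: boolean }>;
}

interface PublicRecord {
  id: string;
  type: string;
  data: Record<string, unknown>;
}

// Return what a public reader would see: records of private types are
// dropped, and private fields are stripped from the remaining records.
function publicView(records: PublicRecord[], schemas: Record<string, PrivacySchema>): PublicRecord[] {
  const visible: PublicRecord[] = [];
  for (const rec of records) {
    const typeSchema = schemas[rec.type];
    if (typeSchema?.private === true) continue; // private type
    const props = typeSchema?.properties ?? {};
    const data: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(rec.data)) {
      if (props[key]?.private === true) continue; // private field
      data[key] = value;
    }
    visible.push({ ...rec, data });
  }
  return visible;
}

const privacySchemas: Record<string, PrivacySchema> = {
  Author: { properties: { name: {}, email: { private: true } } },
  InternalNote: { private: true, properties: { note: {} } },
};

const publicRecords = publicView(
  [
    { id: "author-1", type: "Author", data: { name: "Ada", email: "ada@example.org" } },
    { id: "note-1", type: "InternalNote", data: { note: "draft" } },
  ],
  privacySchemas,
);
console.log(publicRecords);
```

As in the diff, a record whose type has no schema entry is treated as unconstrained and passes through unchanged.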
