Commit 1e271f0

ascorbic and claude authored
feat(pds): implement lexicon validation for mutation endpoints (#15)
* feat(pds): implement lexicon validation for mutation endpoints

Adds record schema validation to createRecord, putRecord, and applyWrites endpoints using @atproto/lexicon package.

Implementation:
- Created RecordValidator class with optimistic validation strategy
- Validates records against lexicon schemas when available
- Fails open for unknown schemas (allows new/custom record types)
- Supports both strict and optimistic validation modes
- Schemas can be added dynamically via validator.addSchema()

Changes:
- New file: packages/pds/src/validation.ts - Core validation module
- New file: packages/pds/test/validation.test.ts - 8 validation tests
- Modified: packages/pds/src/xrpc/repo.ts - Added validation to mutations
- Modified: EDGE_PDS_PLAN.md - Documented validation implementation

All tests passing (81 total).

* feat(pds): load official Bluesky lexicon schemas for validation

Replaces empty validator with actual schema validation using official Bluesky lexicon JSON files vendored from the atproto repository.

Changes:
- Added update-lexicons.sh script to fetch official schemas from GitHub
- Vendored 16 lexicon schemas (core, feed, actor, graph, richtext, embed)
- Updated validation.ts to load all schemas with proper dependencies
- Added 11 Bluesky-specific validation tests (all passing)
- Tree-shakeable: only JSON files imported, no large dependencies

Schemas loaded:
- Core: com.atproto.repo.strongRef, com.atproto.label.defs
- Feed: app.bsky.feed.{post, like, repost, threadgate}
- Actor: app.bsky.actor.profile
- Graph: app.bsky.graph.{follow, block, list, listitem}
- Richtext: app.bsky.richtext.facet
- Embed: app.bsky.embed.{images, external, record, recordWithMedia}

All 19 tests passing (8 validation tests + 11 Bluesky tests).

* refactor(pds): simplify update-lexicons script

- Use array to define schemas instead of repetitive curl commands
- Single loop to fetch all schemas
- Better error handling with -fsSL flags
- Easier to add new schemas - just add one line to array

* refactor(pds): use Vite glob import to load lexicon schemas

- Replaced 16+ manual imports with single import.meta.glob()
- Automatically loads all .json files from ./lexicons/
- No code changes needed when adding new schemas
- Cleaner, more maintainable code

From 50+ lines of imports to 5 lines with glob.

* fix(pds): add $type field to test records for lexicon validation

Tests were failing because createRecord calls were missing the $type field, which is now required for records with known lexicon schemas.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* ci: add weekly workflow to update lexicon schemas

Runs every Monday at 9am UTC and opens a PR if schemas have changed. Can also be triggered manually via workflow_dispatch.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore(pds): typing improvements and dependency cleanup

- Use LexiconDoc type instead of any in validation.ts
- Add vite/client types to tsconfig for import.meta.glob
- Format firehose test and add proper sequencer typing
- Remove unused echo from update-lexicons script
- Remove unused dependencies from lockfile

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
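The optimistic (fail-open) strategy named in the commit message can be illustrated with a small sketch. This is not the code in `packages/pds/src/validation.ts`: the `SketchValidator` class, the `SchemaCheck` type, and the toy check function are hypothetical stand-ins for `RecordValidator` and `@atproto/lexicon`, whose real APIs are not reproduced here. It only shows the decision flow: unknown collections pass in optimistic mode, are rejected in strict mode, and known collections must carry a matching `$type`.

```typescript
// Hypothetical stand-in for a compiled lexicon schema: returns an error
// message when the record is invalid, or null when it passes.
type SchemaCheck = (record: Record<string, unknown>) => string | null;

type ValidationResult =
  | { ok: true; validated: boolean }
  | { ok: false; error: string };

class SketchValidator {
  private readonly schemas = new Map<string, SchemaCheck>();
  private readonly strict: boolean;

  // strict = true rejects collections with no loaded schema;
  // strict = false (optimistic) lets them through unvalidated.
  constructor(strict = false) {
    this.strict = strict;
  }

  addSchema(collection: string, check: SchemaCheck): void {
    this.schemas.set(collection, check);
  }

  validate(collection: string, record: Record<string, unknown>): ValidationResult {
    const check = this.schemas.get(collection);
    if (check === undefined) {
      return this.strict
        ? { ok: false, error: `No schema loaded for ${collection}` }
        : { ok: true, validated: false }; // fail open
    }
    // Known schema: the record must declare a matching $type.
    if (record["$type"] !== collection) {
      return { ok: false, error: `Record $type must be ${collection}` };
    }
    const error = check(record);
    return error === null ? { ok: true, validated: true } : { ok: false, error };
  }
}

// Usage with a toy check that only looks at the `text` field.
const validator = new SketchValidator();
validator.addSchema("app.bsky.feed.post", (r) =>
  typeof r["text"] === "string" ? null : "text must be a string",
);
```

Returning `validated: false` for unknown collections, rather than a bare success, lets a caller distinguish "checked and valid" from "not checked"; that distinction is a design choice of this sketch, not something the commit states.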
1 parent dcc2a0d commit 1e271f0

26 files changed

Lines changed: 1562 additions & 12 deletions
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+name: Update Lexicons
+on:
+  schedule:
+    # Run weekly on Mondays at 9am UTC
+    - cron: "0 9 * * 1"
+  workflow_dispatch: # Allow manual trigger
+jobs:
+  update-lexicons:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+      pull-requests: write
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v5
+      - name: Update lexicon schemas
+        run: ./packages/pds/scripts/update-lexicons.sh
+      - name: Create Pull Request
+        uses: peter-evans/create-pull-request@v8
+        with:
+          commit-message: "chore(pds): update lexicon schemas"
+          title: "chore(pds): update lexicon schemas"
+          body: |
+            Automated update of Bluesky lexicon schemas from the official atproto repository.
+
+            This PR was created automatically by the weekly lexicon update workflow.
+          branch: chore/update-lexicons
+          delete-branch: true

EDGE_PDS_PLAN.md

Lines changed: 14 additions & 8 deletions
@@ -47,12 +47,13 @@ Build a single-user AT Protocol Personal Data Server (PDS) on Cloudflare Workers
   - `com.atproto.sync.getBlob` endpoint (public read access)
   - Direct R2 access in endpoint (R2ObjectBody cannot be serialized across RPC)
   - Blobs stored with DID prefix for isolation
-- **Testing** - Migrated to vitest 4, all 73 tests passing
+- **Testing** - Migrated to vitest 4, all 81 tests passing
   - 16 storage tests
   - 26 XRPC tests (auth, concurrency, error handling, CAR validation)
   - 6 firehose tests (event sequencing, cursor validation, backfill)
   - 10 blob tests (upload, retrieval, size limits, content types)
   - 15 session tests (login, refresh, getSession, JWT validation)
+  - 8 validation tests (optimistic mode, strict mode, schema enforcement)
 - **TypeScript** - All diagnostic errors resolved, proper type declarations for cloudflare:test
 - **Protocol Helpers** - All protocol operations use official @atproto utilities
   - Record keys: `TID.nextStr()` from `@atproto/common-web`
@@ -75,6 +76,12 @@ Build a single-user AT Protocol Personal Data Server (PDS) on Cloudflare Workers
   - bcrypt password hashing with `bcryptjs`
   - Auth middleware accepts both static `AUTH_TOKEN` and JWT access tokens
   - 15 new tests for session endpoints
+- **Lexicon Validation** - Record schema validation for mutation endpoints
+  - `RecordValidator` class using `@atproto/lexicon` package
+  - Optimistic validation strategy (fail-open): validates if schema is loaded, allows unknown collections
+  - Integrated into `createRecord`, `putRecord`, and `applyWrites` endpoints
+  - Schemas can be added dynamically via `validator.addSchema()`
+  - 8 validation tests covering optimistic mode, strict mode, and schema enforcement

 ### Not Started

@@ -163,13 +170,12 @@ for (const [cidStr, bytes] of internalMap) { ... }

 ### Components We Will DEFER

-| Component          | Reason                                     |
-| ------------------ | ------------------------------------------ |
-| OAuth Provider     | Complex, not needed for single-user MVP    |
-| Lexicon Validation | Can add later, not required for federation |
-| Rate Limiting      | Single user, not needed for MVP            |
-| Account Migration  | Complex, post-MVP feature                  |
-| Labelling          | AppView concern, not PDS                   |
+| Component         | Reason                                  |
+| ----------------- | --------------------------------------- |
+| OAuth Provider    | Complex, not needed for single-user MVP |
+| Rate Limiting     | Single user, not needed for MVP         |
+| Account Migration | Complex, post-MVP feature               |
+| Labelling         | AppView concern, not PDS                |

 ---

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+#!/bin/bash
+#
+# Update lexicon schemas from the official Bluesky atproto repository
+#
+
+set -e
+
+LEXICONS_DIR="$(cd "$(dirname "$0")/../src/lexicons" && pwd)"
+REPO_BASE="https://raw.githubusercontent.com/bluesky-social/atproto/main/lexicons"
+
+echo "Updating lexicon schemas in: $LEXICONS_DIR"
+mkdir -p "$LEXICONS_DIR"
+cd "$LEXICONS_DIR"
+
+# Define schemas to fetch (namespace/name format)
+schemas=(
+  # Core AT Proto schemas
+  "com/atproto/repo/strongRef"
+  "com/atproto/label/defs"
+
+  # Feed schemas
+  "app/bsky/feed/post"
+  "app/bsky/feed/like"
+  "app/bsky/feed/repost"
+  "app/bsky/feed/threadgate"
+
+  # Actor schemas
+  "app/bsky/actor/profile"
+
+  # Graph schemas
+  "app/bsky/graph/follow"
+  "app/bsky/graph/block"
+  "app/bsky/graph/list"
+  "app/bsky/graph/listitem"
+
+  # Richtext schemas
+  "app/bsky/richtext/facet"
+
+  # Embed schemas
+  "app/bsky/embed/images"
+  "app/bsky/embed/external"
+  "app/bsky/embed/record"
+  "app/bsky/embed/recordWithMedia"
+)
+
+# Fetch each schema
+echo "Fetching ${#schemas[@]} schemas..."
+for schema in "${schemas[@]}"; do
+  # Convert path to NSID (e.g., com/atproto/repo/strongRef -> com.atproto.repo.strongRef)
+  nsid="${schema//\//.}"
+  file="${nsid}.json"
+
+  echo "${nsid}"
+  if ! curl -fsSL "$REPO_BASE/${schema}.json" -o "$file"; then
+    echo " ✗ Failed to fetch ${nsid}" >&2
+    exit 1
+  fi
+done
+
+echo ""
+echo "✓ Successfully fetched ${#schemas[@]} lexicon schemas!"
+echo ""
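The loop in the script turns each repository path into an NSID and a local file name by replacing slashes with dots. The same mapping can be sketched in TypeScript, for instance to check which files a given schema list should produce. The `schemaTarget` helper and the `SchemaTarget` interface are hypothetical names introduced for illustration; only the base URL and the path-to-NSID rule come from the script.

```typescript
// Base URL taken from the script above.
const REPO_BASE =
  "https://raw.githubusercontent.com/bluesky-social/atproto/main/lexicons";

interface SchemaTarget {
  nsid: string; // e.g. com.atproto.repo.strongRef
  url: string; // where the schema is fetched from
  file: string; // local file name inside the lexicons directory
}

// Hypothetical helper mirroring the bash expansion ${schema//\//.}:
// every "/" in the path becomes "." in the NSID.
function schemaTarget(schemaPath: string): SchemaTarget {
  const nsid = schemaPath.split("/").join(".");
  return {
    nsid,
    url: `${REPO_BASE}/${schemaPath}.json`,
    file: `${nsid}.json`,
  };
}

// Usage: list the local file names for a few of the schemas in the script.
const files = ["com/atproto/repo/strongRef", "app/bsky/feed/post"].map(
  (path) => schemaTarget(path).file,
);
```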
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+{
+  "lexicon": 1,
+  "id": "app.bsky.actor.profile",
+  "defs": {
+    "main": {
+      "type": "record",
+      "description": "A declaration of a Bluesky account profile.",
+      "key": "literal:self",
+      "record": {
+        "type": "object",
+        "properties": {
+          "displayName": {
+            "type": "string",
+            "maxGraphemes": 64,
+            "maxLength": 640
+          },
+          "description": {
+            "type": "string",
+            "description": "Free-form profile description text.",
+            "maxGraphemes": 256,
+            "maxLength": 2560
+          },
+          "pronouns": {
+            "type": "string",
+            "description": "Free-form pronouns text.",
+            "maxGraphemes": 20,
+            "maxLength": 200
+          },
+          "website": { "type": "string", "format": "uri" },
+          "avatar": {
+            "type": "blob",
+            "description": "Small image to be displayed next to posts from account. AKA, 'profile picture'",
+            "accept": ["image/png", "image/jpeg"],
+            "maxSize": 1000000
+          },
+          "banner": {
+            "type": "blob",
+            "description": "Larger horizontal image to display behind profile view.",
+            "accept": ["image/png", "image/jpeg"],
+            "maxSize": 1000000
+          },
+          "labels": {
+            "type": "union",
+            "description": "Self-label values, specific to the Bluesky application, on the overall account.",
+            "refs": ["com.atproto.label.defs#selfLabels"]
+          },
+          "joinedViaStarterPack": {
+            "type": "ref",
+            "ref": "com.atproto.repo.strongRef"
+          },
+          "pinnedPost": {
+            "type": "ref",
+            "ref": "com.atproto.repo.strongRef"
+          },
+          "createdAt": { "type": "string", "format": "datetime" }
+        }
+      }
+    }
+  }
+}
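The string fields in this schema carry two separate limits. In Lexicon, `maxLength` counts UTF-8 bytes while `maxGraphemes` counts user-perceived characters, which is why `displayName` allows 64 graphemes but 640 bytes. A rough sketch of checking both, assuming a runtime that provides `Intl.Segmenter` and `TextEncoder`; the `checkStringLimits` helper is a hypothetical name and is not how `@atproto/lexicon` necessarily implements the check.

```typescript
interface StringLimits {
  maxLength?: number; // UTF-8 bytes
  maxGraphemes?: number; // user-perceived characters
}

// Hypothetical helper: returns an error message, or null when the value fits.
function checkStringLimits(value: string, limits: StringLimits): string | null {
  const bytes = new TextEncoder().encode(value).length;
  if (limits.maxLength !== undefined && bytes > limits.maxLength) {
    return `too long: ${bytes} bytes (max ${limits.maxLength})`;
  }
  // Grapheme segmentation groups combining marks with their base character.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const graphemes = [...segmenter.segment(value)].length;
  if (limits.maxGraphemes !== undefined && graphemes > limits.maxGraphemes) {
    return `too many graphemes: ${graphemes} (max ${limits.maxGraphemes})`;
  }
  return null;
}

// Usage with the displayName limits from the schema above.
const displayNameLimits: StringLimits = { maxGraphemes: 64, maxLength: 640 };
const displayNameError = checkStringLimits("Alice", displayNameLimits);
```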
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+{
+  "lexicon": 1,
+  "id": "app.bsky.embed.external",
+  "defs": {
+    "main": {
+      "type": "object",
+      "description": "A representation of some externally linked content (eg, a URL and 'card'), embedded in a Bluesky record (eg, a post).",
+      "required": ["external"],
+      "properties": {
+        "external": {
+          "type": "ref",
+          "ref": "#external"
+        }
+      }
+    },
+    "external": {
+      "type": "object",
+      "required": ["uri", "title", "description"],
+      "properties": {
+        "uri": { "type": "string", "format": "uri" },
+        "title": { "type": "string" },
+        "description": { "type": "string" },
+        "thumb": {
+          "type": "blob",
+          "accept": ["image/*"],
+          "maxSize": 1000000
+        }
+      }
+    },
+    "view": {
+      "type": "object",
+      "required": ["external"],
+      "properties": {
+        "external": {
+          "type": "ref",
+          "ref": "#viewExternal"
+        }
+      }
+    },
+    "viewExternal": {
+      "type": "object",
+      "required": ["uri", "title", "description"],
+      "properties": {
+        "uri": { "type": "string", "format": "uri" },
+        "title": { "type": "string" },
+        "description": { "type": "string" },
+        "thumb": { "type": "string", "format": "uri" }
+      }
+    }
+  }
+}
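The vendored schemas use three forms of reference: a local fragment such as `#external`, a bare NSID such as `com.atproto.repo.strongRef` (which points at that lexicon's `main` definition), and an NSID with a fragment such as `com.atproto.label.defs#selfLabels`. This is why schemas have to be loaded together with their dependencies. A small sketch of how such a reference could be resolved to a lexicon id and definition name; `resolveRef` and `ResolvedRef` are hypothetical names, not part of `@atproto/lexicon`.

```typescript
interface ResolvedRef {
  lexiconId: string; // NSID of the lexicon document that holds the definition
  defName: string; // key under that document's "defs"
}

// Hypothetical helper: resolves a ref string found inside the lexicon
// identified by currentLexiconId.
function resolveRef(ref: string, currentLexiconId: string): ResolvedRef {
  const hash = ref.indexOf("#");
  if (hash === -1) {
    // Bare NSID: refers to the "main" definition of that lexicon.
    return { lexiconId: ref, defName: "main" };
  }
  // A leading "#" means the definition lives in the current lexicon.
  const lexiconId = hash === 0 ? currentLexiconId : ref.slice(0, hash);
  return { lexiconId, defName: ref.slice(hash + 1) };
}

// Usage: the "#external" ref inside app.bsky.embed.external.
const localRef = resolveRef("#external", "app.bsky.embed.external");
```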
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+{
+  "lexicon": 1,
+  "id": "app.bsky.embed.images",
+  "description": "A set of images embedded in a Bluesky record (eg, a post).",
+  "defs": {
+    "main": {
+      "type": "object",
+      "required": ["images"],
+      "properties": {
+        "images": {
+          "type": "array",
+          "items": { "type": "ref", "ref": "#image" },
+          "maxLength": 4
+        }
+      }
+    },
+    "image": {
+      "type": "object",
+      "required": ["image", "alt"],
+      "properties": {
+        "image": {
+          "type": "blob",
+          "accept": ["image/*"],
+          "maxSize": 1000000
+        },
+        "alt": {
+          "type": "string",
+          "description": "Alt text description of the image, for accessibility."
+        },
+        "aspectRatio": {
+          "type": "ref",
+          "ref": "app.bsky.embed.defs#aspectRatio"
+        }
+      }
+    },
+    "view": {
+      "type": "object",
+      "required": ["images"],
+      "properties": {
+        "images": {
+          "type": "array",
+          "items": { "type": "ref", "ref": "#viewImage" },
+          "maxLength": 4
+        }
+      }
+    },
+    "viewImage": {
+      "type": "object",
+      "required": ["thumb", "fullsize", "alt"],
+      "properties": {
+        "thumb": {
+          "type": "string",
+          "format": "uri",
+          "description": "Fully-qualified URL where a thumbnail of the image can be fetched. For example, CDN location provided by the App View."
+        },
+        "fullsize": {
+          "type": "string",
+          "format": "uri",
+          "description": "Fully-qualified URL where a large version of the image can be fetched. May or may not be the exact original blob. For example, CDN location provided by the App View."
+        },
+        "alt": {
+          "type": "string",
+          "description": "Alt text description of the image, for accessibility."
+        },
+        "aspectRatio": {
+          "type": "ref",
+          "ref": "app.bsky.embed.defs#aspectRatio"
+        }
+      }
+    }
+  }
+}
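To make the constraints in this schema concrete, the sketch below hand-checks the `main` and `image` definitions: `images` is required, holds at most 4 items, and each item needs an `image` and a string `alt`. It is an illustration only; `checkImagesEmbed` and `ImageItem` are hypothetical names, blob contents and `aspectRatio` are not inspected, and the actual validation in this commit goes through `@atproto/lexicon` rather than code like this.

```typescript
// Shapes mirror the schema above; `unknown` marks fields this sketch
// does not inspect further.
interface ImageItem {
  image?: unknown; // blob reference
  alt?: unknown; // must be a string
}

const MAX_IMAGES = 4; // "maxLength": 4 on the images array

// Hypothetical helper: returns a list of problems, empty when the embed passes.
function checkImagesEmbed(embed: { images?: ImageItem[] }): string[] {
  if (!Array.isArray(embed.images)) return ["images is required"];
  const errors: string[] = [];
  if (embed.images.length > MAX_IMAGES) {
    errors.push(`images has ${embed.images.length} items (max ${MAX_IMAGES})`);
  }
  embed.images.forEach((item, i) => {
    if (item.image === undefined) errors.push(`images[${i}].image is required`);
    if (typeof item.alt !== "string") errors.push(`images[${i}].alt must be a string`);
  });
  return errors;
}

// Usage: one well-formed image passes with no errors.
const embedErrors = checkImagesEmbed({ images: [{ image: {}, alt: "A cat" }] });
```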
