Skip to content

fix: derive skill publish metadata from storage#2470

Open
jesse-merhi wants to merge 1 commit into
mainfrom
jesse/fix-skill-publish-metadata
Open

fix: derive skill publish metadata from storage#2470
jesse-merhi wants to merge 1 commit into
mainfrom
jesse/fix-skill-publish-metadata

Conversation

@jesse-merhi
Copy link
Copy Markdown
Member

Summary

  • Derive skill publish file size, content type, and sha256 from stored blobs inside publishVersionForUser.
  • Keep existing publish inputs working while enforcing file and total size limits against the stored bytes.
  • Add regression coverage for caller-supplied metadata that does not match storage.

Tests

bun run test -- convex/lib/skillPublish.test.ts --reporter=dot
bun run test -- convex/httpApiV1.handlers.test.ts --testNamePattern "publish json|publish multipart|Package publish|multipart package publish|multipart ClawPack|staged ClawPack" --reporter=dot
bun run test -- convex/httpApi.handlers.test.ts --testNamePattern "cliPublishHttp" --reporter=dot
bun run format:check
bun run lint
bunx tsc --noEmit --pretty false
bunx tsc -p packages/schema/tsconfig.json --noEmit --pretty false
bunx tsc -p packages/clawhub/tsconfig.json --noEmit --pretty false

Copilot AI review requested due to automatic review settings June 2, 2026 07:39
@jesse-merhi jesse-merhi requested review from a team and Patrick-Erichsen as code owners June 2, 2026 07:39
@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented Jun 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
clawhub Ready Ready Preview, Comment Jun 2, 2026 7:39am

@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented Jun 2, 2026

Codex review: needs maintainer review before merge. Reviewed June 2, 2026, 3:44 AM ET / 07:44 UTC.

Summary
The branch updates skill publishing to load stored blobs, derive size, content type, and SHA-256 from storage, and adds regression tests for mismatched caller metadata.

Reproducibility: yes. from source inspection, not runtime execution: current main validates size and content type from args.files before reading storage, so mismatched caller metadata can affect enforcement. The PR adds focused tests for that path.

Review metrics: none identified.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🌊 off-meta tidepool
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • none.

Next step before merge

  • No ClawSweeper repair is needed; the remaining path is normal maintainer review and required-check consideration for a member-authored PR.

Security
Cleared: The diff strengthens the publish trust boundary by deriving metadata from stored blobs and does not add dependencies, secrets handling, workflow changes, or new third-party execution.

Review details

Best possible solution:

Land a central publish-layer fix that treats stored bytes as the source of truth while preserving the existing publish payload shape and focused regression coverage.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection, not runtime execution: current main validates size and content type from args.files before reading storage, so mismatched caller metadata can affect enforcement. The PR adds focused tests for that path.

Is this the best way to solve the issue?

Yes, deriving metadata inside publishVersionForUser is the narrow maintainable fix because UI, CLI, API, imports, and restores all converge there.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against dcbc38999f1a.

Label changes

Label changes:

  • add P2: This is a normal-priority publish integrity and size-limit fix with a bounded backend surface and regression tests.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🌊 off-meta tidepool and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external-contributor proof gate does not apply to this member-authored backend PR; the body lists targeted regression, HTTP, lint, and typecheck commands as validation context.

Label justifications:

  • P2: This is a normal-priority publish integrity and size-limit fix with a bounded backend surface and regression tests.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🌊 off-meta tidepool and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external-contributor proof gate does not apply to this member-authored backend PR; the body lists targeted regression, HTTP, lint, and typecheck commands as validation context.
Evidence reviewed

What I checked:

  • Live PR state: GitHub API shows the pull request is open, member-authored, mergeable but unstable, and changes two files with one commit at head 11d5fd0. (11d5fd011fcc)
  • Current main trust boundary: Current main sanitizes paths, filters macOS junk, then validates text-ness and size limits from caller-supplied file metadata before storage bytes are loaded. (convex/lib/skillPublish.ts:151, dcbc38999f1a)
  • PR implementation path: The PR routes publish files through derivePublishFilesFromStorage before downstream readme, scan, fingerprint, changelog, insert, backup, and webhook use. (convex/lib/skillPublish.ts:158, 11d5fd011fcc)
  • Storage-derived metadata helper: The new helper loads each blob from Convex storage, replaces size/content type with stored metadata, enforces text and size limits, and recomputes SHA-256 from blob bytes. (convex/lib/skillPublish.ts:493, 11d5fd011fcc)
  • Regression coverage: The added tests cover replacing caller-supplied size and SHA-256 with stored bytes and rejecting an oversized stored blob when caller metadata understates size. (convex/lib/skillPublish.test.ts:42, 11d5fd011fcc)
  • History provenance: Blame ties the current central publish validation block to Patrick Erichsen's reshape-skill-security work; recent publish-limit history also includes Jesse Merhi's multipart package publish work. (convex/lib/skillPublish.ts:98, 07fed45f425e)

Likely related people:

  • Patrick-Erichsen: Current main blame for publishVersionForUser validation and the v1 skill publish handler points to commit 07fed45, which reshaped skill verification and publish-adjacent security signals. (role: recent area contributor; confidence: high; commits: 07fed45f425e; files: convex/lib/skillPublish.ts, convex/httpApiV1/skillsV1.ts)
  • Jesse Merhi: Current main includes Jesse's recent publish-limit and upload-ticket work in commit dcbc389, which touches the same size-limit and publish trust-boundary area. (role: recent adjacent contributor; confidence: medium; commits: dcbc38999f1a; files: convex/lib/publishLimits.ts, convex/httpApiV1/packagesV1.ts, convex/uploads.ts)
  • Peter Steinberger: Commit 4ca5ba9 tightened ClawHub publish and API edge cases and touched the v1 skill API surface that calls publishVersionForUser. (role: earlier API publish contributor; confidence: medium; commits: 4ca5ba9d1edf; files: convex/httpApiV1/skillsV1.ts, docs/http-api.md)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Replaces caller-supplied skill publish metadata with values derived from stored blobs, ensuring size, contentType, and sha256 reflect the actual storage contents and that file/total size limits cannot be bypassed by lying inputs.

Changes:

  • Extract a new derivePublishFilesFromStorage helper that loads each blob, validates content-type/size/total-size, and recomputes sha256.
  • Reorder publishVersionForUser to apply the storage-derived metadata before downstream consumers (readme detection, quality checks, etc.).
  • Add unit tests that cover happy-path metadata derivation and oversized-stored-blob rejection when caller metadata understates the size.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
convex/lib/skillPublish.ts Adds storage-driven metadata derivation, sha256 hashing, and rewires publish flow to use it.
convex/lib/skillPublish.test.ts Adds regression tests for derivePublishFilesFromStorage, including oversized-blob rejection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants