Commit 02c21e7

Miriad and builder committed
feat: add Cloudinary to Sanity asset migration tool
Sanity-first migration approach:
- Phase 1: Discover Cloudinary references in Sanity documents
- Phase 2: Extract unique Cloudinary URLs to migrate
- Phase 3: Download from Cloudinary & upload to Sanity
- Phase 4: Update document references (cloudinary.asset → image/file refs)
- Phase 5: Generate migration report

Features:
- Handles cloudinary.asset plugin objects and plain URL strings
- Supports both res.cloudinary.com/ajonp and media.codingcat.dev URLs
- Resume support with incremental mapping persistence
- Dry-run mode for previewing changes
- Configurable concurrency and per-phase execution
- Retry with exponential backoff

Co-authored-by: builder <builder@miriad.systems>
1 parent 5bfa0e0 commit 02c21e7

5 files changed

+1197
-0
lines changed

.gitignore

Lines changed: 8 additions & 0 deletions
@@ -51,3 +51,11 @@ next-env.d.ts
# Firebase debug files
firebase-debug.log
firebase-debug.*.logpackage-lock.json

# Migration tool generated files
scripts/migration/discovered-references.json
scripts/migration/unique-cloudinary-urls.json
scripts/migration/asset-mapping.json
scripts/migration/migration-report.json
scripts/migration/node_modules/
scripts/migration/.env

scripts/migration/README.md

Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
# Cloudinary → Sanity Asset Migration (Sanity-First)

A production-grade Node.js tool that migrates Cloudinary assets to Sanity using
a **Sanity-first** approach: it starts by scanning your Sanity documents to
discover which Cloudinary assets are actually referenced, then migrates only
those assets and rewrites all references.

## Why Sanity-First?

The previous approach enumerated **all** Cloudinary assets and uploaded them
blindly. This was wasteful because:

- Many Cloudinary assets may not be referenced by any Sanity document
- It uploaded assets that were never needed, wasting time and storage
- It couldn't handle the Sanity Cloudinary plugin's `cloudinary.asset` type

The new approach:

1. **Discovers** what's actually used in Sanity
2. **Extracts** a deduplicated list of Cloudinary URLs
3. **Migrates** only what's needed
4. **Updates** all references in-place
5. **Reports** a full summary

---

## Prerequisites

| Requirement | Why |
|---|---|
| **Node.js ≥ 18** | Native `fetch` support & ES-module compatibility |
| **Sanity project** | Project ID, dataset name, and a **write-enabled** API token |

> **Note:** Cloudinary API credentials are no longer required! The script
> downloads assets directly from their public URLs. You only need Cloudinary
> credentials if your assets are private/restricted.

## Quick Start

```bash
# 1. Install dependencies (run from the repository root)
cd scripts/migration
npm install

# 2. Create your .env from the template
cp env-example.txt .env
# Then fill in your real credentials

# 3. Run the full migration (dry-run first!)
npm run migrate:dry-run

# 4. Run for real
npm run migrate
```

## Environment Variables

Copy `env-example.txt` to `.env` and fill in:

| Variable | Required | Description |
|---|---|---|
| `SANITY_PROJECT_ID` | Yes | Sanity project ID |
| `SANITY_DATASET` | Yes | Sanity dataset (e.g. `production`) |
| `SANITY_TOKEN` | Yes | Sanity API token with **write** access |
| `CLOUDINARY_CLOUD_NAME` | No | Cloudinary cloud name (default: `ajonp`) |
| `CONCURRENCY` | No | Max parallel uploads (default: `5`) |
| `DRY_RUN` | No | Set to `true` to preview without writing |

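
The variables and defaults listed above could be read along these lines. This is only a sketch: `loadConfig` and the returned field names are hypothetical and not taken from `migrate.mjs`, which may validate its settings differently.

```javascript
// Hypothetical helper: reads the documented variables from an env-like object.
// The function name and the shape of the returned object are assumptions.
function loadConfig(env) {
  const required = ["SANITY_PROJECT_ID", "SANITY_DATASET", "SANITY_TOKEN"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  // Fall back to the documented default of 5 when CONCURRENCY is unset or empty.
  const concurrency = Number.parseInt(env.CONCURRENCY || "5", 10);
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new Error(`CONCURRENCY must be a positive integer, got "${env.CONCURRENCY}"`);
  }

  return {
    projectId: env.SANITY_PROJECT_ID,
    dataset: env.SANITY_DATASET,
    token: env.SANITY_TOKEN,
    cloudName: env.CLOUDINARY_CLOUD_NAME || "ajonp", // documented default
    concurrency,
    dryRun: env.DRY_RUN === "true", // only the literal string "true" enables dry-run
  };
}
```

In a script this would typically be called as `loadConfig(process.env)` after the `.env` file has been loaded.
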
## CLI Flags

```bash
node migrate.mjs                    # Full migration, all phases
node migrate.mjs --dry-run          # Preview mode, no writes
node migrate.mjs --phase=1          # Run only Phase 1
node migrate.mjs --phase=1,2        # Run Phases 1 & 2
node migrate.mjs --phase=3,4        # Run Phases 3 & 4 (uses cached data)
node migrate.mjs --concurrency=10   # Override parallel upload limit
```

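
For readers wondering how flags of this shape are usually handled, here is a minimal sketch of a parser for the three flags listed above. `parseCliArgs` and its option names are hypothetical; the actual parsing in `migrate.mjs` is not shown in this README and may differ.

```javascript
// Hypothetical parser for --dry-run, --phase=<list> and --concurrency=<n>.
// Not taken from migrate.mjs; the option names are assumptions.
function parseCliArgs(argv) {
  const options = { dryRun: false, phases: [1, 2, 3, 4, 5], concurrency: undefined };

  for (const arg of argv) {
    if (arg === "--dry-run") {
      options.dryRun = true;
    } else if (arg.startsWith("--phase=")) {
      // "--phase=3,4" becomes [3, 4]
      options.phases = arg
        .slice("--phase=".length)
        .split(",")
        .map((p) => Number.parseInt(p, 10));
      if (options.phases.some((p) => !(p >= 1 && p <= 5))) {
        throw new Error(`Invalid phase list in "${arg}" (expected numbers 1 to 5)`);
      }
    } else if (arg.startsWith("--concurrency=")) {
      const n = Number.parseInt(arg.slice("--concurrency=".length), 10);
      if (!Number.isInteger(n) || n < 1) {
        throw new Error(`Invalid concurrency in "${arg}"`);
      }
      options.concurrency = n;
    } else {
      throw new Error(`Unknown flag: ${arg}`);
    }
  }
  return options;
}
```

A script would usually pass `process.argv.slice(2)`. Leaving `concurrency` undefined when the flag is absent lets the caller fall back to the `CONCURRENCY` environment variable.
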
## What Each Phase Does

### Phase 1: Discover Cloudinary References in Sanity

Scans **all** Sanity documents (excluding built-in asset types) to find any
that reference Cloudinary. Handles two types of references:

#### `cloudinary.asset` objects (Sanity Cloudinary Plugin)

The [sanity-plugin-cloudinary](https://github.com/sanity-io/sanity-plugin-cloudinary)
stores assets as objects with `_type: "cloudinary.asset"` containing fields like
`public_id`, `secure_url`, `resource_type`, `format`, etc.

#### Plain URL strings

Any string field containing:
- `res.cloudinary.com/ajonp` (standard Cloudinary URL)
- `media.codingcat.dev` (custom CNAME domain)

This includes both standalone URL fields and URLs embedded in text/markdown content.

**Output:** `discovered-references.json`, a list of documents with their Cloudinary references.

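
The two detection rules above can be sketched as a recursive walk over a document. Everything here is illustrative: the README names a `CLOUDINARY_PATTERNS` constant but does not show its contents, so the regular expressions, the `findCloudinaryRefs` function, and the `kind` labels are assumptions.

```javascript
// Assumed patterns, mirroring the two hosts named above.
const CLOUDINARY_PATTERNS = [/res\.cloudinary\.com\/ajonp/, /media\.codingcat\.dev/];

// Hypothetical walker: returns one entry per Cloudinary reference found in `value`.
function findCloudinaryRefs(value, path = "") {
  const refs = [];
  if (typeof value === "string") {
    if (CLOUDINARY_PATTERNS.some((re) => re.test(value))) {
      // A string that is nothing but a URL counts as "url"; longer text is "embedded".
      const isBareUrl = /^https?:\/\/\S+$/.test(value.trim());
      refs.push({ path, kind: isBareUrl ? "url" : "embedded" });
    }
  } else if (Array.isArray(value)) {
    // Index-based paths are a simplification; Sanity arrays are often addressed by _key.
    value.forEach((item, i) => refs.push(...findCloudinaryRefs(item, `${path}[${i}]`)));
  } else if (value !== null && typeof value === "object") {
    if (value._type === "cloudinary.asset") {
      refs.push({ path, kind: "cloudinary.asset", url: value.secure_url });
      return refs; // do not also report the URL strings inside the plugin object
    }
    for (const [key, child] of Object.entries(value)) {
      refs.push(...findCloudinaryRefs(child, path ? `${path}.${key}` : key));
    }
  }
  return refs;
}
```

Returning early on a `cloudinary.asset` object keeps its inner `secure_url` from being counted a second time as a plain URL string.
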
### Phase 2: Extract Unique Cloudinary URLs

Deduplicates all discovered references into a unique list of Cloudinary asset
URLs that need to be migrated. Tracks which documents reference each URL.

**Output:** `unique-cloudinary-urls.json`, a deduplicated URL list with metadata:
```json
{
  "cloudinaryUrl": "https://res.cloudinary.com/ajonp/image/upload/v123/folder/photo.jpg",
  "cloudinaryPublicId": "folder/photo",
  "resourceType": "image",
  "sourceDocIds": ["doc-abc", "doc-def"]
}
```

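
A minimal sketch of how such entries could be built: group references by URL and derive the public ID and resource type from the URL path. The function names and the `{ docId, url }` input shape are assumptions, and the parser deliberately ignores transformation segments, which real Cloudinary URLs may contain.

```javascript
// Hypothetical parser: handles .../<resourceType>/upload/[v<version>/]<publicId>.<ext>
// and does NOT handle transformation segments such as "w_300,c_fill/".
function parseCloudinaryUrl(url) {
  const match = /\/(image|video|raw)\/upload\/(.+)$/.exec(new URL(url).pathname);
  if (!match) return null;
  const [, resourceType, rest] = match;
  // Drop a leading version segment such as "v123/", then the file extension.
  const publicId = rest.replace(/^v\d+\//, "").replace(/\.[^./]+$/, "");
  return { resourceType, publicId };
}

// Hypothetical Phase 2 step: `discovered` is assumed to be a list of { docId, url }.
function extractUniqueUrls(discovered) {
  const byUrl = new Map();
  for (const { docId, url } of discovered) {
    let entry = byUrl.get(url);
    if (!entry) {
      const parsed = parseCloudinaryUrl(url);
      entry = {
        cloudinaryUrl: url,
        cloudinaryPublicId: parsed ? parsed.publicId : null,
        resourceType: parsed ? parsed.resourceType : null,
        sourceDocIds: [],
      };
      byUrl.set(url, entry);
    }
    // Record each referencing document once per URL.
    if (!entry.sourceDocIds.includes(docId)) entry.sourceDocIds.push(docId);
  }
  return [...byUrl.values()];
}
```

Keying the `Map` on the full URL means two URLs that differ only in version or transformation are treated as separate assets, which is the conservative choice for a migration.
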
### Phase 3: Download & Upload Assets

Downloads each unique Cloudinary asset and uploads it to Sanity's asset pipeline.

**Output:** `asset-mapping.json`, the mapping between Cloudinary and Sanity:
```json
{
  "cloudinaryUrl": "https://res.cloudinary.com/ajonp/image/upload/v123/folder/photo.jpg",
  "cloudinaryPublicId": "folder/photo",
  "sanityAssetId": "image-abc123-1920x1080-jpg",
  "sanityUrl": "https://cdn.sanity.io/images/{projectId}/{dataset}/abc123-1920x1080.jpg",
  "sourceDocIds": ["doc-abc", "doc-def"]
}
```

- **Resume support**: assets already in the mapping are skipped automatically.
- Retries failed downloads/uploads up to 3× with exponential back-off.

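
The retry behaviour described above can be sketched as a small wrapper. This is an illustration under stated assumptions: `withRetry`, `backoffDelayMs`, the 1-second base delay, and reading "up to 3×" as one initial attempt plus three retries are all choices made here, not details confirmed by the script.

```javascript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
function backoffDelayMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt;
}

// Hypothetical wrapper: runs `task` once, then retries up to `retries` more times.
// `sleep` is injectable so the waiting can be stubbed out in tests.
async function withRetry(task, options = {}) {
  const {
    retries = 3,
    baseMs = 1000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final failure.
      if (attempt < retries) await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
  throw lastError;
}
```

A download step might then be wrapped as `withRetry(() => fetch(url))`, with the caller checking the response status inside the task so that HTTP errors also trigger a retry.
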
### Phase 4: Update References

Patches Sanity documents to replace Cloudinary references with Sanity references:

| Reference Type | Action |
|---|---|
| `cloudinary.asset` object | Replaced with `{ _type: "image", asset: { _type: "reference", _ref: "..." } }` |
| Full URL string | Replaced with Sanity CDN URL |
| Embedded URL in text | URL swapped inline within the text |

All patches are applied inside **transactions** for atomicity (one transaction per document).

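
The replacement values from the table can be sketched as two pure helpers. Both function names are hypothetical, and choosing `image` versus `file` from the asset ID prefix is an assumption based on the commit's mention of "image/file refs"; how the script actually builds and commits its patches is not shown here.

```javascript
// Hypothetical helper: builds the value that replaces a cloudinary.asset object.
// Assumes Sanity asset IDs start with "image-" for images and "file-" otherwise.
function toSanityAssetField(sanityAssetId) {
  const type = sanityAssetId.startsWith("image-") ? "image" : "file";
  return { _type: type, asset: { _type: "reference", _ref: sanityAssetId } };
}

// Hypothetical helper: swaps every mapped Cloudinary URL inside a string.
// Works for both a full URL string and a URL embedded in longer text.
function replaceCloudinaryUrls(text, mapping) {
  // Replace longer URLs first so a URL that is a prefix of another cannot clobber it.
  const ordered = [...mapping].sort(
    (a, b) => b.cloudinaryUrl.length - a.cloudinaryUrl.length
  );
  let result = text;
  for (const { cloudinaryUrl, sanityUrl } of ordered) {
    result = result.split(cloudinaryUrl).join(sanityUrl); // literal, replaces all occurrences
  }
  return result;
}
```

Using `split`/`join` rather than a regular expression avoids having to escape characters such as `.` and `?` that appear in URLs.
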
### Phase 5: Report

Prints a summary to the console and writes a detailed report:

```
══════════════════════════════════════════════════════════
MIGRATION SUMMARY
══════════════════════════════════════════════════════════
Documents with refs: 42
Total references found: 128
cloudinary.asset objects: 35
URL string fields: 61
Embedded URLs in text: 32
Unique Cloudinary URLs: 87
Assets uploaded to Sanity: 87
Document fields updated: 128
Errors: 0
══════════════════════════════════════════════════════════
```

**Output:** `migration-report.json`

## Generated Files

| File | Phase | Description |
|---|---|---|
| `discovered-references.json` | 1 | Documents with Cloudinary references |
| `unique-cloudinary-urls.json` | 2 | Deduplicated Cloudinary URLs to migrate |
| `asset-mapping.json` | 3 | Cloudinary → Sanity asset mapping |
| `migration-report.json` | 5 | Full migration report |

## Resuming an Interrupted Migration

The script is fully resumable:

1. **Phase 1** is skipped if `discovered-references.json` exists.
2. **Phase 2** is skipped if `unique-cloudinary-urls.json` exists.
3. **Phase 3** skips any asset already present in `asset-mapping.json`.
4. **Phases 4–5** are idempotent, so re-running them is safe.

To start completely fresh, delete the generated JSON files:

```bash
rm -f discovered-references.json unique-cloudinary-urls.json asset-mapping.json migration-report.json
```

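
The Phase 3 skip rule amounts to filtering the unique URL list against what is already in the mapping. A minimal sketch, assuming both files hold arrays of entries with a `cloudinaryUrl` field as in the examples above; `pendingAssets` is a hypothetical name.

```javascript
// Hypothetical resume filter: returns the entries that still need to be migrated.
// Assumes both arguments are arrays of objects with a `cloudinaryUrl` field.
function pendingAssets(uniqueUrls, existingMapping) {
  const done = new Set(existingMapping.map((entry) => entry.cloudinaryUrl));
  return uniqueUrls.filter((entry) => !done.has(entry.cloudinaryUrl));
}
```

For this to survive an interruption, the mapping has to be written to disk as each asset completes rather than once at the end, which matches the "incremental mapping persistence" mentioned in the commit message.
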
## Troubleshooting

| Problem | Fix |
|---|---|
| `401 Unauthorized` from Sanity | Check `SANITY_TOKEN` has write permissions |
| Download fails for private assets | Add Cloudinary credentials to `.env` and modify the download logic |
| Script hangs | Check network; the script logs progress for every asset |
| Partial migration | Just re-run; resume picks up where it left off |
| `cloudinary.asset` not detected | Ensure the field has `_type: "cloudinary.asset"` in the document |
| Custom CNAME not detected | Add your domain to `CLOUDINARY_PATTERNS` in the script |

scripts/migration/env-example.txt

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# Sanity credentials (required)
SANITY_PROJECT_ID=your_project_id
SANITY_DATASET=dev
SANITY_TOKEN=your_sanity_token_with_write_access

# Cloudinary cloud name (optional, defaults to "ajonp")
CLOUDINARY_CLOUD_NAME=ajonp

# Migration options (all optional)
# Max parallel uploads (default: 5)
CONCURRENCY=5

# Set to "true" to preview changes without writing anything
DRY_RUN=false
