Commit aa35776

Merge branch 'simstudioai:main' into main
2 parents cba1939 + f6c9998 commit aa35776

349 files changed

Lines changed: 52059 additions & 11207 deletions

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
+---
+name: memory-load-check
+description: Review PRs and diffs for unbounded memory loading, concurrency explosions, oversized payload materialization, and missing pagination or byte caps. Use when reviewing cleanup jobs, background jobs, data imports/exports, file parsing, API fan-out, workflow execution payloads, large arrays/files, or any change that reads many rows, files, responses, logs, or external API pages into process memory.
+---
+
+# Memory Load Check
+
+Use this skill when a PR or diff could load unbounded data into a Node/Bun process, especially in cron routes, background tasks, API routes, workflow execution, file parsing, cleanup jobs, migrations, import/export flows, and external API integrations.
+
+## Review Goal
+
+Prove each changed path has explicit bounds for:
+- rows held in memory
+- bytes held in memory
+- concurrent promises, DB queries, HTTP calls, storage operations, and jobs
+- number of pages, batches, chunks, retries, and retained intermediate objects
+
+If any bound depends only on current production size or "probably small" data, treat it as a finding.
+
+## References
+
+Read these when doing a deeper pass:
+- Node.js streams/backpressure: https://nodejs.org/learn/modules/backpressuring-in-streams
+- Node.js stream usage: https://nodejs.org/en/learn/modules/how-to-use-streams
+- Keyset/cursor pagination over offset scans: https://blog.sequinstream.com/keyset-cursors-not-offsets-for-postgres-pagination/
+- Postgres pagination tradeoffs: https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/
+
+## Sim Helpers To Prefer
+
+- `apps/sim/lib/cleanup/batch-delete.ts`
+  - `chunkedBatchDelete`: bounded SELECT -> optional side effect -> DELETE loop.
+  - `batchDeleteByWorkspaceAndTimestamp`: common workspace/timestamp cleanup wrapper.
+  - `selectRowsByIdChunks`: chunks large ID sets and enforces an overall row cap.
+  - `chunkArray`: use only after the input set itself is already bounded.
+- `apps/sim/lib/core/utils/stream-limits.ts`
+  - `PayloadSizeLimitError`
+  - `assertKnownSizeWithinLimit`
+  - `assertContentLengthWithinLimit`
+  - `readStreamToBufferWithLimit`
+  - `readNodeStreamToBufferWithLimit`
+  - `readResponseToBufferWithLimit`
+  - `readResponseTextWithLimit`
+- Cleanup dispatcher pattern in `apps/sim/lib/billing/cleanup-dispatcher.ts`
+  - page active workspaces with `WHERE id > afterId ORDER BY id LIMIT N`
+  - dispatch concrete chunks (`workspaceIds`, retention, label) instead of one giant scope
+  - prefer Trigger.dev queue/concurrency keys when available
+  - execute inline fallback chunks sequentially, not with unbounded `Promise.all`
+- File parse route pattern in `apps/sim/app/api/files/parse/route.ts`
+  - cap downloads and parsed output separately
+  - preserve partial results when a later item exceeds the cap
+  - never read untrusted response bodies without a byte cap
+- Large workflow value payloads
+  - prefer durable references/manifests over inlining large arrays or files
+  - materialize refs only behind an explicit byte budget
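Reviewer note on the cleanup dispatcher pattern: keyset paging combined with sequential chunk handling can be sketched roughly as below. This is a minimal illustration and not the code in `cleanup-dispatcher.ts`. `forEachPage`, `FetchPage`, and `maxPages` are hypothetical names, and the SQL in the comment only mirrors the query shape quoted in the skill.

```typescript
// Hypothetical sketch; none of these names come from the Sim codebase.
type Row = { id: string };

// Stand-in for: SELECT id FROM workspace WHERE id > $afterId ORDER BY id LIMIT $limit
type FetchPage = (afterId: string | null, limit: number) => Promise<Row[]>;

type PageRunResult = { pages: number; rows: number; exhausted: boolean };

async function forEachPage(
  fetchPage: FetchPage,
  pageSize: number,
  maxPages: number,
  handle: (ids: string[]) => Promise<void>,
): Promise<PageRunResult> {
  let afterId: string | null = null;
  let pages = 0;
  let rows = 0;

  while (pages < maxPages) {
    // Only one page of rows is held in memory at a time.
    const page: Row[] = await fetchPage(afterId, pageSize);
    if (page.length === 0) return { pages, rows, exhausted: true };

    // Sequential dispatch: the next page is not fetched until this chunk is done.
    await handle(page.map((row) => row.id));

    pages += 1;
    rows += page.length;
    afterId = page[page.length - 1].id; // keyset cursor, no OFFSET

    if (page.length < pageSize) return { pages, rows, exhausted: true };
  }

  // Per-run cap reached; the remaining backlog is left for a later run.
  return { pages, rows, exhausted: false };
}
```

Returning `exhausted: false` when the page cap is hit lets the caller decide whether to schedule another run rather than widening the current one.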
+
+## Review Workflow
+
+1. Identify every changed data source:
+   - database queries
+   - storage lists/downloads/uploads
+   - external API pagination
+   - file reads and HTTP responses
+   - workflow logs, snapshots, payloads, arrays, and manifests
+   - queues, cron routes, and background jobs
+2. For each source, write down the maximum cardinality and maximum bytes. If the code does not enforce one, it is unbounded.
+3. Trace whether data is processed incrementally or accumulated:
+   - arrays from `select`, `findMany`, `Promise.all`, `map`, `filter`, `flatMap`
+   - maps/sets keyed by all users, workspaces, executions, files, or rows
+   - `Buffer.concat`, `response.arrayBuffer()`, `response.text()`, `JSON.stringify`, `JSON.parse`
+   - queues of promises or job payloads built before dispatch
+4. Check concurrency separately from memory:
+   - no `Promise.all(items.map(...))` unless `items` is already small and bounded
+   - use chunks, sequential loops, queue concurrency, or a concurrency limiter
+   - align concurrency with DB pool size, storage/API limits, and task queue semantics
+5. Verify SQL shape:
+   - every bulk query has `LIMIT`
+   - large pagination uses cursor/keyset style (`id > afterId`, timestamps plus unique ID), not deep `OFFSET`
+   - `IN (...)` lists are chunked
+   - side-effect rows selected before delete have per-batch and per-run caps
+6. Verify byte safety:
+   - check `Content-Length` when available
+   - stream with cumulative byte accounting
+   - cap both input bytes and expanded output bytes
+   - reject or reference oversized values before serializing large JSON responses
+7. Confirm failure behavior:
+   - exceeding a cap should stop before loading more data
+   - partial successful work should be preserved when the API contract expects it
+   - retries should not duplicate huge in-memory state
+   - cleanup jobs should make progress over future runs instead of widening one run
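Reviewer note on step 6: "stream with cumulative byte accounting" could look something like the sketch below. It is an assumption-laden illustration built on the standard Web `ReadableStream` API, not the implementation of `readStreamToBufferWithLimit` or any other Sim helper; `readWithCap` and `ByteCapExceededError` are hypothetical names.

```typescript
// Hypothetical sketch; not the Sim stream-limit helpers.
class ByteCapExceededError extends Error {
  constructor(limit: number) {
    super(`body exceeded ${limit} bytes`);
    this.name = "ByteCapExceededError";
  }
}

async function readWithCap(
  stream: ReadableStream<Uint8Array>,
  maxBytes: number,
): Promise<Uint8Array> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;

  try {
    while (true) {
      const result = await reader.read();
      if (result.done) break;

      // Account for bytes before retaining the chunk, so the cap is
      // enforced while reading instead of after concatenation.
      total += result.value.byteLength;
      if (total > maxBytes) {
        await reader.cancel(); // stop pulling more data from the source
        throw new ByteCapExceededError(maxBytes);
      }
      chunks.push(result.value);
    }
  } finally {
    reader.releaseLock();
  }

  // At this point total <= maxBytes, so the final copy is bounded too.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

For a `fetch` response this would be called with `response.body` after a `Content-Length` precheck, since the header can be absent or wrong and the streaming cap is the real guard.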
+
+## Red Flags
+
+- loads all active workspaces, users, executions, logs, files, messages, or subscriptions before filtering
+- builds a full `Map` or `Set` for a platform-wide scope
+- uses `Promise.all` over rows from an unbounded query
+- fetches all pages from an external API before processing
+- reads an entire file, HTTP response, or stream without a max byte budget
+- checks size only after `Buffer.concat`, `arrayBuffer`, `text`, `JSON.parse`, or parse expansion
+- chunks only after loading the complete dataset
+- paginates with unbounded/deep `OFFSET` on a mutable or large table
+- creates one queue job per row without batching or a queue-level concurrency key
+- accumulates per-row errors/results with no maximum
+- adds a cache, singleton, or module-level collection without eviction or size limits
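Reviewer note on the "accumulates per-row errors/results with no maximum" flag: one possible shape for a bounded collector is sketched here. `CappedList` is a hypothetical name used only for illustration; the skill does not prescribe this helper.

```typescript
// Hypothetical helper: retains at most `max` items and only counts the rest.
class CappedList<T> {
  private readonly kept: T[] = [];
  private droppedCount = 0;
  private readonly max: number;

  constructor(max: number) {
    if (!Number.isInteger(max) || max < 0) {
      throw new RangeError("max must be a non-negative integer");
    }
    this.max = max;
  }

  push(item: T): void {
    if (this.kept.length < this.max) {
      this.kept.push(item);
    } else {
      // Past the cap we keep a count, not the payload.
      this.droppedCount += 1;
    }
  }

  summary(): { kept: readonly T[]; dropped: number } {
    return { kept: this.kept, dropped: this.droppedCount };
  }
}
```

A job that records row failures through such a collector still reports how many errors occurred, while memory use stays fixed regardless of backlog size.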
+
+## Preferred Fixes
+
+- Move filters into SQL/API requests and select only needed columns.
+- Replace full-table loads with cursor/keyset pagination and a deterministic order.
+- Process one page/batch at a time; do not keep previous pages unless needed.
+- Add per-batch and per-run row caps so long backlogs drain across repeated jobs.
+- Split large ID lists with `selectRowsByIdChunks` or `chunkArray` after bounding the source.
+- Use `chunkedBatchDelete` for cleanup loops with row side effects.
+- Use stream-limit helpers for file/HTTP/body reads.
+- Store large workflow values as refs/manifests and materialize only within a caller budget.
+- Replace unbounded `Promise.all` with sequential chunk loops, queue concurrency, or a small limiter.
+- Include tests that prove caps stop work early and partial results or progress are preserved.
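Reviewer note on "a small limiter": the sketch below shows one way to replace `Promise.all(items.map(...))` with a fixed number of workers. `mapWithConcurrency` is a hypothetical name, and the sketch assumes `items` is already bounded, because the result array is as large as the input.

```typescript
// Hypothetical limiter; assumes `items` itself is already bounded.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }

  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker pulls the next unclaimed index until the input is drained,
  // so at most `limit` calls to `fn` are in flight at any moment.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  };

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

The `limit` value is the thing to align with DB pool size or upstream API limits; a rejection from `fn` propagates through `Promise.all` as usual.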
+
+## Findings Format
+
+Lead with concrete findings, ordered by risk:
+
+```markdown
+## Findings
+
+- **P1 Unbounded workspace load in cleanup dispatch** (`path/to/file.ts`)
+  The new path calls `select().from(workspace)` without a limit, then builds maps for every row before dispatch. In production this scales with all active workspaces and can exhaust the app process. Page by `workspace.id` with a fixed limit and dispatch bounded chunks.
+
+## Good Signals
+
+- Uses `readResponseToBufferWithLimit` for external downloads.
+- Inline fallback processes chunks sequentially.
+
+## Residual Risk
+
+- The row cap is explicit, but no test currently proves the loop stops at the cap.
+```
+
+Only say "good to go" when every changed source has explicit row, byte, and concurrency bounds or the boundedness is proven by a stable invariant.

.agents/skills/validate-integration/SKILL.md

Lines changed: 13 additions & 2 deletions
@@ -232,13 +232,23 @@ If any tools support pagination:
 - [ ] Pagination response fields (`nextToken`, `cursor`, etc.) are included in tool outputs
 - [ ] Pagination subBlocks are set to `mode: 'advanced'`
 
-## Step 7: Validate Error Handling
+## Step 7: Validate Memory Load Safety
+
+If any tool lists, searches, exports, imports, downloads, uploads, paginates, batches, transforms arrays, or reads file/HTTP bodies, read `.agents/skills/memory-load-check/SKILL.md` and apply it to the integration.
+
+- [ ] List/search tools expose API limits and do not auto-fetch every page into memory
+- [ ] Transform logic does not build unbounded arrays, maps, sets, or `Promise.all` fan-outs
+- [ ] File and HTTP body reads use explicit byte caps or existing stream-limit helpers
+- [ ] Large result payloads are summarized, paginated, referenced, or capped rather than raw-dumped
+- [ ] Pagination and download tests cover caps, early stop behavior, or partial-result preservation when relevant
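Reviewer note on the "capped rather than raw-dumped" item: a tool's transform step might enforce a byte budget along these lines. This is a hypothetical sketch; `takeWithinByteBudget` is not an existing Sim utility, and the size it measures is the per-item JSON encoding only, ignoring the surrounding array brackets and commas.

```typescript
// Hypothetical helper: keeps items until their combined JSON size would
// exceed `maxBytes`, then stops and reports truncation.
type BudgetedResult<T> = { items: T[]; bytes: number; truncated: boolean };

function takeWithinByteBudget<T>(items: Iterable<T>, maxBytes: number): BudgetedResult<T> {
  const encoder = new TextEncoder();
  const kept: T[] = [];
  let bytes = 0;

  for (const item of items) {
    // UTF-8 length of the item's JSON form (approximate payload cost).
    const size = encoder.encode(JSON.stringify(item)).byteLength;
    if (bytes + size > maxBytes) {
      // Stop early and preserve what was already accepted.
      return { items: kept, bytes, truncated: true };
    }
    kept.push(item);
    bytes += size;
  }

  return { items: kept, bytes, truncated: false };
}
```

Surfacing `truncated` alongside the pagination cursor lets the caller request the next page instead of receiving one oversized payload.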
+
+## Step 8: Validate Error Handling
 
 - [ ] `transformResponse` checks for error conditions before accessing data
 - [ ] Error responses include meaningful messages (not just generic "failed")
 - [ ] HTTP error status codes are handled (check `response.ok` or status codes)
 
-## Step 8: Report and Fix
+## Step 9: Report and Fix
 
 ### Report Format
 
@@ -297,6 +307,7 @@ After fixing, confirm:
 - [ ] Validated OAuth scopes use centralized utilities (getScopesForService, getCanonicalScopesForProvider) — no hardcoded arrays
 - [ ] Validated scope descriptions exist in `SCOPE_DESCRIPTIONS` within `lib/oauth/utils.ts` for all scopes
 - [ ] Validated pagination consistency across tools and block
+- [ ] Validated memory load safety using `.agents/skills/memory-load-check/SKILL.md` when tools list/search/download/import/export/batch data
 - [ ] Validated error handling (error checks, meaningful messages)
 - [ ] Validated registry entries (tools and block, alphabetical, correct imports)
 - [ ] Reported all issues grouped by severity

.github/workflows/migrations.yml

Lines changed: 1 addition & 1 deletion
@@ -39,4 +39,4 @@ jobs:
 working-directory: ./packages/db
 env:
   DATABASE_URL: ${{ github.ref == 'refs/heads/main' && secrets.DATABASE_URL || github.ref == 'refs/heads/dev' && secrets.DEV_DATABASE_URL || secrets.STAGING_DATABASE_URL }}
-run: bunx drizzle-kit migrate --config=./drizzle.config.ts
+run: bun run ./scripts/migrate.ts

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -85,3 +85,6 @@ i18n.cache
 .claude/worktrees/
 .claude/scheduled_tasks.lock
 .deepsec/
+
+# Personal Cursor Skills
+.cursor/skills/ask-sim/

apps/docs/content/docs/en/self-hosting/environment-variables.mdx

Lines changed: 38 additions & 1 deletion
@@ -66,11 +66,48 @@ import { Callout } from 'fumadocs-ui/components/callout'
 | `API_ENCRYPTION_KEY` | Encrypts stored API keys (32 hex chars): `openssl rand -hex 32` |
 | `COPILOT_API_KEY` | API key for copilot features |
 | `ADMIN_API_KEY` | Admin API key for GitOps operations |
-| `RESEND_API_KEY` | Email service for notifications |
 | `ALLOWED_LOGIN_DOMAINS` | Restrict signups to domains (comma-separated) |
 | `ALLOWED_LOGIN_EMAILS` | Restrict signups to specific emails (comma-separated) |
 | `DISABLE_REGISTRATION` | Set to `true` to disable new user signups |
 
+## Email Providers
+
+Configure one provider — the mailer auto-detects in priority order: **Resend → AWS SES → SMTP → Azure Communication Services**. If none are configured, emails are logged to the console instead.
+
+| Variable | Description |
+|----------|-------------|
+| `FROM_EMAIL_ADDRESS` | Sender address (e.g. `Sim <noreply@example.com>`). Falls back to `noreply@EMAIL_DOMAIN`. |
+| `EMAIL_DOMAIN` | Default domain when `FROM_EMAIL_ADDRESS` is unset |
+| `EMAIL_VERIFICATION_ENABLED` | Set to `true` to require email verification on signup |
+
+**Resend**
+
+| Variable | Description |
+|----------|-------------|
+| `RESEND_API_KEY` | API key from [resend.com](https://resend.com) |
+
+**AWS SES**
+
+| Variable | Description |
+|----------|-------------|
+| `AWS_SES_REGION` | AWS region for SES (e.g. `us-east-1`). Credentials are resolved through the standard AWS SDK provider chain (env vars, IRSA, ECS/EC2 instance role, SSO). |
+
+**SMTP** (works with MailHog, Postfix, SendGrid SMTP, etc.)
+
+| Variable | Description |
+|----------|-------------|
+| `SMTP_HOST` | SMTP server hostname |
+| `SMTP_PORT` | `465` for implicit TLS, `587` for STARTTLS, `25` for plain |
+| `SMTP_USER` | Optional — omit for unauthenticated relays |
+| `SMTP_PASS` | Optional — omit for unauthenticated relays |
+| `SMTP_SECURE` | Set to `true` to force TLS on connect; auto-true on port 465 |
+
+**Azure Communication Services**
+
+| Variable | Description |
+|----------|-------------|
+| `AZURE_ACS_CONNECTION_STRING` | Azure Communication Services connection string |
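Reviewer note on the provider priority described in this hunk: the documented order (Resend, then AWS SES, then SMTP, then Azure Communication Services, with a console fallback) could be expressed as below. This is a sketch under assumptions, not the mailer's code. It assumes a provider counts as "configured" when its key variable from the tables is non-empty, and `detectEmailProvider` and `smtpUsesImplicitTls` are hypothetical names.

```typescript
// Hypothetical sketch of the documented detection order.
type EmailProvider = "resend" | "ses" | "smtp" | "azure" | "console";
type Env = Record<string, string | undefined>;

function detectEmailProvider(env: Env): EmailProvider {
  // Assumption: presence of each provider's key variable means "configured".
  if (env.RESEND_API_KEY) return "resend";
  if (env.AWS_SES_REGION) return "ses";
  if (env.SMTP_HOST) return "smtp";
  if (env.AZURE_ACS_CONNECTION_STRING) return "azure";
  return "console"; // nothing configured: emails are only logged
}

// Mirrors the SMTP_SECURE row: explicit `true`, or implied by port 465.
function smtpUsesImplicitTls(env: Env): boolean {
  return env.SMTP_SECURE === "true" || env.SMTP_PORT === "465";
}
```

With both `SMTP_HOST` and `AZURE_ACS_CONNECTION_STRING` set, this sketch would select SMTP, which matches the stated priority order.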
+
 ## Example .env
 
 ```bash

apps/docs/content/docs/en/tools/azure_devops.mdx

Lines changed: 6 additions & 4 deletions
@@ -280,7 +280,7 @@ Get the execution timeline for an Azure DevOps build — every stage, job, and t
 |`warningCount` | number | Number of warnings |
 |`startTime` | string | ISO 8601 start timestamp |
 |`finishTime` | string | ISO 8601 finish timestamp |
-|`failedRecords` | array | Subset of records where result === "failed" — use logId to fetch logs |
+|`failedRecords` | array | Subset of records where result is failed, partiallySucceeded, or succeededWithIssues — use logId to fetch logs |
 |`id` | string | Record GUID |
 |`name` | string | Step name |
 |`type` | string | Stage \| Phase \| Job \| Task |
@@ -333,7 +333,8 @@ Execute a WIQL query to search for work items in Azure DevOps and return full fi
 | --------- | ---- | ----------- |
 | `content` | string | Human-readable summary of matching work items |
 | `metadata` | object | Work items metadata |
-|`count` | number | Number of work items returned |
+|`count` | number | Number of work items returned \(after hydration\) |
+|`totalMatched` | number | Total number of work items matched by the WIQL query before hydration |
 |`workItems` | array | Array of work item details |
 |`id` | number | Work item ID |
 |`title` | string | Work item title |
@@ -372,15 +373,15 @@ Fetch full details of a single work item by ID from Azure DevOps, including titl
 
 ### `azure_devops_get_work_items_batch`
 
-Fetch full details for multiple work items by ID from Azure DevOps in a single call. Pass comma-separated IDs (e.g.
+Fetch full details for multiple work items by ID from Azure DevOps. Pass comma-separated IDs (e.g.
 
 #### Input
 
 | Parameter | Type | Required | Description |
 | --------- | ---- | -------- | ----------- |
 | `organization` | string | Yes | Azure DevOps organization name |
 | `project` | string | Yes | Azure DevOps project name |
-| `ids` | string | Yes | Comma-separated work item IDs to fetch \(e.g. "123,456,789"\). Maximum 200 IDs. |
+| `ids` | string | Yes | Comma-separated work item IDs to fetch \(e.g. "123,456,789"\). Lists longer than 200 IDs are chunked automatically. |
 
 #### Output
 
@@ -389,6 +390,7 @@ Fetch full details for multiple work items by ID from Azure DevOps in a single c
 | `content` | string | Human-readable summary of the fetched work items |
 | `metadata` | object | Work items metadata |
 |`count` | number | Number of work items returned |
+|`totalRequested` | number | Total number of IDs requested \(across all chunks\) |
 |`workItems` | array | Array of work item details |
 |`id` | number | Work item ID |
 |`title` | string | Work item title |

apps/docs/content/docs/en/tools/meta.json

Lines changed: 0 additions & 1 deletion
@@ -195,7 +195,6 @@
 "upstash",
 "vercel",
 "video_generator",
-"vision",
 "wealthbox",
 "webflow",
 "whatsapp",

apps/docs/content/docs/en/tools/vision.mdx

Lines changed: 0 additions & 60 deletions
This file was deleted.

apps/docs/content/docs/en/triggers/azure_devops.mdx

Lines changed: 4 additions & 4 deletions
@@ -31,7 +31,7 @@ Trigger workflow when an Azure DevOps build fails, is canceled, or partially suc
 | `branch` | string | Source branch name \(refs/heads/ prefix stripped\) |
 | `commitSha` | string | Source commit SHA |
 | `triggeredBy` | string | Display name of the person who triggered the build |
-| `triggeredByEmail` | string | Email/unique name of the person who triggered the build |
+| `triggeredByEmail` | string | Email/unique name of the person who triggered the build, or null if not set |
 | `startTime` | string | Build start time \(ISO 8601\) |
 | `finishTime` | string | Build finish time \(ISO 8601\) |
 | `buildUrl` | string | API URL for the build resource |
@@ -72,12 +72,12 @@ Trigger workflow when a work item is created in Azure DevOps
 | `workItemType` | string | Work item type for Basic process \(e.g. Issue, Task, Epic\) |
 | `title` | string | Work item title |
 | `state` | string | Work item state for Basic process \(e.g. To Do, Doing, Done\) |
-| `createdBy` | string | Display name of the creator |
-| `assignedTo` | string | Assignee display name, or empty string if unassigned |
+| `createdBy` | string | Display name of the creator, or null if not set |
+| `assignedTo` | string | Assignee display name, or null if unassigned |
 | `priority` | number | Priority \(1–4\), or 0 if not set |
 | `areaPath` | string | Area path |
 | `iterationPath` | string | Iteration path |
-| `description` | string | Work item description \(HTML\), or empty string if not set |
+| `description` | string | Work item description \(HTML\), or null if not set |
 | `projectName` | string | Azure DevOps project name |
 | `workItemUrl` | string | API URL for the work item resource |
 