[pull] main from tldraw:main#487

Merged
pull[bot] merged 6 commits into code:main from tldraw:main
Apr 8, 2026

@pull pull Bot commented Apr 8, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


AniKrisn and others added 6 commits April 8, 2026 10:28
In order to make `Cmd+Shift+V` work consistently across clipboard
contents, this PR makes the plain-text paste shortcut fall back to the
normal paste behavior when there's nothing pasteable as text on the
clipboard. Follow-up to #8347, which introduced the shortcut but
silently no-op'd when the clipboard contained a PNG (or any other file
with no `text/plain` data).

Before this fix:

- `Cmd+Shift+V` on a copied PNG → did nothing
- `Cmd+Shift+V` on a copied SVG (text) → pasted the SVG as text ✓

After:

- `Cmd+Shift+V` on a copied PNG → pastes the PNG (falling back to the
regular paste flow)
- `Cmd+Shift+V` on a copied SVG (text) → still pastes the SVG as text ✓

The change is small: previously, the plain-text branch always called
`preventDefault` and returned, even when there was no text to paste. Now
it only short-circuits when it actually handled the text — otherwise it
falls through to the regular paste handler below, which already knows
how to paste files.
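
The control flow described above can be sketched as follows. This is a minimal illustration, not the actual tldraw handler: `ClipboardSnapshot`, `PastePath`, and `choosePastePath` are hypothetical names, and the real code works with browser clipboard events rather than a plain object.

```typescript
// Hypothetical, simplified view of what is on the clipboard.
interface ClipboardSnapshot {
  text: string | null // `text/plain` data, if any
  files: string[] // names of file items, e.g. a copied PNG
}

type PastePath = 'plain-text' | 'regular' | 'nothing'

// Decide which paste path handles the event.
function choosePastePath(clip: ClipboardSnapshot, plainTextShortcut: boolean): PastePath {
  if (plainTextShortcut && clip.text) {
    // Short-circuit only when there is text to paste as plain text.
    return 'plain-text'
  }
  // Otherwise fall through to the regular paste flow, which handles files.
  if (clip.files.length > 0 || clip.text) {
    return 'regular'
  }
  return 'nothing'
}
```

With this shape, `Cmd+Shift+V` on a clipboard that holds only a PNG takes the `'regular'` path instead of silently doing nothing.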

### Change type

- [x] `bugfix`

### Test plan

1. Copy a PNG image (e.g., from Finder, or right-click → Copy Image on a
webpage)
2. `Cmd+Shift+V` in tldraw — verify the PNG is pasted onto the canvas
3. Copy an SVG as text from a webpage (or copy SVG markup)
4. `Cmd+Shift+V` in tldraw — verify the SVG still pastes as an SVG shape
(existing behavior)
5. Copy plain text from a webpage
6. `Cmd+Shift+V` — verify it still pastes as plain text without HTML
formatting

- [ ] Unit tests
- [ ] End to end tests

### Release notes

- Fixed `Cmd+Shift+V` doing nothing when the clipboard contained an
image (e.g., a PNG). It now falls back to pasting the image.

### Code changes

| Section   | LOC change |
| --------- | ---------- |
| Core code | +6 / -4    |

This PR allows users to override SharePanel subcomponents, as
suggested in #8288.

Also updated the doc in order to list all TLUIcomponents and the props
types they rely on (it's more polite)!

### API changes
- Added `PeopleMenu`, `PeopleMenuItem`, `PeopleMenuFacePile` and
`UserPresenceEditor` to the list of components users can override via
the `<Tldraw />` `components` property. This allows users to override
these components and customise how tldraw displays the list of users
currently editing a canvas.

### Change type

- [ ] `bugfix`
- [x] `improvement`
- [ ] `feature`
- [x] `api`
- [ ] `other`

### Release notes

- SharePanel components can now be overridden via the Tldraw component
API

---------

Co-authored-by: Steve Ruiz <steveruizok@gmail.com>

Enables OpenTelemetry metrics export from Zero Cache (replication
manager + view syncer) to Grafana Cloud, and adds a combined postgres
health check endpoint to the sync worker. Motivated by the April 7
incident where disk exhaustion from unbounded changelog growth caused a
2-hour outage with no prior alerting.

### OTEL setup

Zero natively supports OTEL — this PR configures the env vars so metrics
flow to Grafana Cloud on staging and production deploys (not previews).

**Fly.io template env vars** (non-secret, baked into toml):
- `OTEL_NODE_RESOURCE_DETECTORS=env,host,os`
- `OTEL_RESOURCE_ATTRIBUTES=service.name=zero-rm|zero-vs,deployment.environment=<env>,service.version=<version>`

**Fly.io secrets** (set via `flyctl secrets set`):
- `OTEL_EXPORTER_OTLP_ENDPOINT` — Grafana Cloud OTLP endpoint
- `OTEL_EXPORTER_OTLP_HEADERS` — auth header with Grafana Cloud API
token

**CI secrets to add** (used by deploy script):
- `ZERO_OTEL_EXPORTER_OTLP_ENDPOINT` — e.g.
`https://otlp-gateway-prod-eu-west-2.grafana.net/otlp`
- `ZERO_OTEL_EXPORTER_OTLP_HEADERS` — e.g. `Authorization=Basic
<base64(instanceId:apiToken)>`

### Grafana Cloud setup

1. Go to **Connections → OpenTelemetry → JavaScript** in Grafana Cloud
2. Copy the OTLP endpoint URL and generate an API token
3. The `OTEL_EXPORTER_OTLP_HEADERS` value is `Authorization=Basic
<base64>`, where `<base64>` is the base64 encoding of
`<instanceId>:<apiToken>`
4. Add both values as CI secrets (`ZERO_OTEL_EXPORTER_OTLP_ENDPOINT`,
`ZERO_OTEL_EXPORTER_OTLP_HEADERS`) in both `deploy-staging` and
`deploy-production` environments
5. After deploying, verify metrics appear under **Explore → Metrics**
with prefix `zero.*`
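
The header value from step 3 can be produced with a few lines of code. The following is a minimal sketch: `buildOtlpHeaders` is a hypothetical helper name, and the credentials passed to it are placeholders, not real Grafana Cloud values.

```typescript
// Build the value for OTEL_EXPORTER_OTLP_HEADERS from Grafana Cloud credentials.
// `instanceId` and `apiToken` are placeholders supplied by the caller.
function buildOtlpHeaders(instanceId: string, apiToken: string): string {
  // btoa only accepts Latin-1 input, which is assumed to be enough for these credentials.
  const encoded = btoa(`${instanceId}:${apiToken}`)
  return `Authorization=Basic ${encoded}`
}
```

The returned string is what would be stored in the `ZERO_OTEL_EXPORTER_OTLP_HEADERS` CI secret.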

### Metrics we get (built into Zero)

- `zero.replication`: upstream_lag, replica_lag, total_lag, events,
transactions
- `zero.replica`: db_size, wal_size, backup_lag
- `zero.sync`: active_clients, hydration times, poke times, CVR flush
times
- `zero.mutation`: crud, custom, push counters
- `zero.server`: uptime

### Health check endpoint

New combined endpoint `GET /health-check/postgres` runs 5 sub-checks in
a single request (to minimize updown.io invocation costs):

| Sub-check | What it detects | Threshold env var |
|---|---|---|
| db-size | Database approaching disk limit | `HEALTH_CHECK_DB_SIZE_THRESHOLD_GB` (staging: 4, prod: 10) |
| changelog-size | Unbounded Zero CDC changelog growth | `HEALTH_CHECK_CHANGELOG_SIZE_THRESHOLD_MB` (default: 1024) |
| wal-size | WAL retention from lagging replication slots | `HEALTH_CHECK_WAL_SIZE_THRESHOLD_MB` (staging: 1024, prod: 2048) |
| replication-slots | Invalidated zero/tlpr slots (`lost`/`unreserved`) | none |
| tlpr-replicator | tlpr slot missing or inactive | none |

Response format:
- **200**: `ok (db: 2.50 GB, changelog: 80 MB, wal: 200 MB, slots: 3 ok,
tlpr: active)`
- **500**: `FAIL db-size: 4.50 GB > 4 GB threshold; wal-size: zero_slot:
1500 MB > 1024 MB threshold`

The failure details appear in updown.io alert notifications, so each
sub-check failure is distinguishable.
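
One way to fold several sub-checks into a single response of this shape is sketched below. It is an illustration under assumptions, not the sync worker's actual code: `SubCheckResult`, `checkSize`, and `combineChecks` are hypothetical names, and the per-slot detail shown in the `wal-size` failure example is left out.

```typescript
// Hypothetical result of one sub-check.
interface SubCheckResult {
  name: string // e.g. 'db-size'
  ok: boolean
  detail: string // short summary when ok, failure description otherwise
}

interface HealthResponse {
  status: 200 | 500
  body: string
}

// Compare a measured size against its threshold and describe the outcome.
function checkSize(
  name: string,
  label: string,
  value: number,
  threshold: number,
  unit: 'GB' | 'MB'
): SubCheckResult {
  const shown = unit === 'GB' ? value.toFixed(2) : String(Math.round(value))
  if (value > threshold) {
    return { name, ok: false, detail: `${shown} ${unit} > ${threshold} ${unit} threshold` }
  }
  return { name, ok: true, detail: `${label}: ${shown} ${unit}` }
}

// Fold all sub-check results into one response so a single monitor request covers everything.
function combineChecks(results: SubCheckResult[]): HealthResponse {
  const failures = results.filter((r) => !r.ok)
  if (failures.length > 0) {
    const body = 'FAIL ' + failures.map((f) => `${f.name}: ${f.detail}`).join('; ')
    return { status: 500, body }
  }
  return { status: 200, body: `ok (${results.map((r) => r.detail).join(', ')})` }
}
```

Because every failing sub-check contributes its own `name: detail` segment, the alert text identifies which check tripped without a separate endpoint per check.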

### Updown.io / BetterStack monitors to add

**Staging:**
- `https://staging.tldraw.com/api/health-check/postgres`

**Production:**
- `https://www.tldraw.com/api/health-check/postgres`

Both require the `Authorization: Bearer <HEALTH_CHECK_BEARER_TOKEN>`
header.

### Manual steps after merge

1. Set up Grafana Cloud account and get OTLP credentials (see above)
2. Add `ZERO_OTEL_EXPORTER_OTLP_ENDPOINT` and
`ZERO_OTEL_EXPORTER_OTLP_HEADERS` as CI secrets in both `deploy-staging`
and `deploy-production` environments
3. Deploy to staging, verify metrics in Grafana
4. Add updown.io monitors for the URLs above (with bearer token header)
5. Deploy to production
6. Create Grafana dashboard for Zero Cache health (replication lag,
replica size, active clients, etc.)
7. Set up Grafana alerts: total_lag > 30s (warn) / > 120s (critical),
backup_lag > 5min

### Change type

- [x] `improvement`

### Test plan

1. Ran `curl localhost:3000/api/health-check/postgres` locally —
confirmed sub-checks execute and report correctly
2. Verified changelog table reference (`"zero_0/cdc"."changeLog"`)
exists in local postgres
3. OTEL metrics export requires Grafana Cloud credentials — will verify
on staging deploy

### Code changes

| Section | LOC change |
|---|---|
| Apps | +145 / -0 |
| Config/tooling | +12 / -0 |

Updates `@rocicorp/zero` from `1.0.0` to `1.2.0` across all four
packages. No breaking changes in either 1.1 or 1.2.

### Change type

- [x] `improvement`

### Test plan

1. `yarn typecheck` passes
2. `yarn lint` passes
3. Deploy preview with zero-cache

- [ ] Unit tests
- [ ] End to end tests

### Code changes

| Section        | LOC change   |
| -------------- | ------------ |
| Apps           | +4 / -4      |
| Core code      | +1 / -1      |
| Config/tooling | +216 / -12   |

The OTEL secrets were added to GitHub environment secrets but the deploy
workflow wasn't passing them to the deploy step, causing `makeEnv` to
throw on every deploy.

Relates to #8491

### Change type

- [x] `bugfix`

### Test plan

1. Deploy to staging should no longer fail with `Missing environment
variables: ZERO_OTEL_EXPORTER_OTLP_ENDPOINT,
ZERO_OTEL_EXPORTER_OTLP_HEADERS`
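
For context, a fail-fast check of this kind can look like the sketch below. `requireEnv` is a hypothetical stand-in written for illustration; it is not the repository's `makeEnv`, whose actual signature is not shown in this PR.

```typescript
// Hypothetical stand-in for a `makeEnv`-style helper: return the requested
// variables, or throw one error listing every variable that is missing.
function requireEnv<K extends string>(
  source: Record<string, string | undefined>,
  keys: readonly K[]
): Record<K, string> {
  const missing = keys.filter((key) => !source[key])
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`)
  }
  const env = {} as Record<K, string>
  for (const key of keys) {
    env[key] = source[key] as string
  }
  return env
}
```

In a deploy script, `source` would typically be `process.env`. If the workflow step does not forward a secret into the environment, the variable is absent there and the helper throws before any deploy work starts.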

### Code changes

| Section | LOC change |
|---|---|
| Config/tooling | +2 / -0 |

Add `service.namespace=dotcom` to `OTEL_RESOURCE_ATTRIBUTES` on both the
replication manager and view syncer Fly.io templates. This is needed for
Grafana Cloud's Application Observability setup to match our services.
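
`OTEL_RESOURCE_ATTRIBUTES` is a comma-separated list of `key=value` pairs. As a rough sketch of how the resulting value is composed, the hypothetical helper `formatResourceAttributes` below joins a map of attributes. The Fly.io templates set the string directly, and the `staging` environment value in the usage line is an assumed example.

```typescript
// Join resource attributes into the comma-separated key=value form
// expected by OTEL_RESOURCE_ATTRIBUTES. Values are assumed to contain
// no commas or equals signs.
function formatResourceAttributes(attrs: Record<string, string>): string {
  return Object.entries(attrs)
    .map(([key, value]) => `${key}=${value}`)
    .join(',')
}

// Example for the replication manager; 'staging' is an assumed environment name.
const replicationManagerAttributes = formatResourceAttributes({
  'service.name': 'zero-rm',
  'service.namespace': 'dotcom',
  'deployment.environment': 'staging',
})
```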

### Change type

- [x] `improvement`

### Test plan

1. Deploy to staging
2. Verify Grafana Cloud test connection passes with
`service.namespace=dotcom`

### Code changes

| Section | LOC change |
| ------- | ---------- |
| Apps    | +2 / -2    |

@pull pull Bot locked and limited conversation to collaborators Apr 8, 2026
@pull pull Bot added the ⤵️ pull label Apr 8, 2026
@pull pull Bot merged commit 4677565 into code:main Apr 8, 2026
@pull pull Bot had a problem deploying to deploy-production April 8, 2026 15:13 Failure
@pull pull Bot had a problem deploying to deploy-staging April 8, 2026 15:13 Failure
@pull pull Bot had a problem deploying to bemo-canary April 8, 2026 15:13 Failure
@pull pull Bot had a problem deploying to vsce publish April 8, 2026 15:13 Failure
@pull pull Bot had a problem deploying to deploy-staging April 8, 2026 15:13 Error
@pull pull Bot had a problem deploying to bemo-canary April 8, 2026 15:14 Failure
@pull pull Bot had a problem deploying to deploy-staging April 9, 2026 00:30 Failure
@pull pull Bot temporarily deployed to e2e-dotcom April 9, 2026 02:36 Inactive