fix(server): claude.json corruption when /home/automaker tmpfs fills (64M cap) #3564

## Symptom

Agents fail with **"Authentication failed: Invalid or expired API key"** even though OAuth credentials are valid. Subsequent `claude -p` invocations show:

```
Claude configuration file at /home/automaker/.claude.json is corrupted: JSON Parse error: Unterminated string
```

This persists across container restarts and resists every recovery attempt — restoring backups, writing fresh config, copying from the host. Every claude invocation reports the file as corrupted within seconds.

## Root cause

`/home/automaker` is a **tmpfs capped at 64 MB** in the staging compose. The npm content-addressable cache under `.npm` grows until it fills the mount (~64M), leaving zero free bytes. From then on, every write to `/home/automaker/.claude.json` runs out of space partway through (`ENOSPC`, "No space left on device") and leaves a truncated file behind:

```
$ docker exec automaker-server df -h /home/automaker
Filesystem      Size  Used Avail Use%
tmpfs            64M   64M     0 100%

$ docker exec -i automaker-server sh -c 'cat > /home/automaker/.claude.json' < good-file.json
cat: write error: No space left on device
```

The Claude CLI reads the truncated file, sees invalid JSON, and reports "corruption" — masking the real disk-full failure.
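To tell the two failure modes apart quickly, something like the following sketch may help. `diagnose_config` is a hypothetical helper, not part of automaker or the Claude CLI. It checks free space with `df` first, then applies a cheap heuristic (a complete JSON object ends with `}`) instead of a real JSON parser.

```bash
# Hypothetical helper: classify why a JSON config file might be unreadable.
# Prints one of: disk-full, missing, truncated, ok.
diagnose_config() {
    local file=$1
    local dir avail_kb last_char
    dir=$(dirname "$file")

    # Column 4 of `df -Pk` is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 { print $4 }')
    if [ "${avail_kb:-1}" -eq 0 ]; then
        echo "disk-full"
        return 1
    fi

    # Absent or zero-length file.
    if [ ! -s "$file" ]; then
        echo "missing"
        return 1
    fi

    # Heuristic only: strip whitespace and look at the final character.
    last_char=$(tr -d ' \t\r\n' < "$file" | tail -c 1)
    if [ "$last_char" != "}" ]; then
        echo "truncated"
        return 1
    fi

    echo "ok"
}
```

Run inside the container, `diagnose_config /home/automaker/.claude.json` would report `disk-full` in the situation described above, which points at the mount rather than at the credentials.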

## Impact

- Agents fail dispatch with confusing auth errors (the credentials are fine; the file just got truncated mid-write)
- Auto-mode keeps respawning failed agents, each one further corrupting the shared `.claude.json`
- "Recovery" by writing a clean file appears to work (`cat` shows valid JSON briefly) but the next `claude` invocation fails again
- Diagnosis is difficult — `df` is the only signal, and it's not in any standard health check

## Reproduction

1. Run automaker-staging with the published staging compose
2. Let auto-mode dispatch ~10–20 agents
3. Observe that the `/home/automaker` tmpfs fills with `.npm/_cacache` content
4. All subsequent claude invocations fail with "corrupted" errors
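While reproducing, it can help to watch the mount fill over time. The sketch below is one way to do that. `sample_usage` is a hypothetical helper, and the default count and interval are arbitrary.

```bash
# Hypothetical helper: print COUNT usage samples for DIR, INTERVAL seconds apart.
sample_usage() {
    local dir=$1 count=${2:-10} interval=${3:-30} i=0
    while [ "$i" -lt "$count" ]; do
        # Fields of `df -Pk`: $3 = used KB, $4 = available KB, $5 = capacity.
        df -Pk "$dir" | awk -v ts="$(date +%H:%M:%S)" \
            'NR==2 { print ts, "used_kb=" $3, "avail_kb=" $4, "use=" $5 }'
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

Inside the container, `sample_usage /home/automaker 20 30` would log twenty samples over roughly ten minutes. `avail_kb` reaching 0 marks the point where writes start failing.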

## Suggested fixes

**Pick one or layer them:**

1. **Bump the tmpfs size** — 64M is far too small with the npm cache living there. 512M or 1G would prevent this entirely.
2. **Move `.npm` cache off tmpfs** — set `NPM_CONFIG_CACHE=/home/automaker/.cache/npm` (which is on a real volume), or to `/tmp/npm-cache`.
3. **Move `.claude.json` to the persistent `.claude/` volume** — the file lives at `/home/automaker/.claude.json` (next to `.claude/`, not inside it). If it lived inside `.claude/` it would be on the persistent volume and survive disk pressure.
4. **Add a startup health check** that warns when `/home/automaker` is >80% full.
5. **Make agent-failure error messages distinguish between "credentials invalid" and "credentials file unreadable / truncated"** — currently both manifest as "Invalid or expired API key" which sends operators down the wrong rabbit hole.
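Fix 4 could be as small as the sketch below, assuming the server image has a POSIX `df` and `awk`. `usage_pct` and `check_usage` are hypothetical names, and where the check is wired in (entrypoint, compose healthcheck, or the server itself) is left open.

```bash
# Print the usage percentage (0-100) of the filesystem holding DIR.
usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Warn on stderr and return 1 when usage exceeds THRESHOLD (default 80).
# Returns 2 if usage could not be determined.
check_usage() {
    local dir=$1 threshold=${2:-80} pct
    pct=$(usage_pct "$dir")
    [ -n "$pct" ] || return 2
    if [ "$pct" -gt "$threshold" ]; then
        echo "WARNING: $dir is ${pct}% full (threshold ${threshold}%)" >&2
        return 1
    fi
    return 0
}
```

Calling `check_usage /home/automaker 80` at startup would surface the condition in the logs before agents begin failing with auth errors.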

## Workaround (immediate, applied to ava staging)

```bash
docker exec automaker-server find /home/automaker/.npm/_cacache -mindepth 1 -delete
docker exec -i automaker-server sh -c 'cat > /home/automaker/.claude/.credentials.json' < ~/.claude/.credentials.json
```

This frees ~36M and re-installs OAuth credentials. Agents resume working until the cache fills again.
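One caveat with `cat > file` is that the shell truncates the target before any data is written, so a failed write destroys the previous good copy. A possible safer pattern is to stage the data in a temporary file on the same filesystem and rename it into place. `safe_write` below is a hypothetical helper sketching that idea, not something automaker ships.

```bash
# Hypothetical helper: write stdin to TARGET without clobbering it on failure.
safe_write() {
    local target=$1
    local staged="$target.tmp.$$"

    # Stage next to the target so the final mv is a same-filesystem rename.
    if ! cat > "$staged"; then
        rm -f "$staged"
        echo "safe_write: write failed, $target left untouched" >&2
        return 1
    fi

    # Refuse to install an empty file; empty input usually means an upstream failure.
    if [ ! -s "$staged" ]; then
        rm -f "$staged"
        echo "safe_write: empty input, $target left untouched" >&2
        return 1
    fi

    mv -f "$staged" "$target"
}
```

On a full mount the staging write still fails, but the existing file survives intact and the error is visible instead of surfacing later as "corrupted" JSON.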

## Discovery

Found while debugging mythxengine MYTHX-4 / MYTHX-5 / MYTHX-6 dispatch failures on 2026-05-06. Three agents in a row failed with auth errors despite a fresh OAuth token.
