Summary
`tinker checkpoint delete` parallelizes server requests with `ThreadPoolExecutor(max_workers=32)` (`cli/commands/checkpoint.py:919`, `937-958`). For large delete jobs this is about 2× slower in throughput than a plain serial loop, so the CLI's default concurrency hurts exactly the workflow it is meant to speed up: cleaning up many checkpoints at once.
Reproduction
I had ~728 step-numbered LoRA checkpoints to clean up. I took two disjoint slices of 50 paths and timed deleting each slice via the SDK:
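```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

from tinker import ServiceClient

client = ServiceClient().create_rest_client()

# Placeholder: fill with the checkpoint paths to delete.
paths: list[str] = []
sample_a, sample_b = paths[:50], paths[50:100]  # two disjoint slices of 50

def delete_one(p):
    # Blocks until the server confirms the delete.
    client.delete_checkpoint_from_tinker_path(p).result()

# Parallel (workers=32, matching the CLI default)
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=32) as pool:
    for f in as_completed([pool.submit(delete_one, p) for p in sample_a]):
        f.result()  # re-raise any error from the worker
print("parallel:", time.perf_counter() - t0)

# Serial
t0 = time.perf_counter()
for p in sample_b:
    delete_one(p)
print("serial:", time.perf_counter() - t0)
```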
| Mode | n | elapsed | rate | avg s/delete (elapsed / n) |
|---|---|---|---|---|
| Parallel (workers=32) | 50 | 19.90s | 2.51/s | 0.398s |
| Serial (workers=1) | 50 | 10.47s | 4.78/s | 0.209s |
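Note that "avg s/delete" is elapsed / n, i.e. the inverse of throughput, not the latency of a single call. With up to 32 calls in flight, each individual request in the parallel run must have taken far longer than 0.398s. What the numbers do show is that aggregate throughput roughly halves (4.78/s → 2.51/s) under 32-way concurrency, even though the client is issuing many more requests at once. That is consistent with server-side contention, such as a lock or serialization on the backend delete path.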
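The derived columns in the table can be recomputed from n and elapsed with a short Python sketch. The last line applies Little's law (W = L / λ) under a hypothetical assumption: that roughly 32 requests stay in flight during the parallel run. Concurrency actually drops below 32 once fewer than 32 paths remain, so this is only an order-of-magnitude estimate of per-call latency, not a measurement.

```python
# Derived metrics for the two 50-delete runs in the table.
# Hypothetical assumption: ~32 requests in flight during the parallel run.

def throughput(n: int, elapsed_s: float) -> float:
    """Completed deletes per second."""
    return n / elapsed_s

def amortized_s_per_delete(n: int, elapsed_s: float) -> float:
    """Wall time divided by deletes: the inverse of throughput, not per-call latency."""
    return elapsed_s / n

def littles_law_latency(in_flight: float, rate_per_s: float) -> float:
    """Little's law W = L / lambda: mean time one request spends in the system."""
    return in_flight / rate_per_s

par = throughput(50, 19.90)
ser = throughput(50, 10.47)

print(f"parallel: {par:.2f}/s, {amortized_s_per_delete(50, 19.90):.3f}s/delete amortized")
print(f"serial:   {ser:.2f}/s, {amortized_s_per_delete(50, 10.47):.3f}s/delete amortized")
print(f"serial/parallel throughput ratio: {ser / par:.2f}x")
# Rough per-call latency in the parallel run if ~32 requests are always in flight.
print(f"approx. per-call latency at 32 in flight: {littles_law_latency(32, par):.1f}s")
```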
Sustained rate confirmed over a follow-on cleanup of 600+ more paths:
```
[50/631] ok=50 elapsed=10s rate=4.91/s
[150/631] ok=150 elapsed=30s rate=5.01/s
[250/631] ok=250 elapsed=48s rate=5.17/s
```
At a steady ~5/s serially, the 628 remaining deletes take ~125s. At the ~2.5/s measured with the CLI default, they would take ~250s.
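A serial cleanup loop that prints progress lines in the same shape as the log above might look like the following minimal sketch. `delete_serially` and its `every` parameter are hypothetical names; `delete_fn` is whatever performs one delete, for example a wrapper around `client.delete_checkpoint_from_tinker_path(p).result()` from the reproduction script. Collecting failures instead of aborting is an assumption about what a bulk cleanup wants.

```python
import time
from typing import Callable, Iterable

def delete_serially(
    paths: Iterable[str],
    delete_fn: Callable[[str], None],
    every: int = 50,
) -> tuple[int, list[tuple[str, Exception]]]:
    """Delete paths one at a time, printing a progress line every `every` items.

    Hypothetical helper. Returns (successful deletes, [(path, error), ...]).
    """
    todo = list(paths)
    total = len(todo)
    ok = 0
    failures: list[tuple[str, Exception]] = []
    t0 = time.perf_counter()
    for i, path in enumerate(todo, start=1):
        try:
            delete_fn(path)
            ok += 1
        except Exception as exc:  # record the failure and keep going
            failures.append((path, exc))
        if i % every == 0 or i == total:
            elapsed = time.perf_counter() - t0
            rate = i / elapsed if elapsed > 0 else float("inf")
            print(f"[{i}/{total}] ok={ok} elapsed={elapsed:.0f}s rate={rate:.2f}/s")
    return ok, failures


# Usage with the SDK client from the reproduction script:
#   ok, failures = delete_serially(
#       paths, lambda p: client.delete_checkpoint_from_tinker_path(p).result()
#   )
```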
Versions
- tinker SDK: 0.21.0 (CLI bundled)
- Python 3.13.5
Suggested fix
One or more of:
- Lower `_DELETE_CONCURRENCY` in `tinker/cli/commands/checkpoint.py:919`. Based on the numbers above, 1 is already strictly better than the current default of 32; benchmarking at 2, 4, and 8 would show whether there is a better sweet spot in between.
- Investigate the server-side contention. A ~2× throughput drop under 32-way concurrency suggests a hot lock that might be relaxed; if so, the CLI's parallelism would be worth keeping.
- Expose `--concurrency` as a CLI flag so users can pick. This is a reasonable middle ground.
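The benchmark suggested in the first option and the `--concurrency` knob from the third could share one code path. The following minimal sketch is not the CLI's actual implementation: `delete_all`, `sweep`, and the default worker counts are hypothetical, and `delete_fn` would wrap `client.delete_checkpoint_from_tinker_path(p).result()` as in the reproduction script. Each worker count gets its own disjoint batch, mirroring the two disjoint slices used above, since a checkpoint can only be deleted once.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

def delete_all(
    paths: Sequence[str],
    delete_fn: Callable[[str], None],
    concurrency: int = 1,
) -> None:
    """Delete every path; concurrency=1 is a plain serial loop with no thread pool."""
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")
    if concurrency == 1:
        for p in paths:
            delete_fn(p)
        return
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() drains the iterator, so an exception from delete_fn is re-raised here.
        list(pool.map(delete_fn, paths))


def sweep(
    batches: Sequence[Sequence[str]],
    delete_fn: Callable[[str], None],
    worker_counts: Sequence[int] = (1, 2, 4, 8, 32),
) -> dict[int, float]:
    """Time one disjoint batch per worker count and return deletes/sec for each.

    Batches must be disjoint: a checkpoint can only be deleted once.
    """
    if len(batches) < len(worker_counts):
        raise ValueError("need one disjoint batch per worker count")
    rates: dict[int, float] = {}
    for workers, batch in zip(worker_counts, batches):
        t0 = time.perf_counter()
        delete_all(batch, delete_fn, concurrency=workers)
        elapsed = max(time.perf_counter() - t0, 1e-9)  # guard against a zero timer delta
        rates[workers] = len(batch) / elapsed
    return rates
```

Keeping `concurrency=1` as a plain loop avoids thread-pool overhead entirely in the serial case, and passing a fake `delete_fn` lets the sweep be sanity-checked without touching real checkpoints.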
The current behavior is also confusing: intuition says "more workers = faster", but here the opposite is true. I wasted ~2 minutes of cleanup time before reading the CLI source and re-testing serially.
Discovered by clement-dumas with Claude Code (Opus 4.7).
Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com