Skip to content

Commit 2cb48f1

Browse files
Add shard-count suggestion and render knobs used by Claude skills
The skills claimed manifest analyze emitted `suggested_shard_count`, that slurm render picked up a rebased manifest automatically, and that render could be run without a local rclone.conf. None of those matched the CLI. Make the CLI match the documented behavior: * `xfer manifest analyze` now emits `suggested_shard_count`, `shard_count_reasoning`, and `shard_count_assumptions` based on a 10 TiB per-shard cap, `4 * array_concurrency` slack, and an optional core budget. New `--assumed-*` / `--max-shard-bytes-tb` flags let the user sharpen the suggestion. * `xfer slurm render` gains `--manifest`, so rebased manifests can be consumed without clobbering `run/manifest.jsonl`, and its `--rclone-config` no longer requires a local file (it warns instead). * Skill docs updated to match, plus pytest coverage for the shard-count heuristic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4e677c9 commit 2cb48f1

9 files changed

Lines changed: 316 additions & 35 deletions

File tree

.claude/skills/xfer-manifest-analyze/SKILL.md

Lines changed: 21 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -23,23 +23,39 @@ uv run xfer manifest analyze \
2323
--out run/analyze.json
2424
```
2525

26-
Optionally, if the user already has a preferred base set of rclone flags they want to layer on top, pass `--base-flags "<flags>"`.
26+
Optional flags that tune the **shard-count suggestion** (not the rclone flag suggestion):
27+
28+
| Flag | Default | What it does |
29+
| ------------------------------- | ------- | --------------------------------------------------------------------------- |
30+
| `--assumed-cpus-per-task` | `4` | Cores each worker will request. Matches `xfer slurm render` default. |
31+
| `--assumed-array-concurrency` | `64` | Expected Slurm array concurrency. Matches `xfer slurm render` default. |
32+
| `--assumed-core-budget` | unset | Total cores the partition will make available (supply from `sinfo`). |
33+
| `--max-shard-bytes-tb` | `10` | Per-shard byte cap. No single shard should carry more than this. |
34+
| `--base-flags "<flags>"` || Prepend the user's preferred rclone flags to the suggested ones. |
35+
36+
If the user already knows the transfer cluster's available core budget, pass it — the shard-count suggestion will be sharper. Otherwise the default (concurrency + bytes-only) is fine.
2737

2838
## Step 3 — Report
2939

3040
Read `run/analyze.json` and report to the user:
3141

32-
1. **Dataset shape**: total object count, total bytes, median size, p90/p99 sizes, and the histogram bin counts (power-of-2 edges).
42+
1. **Dataset shape**: total object count, total bytes, median size, p10/p90 sizes, and the histogram bin counts (power-of-2 edges).
3343
2. **Profile classification**: which profile the analyzer picked (`small_files`, `large_files`, or `mixed`) and the reasoning (e.g., ">70% of objects are under 1 MiB").
34-
3. **Suggested rclone flags**: the concrete string to pass to `--rclone-flags` for render. Typical examples:
44+
3. **Suggested rclone flags** (`suggested_flags`): the concrete string to pass to `--rclone-flags` for render. Typical examples:
3545
- small_files → `--transfers 64 --checkers 128 --fast-list`
3646
- large_files → `--transfers 16 --checkers 32 --buffer-size 256M`
3747
- mixed → `--transfers 32 --checkers 64 --fast-list`
38-
4. **Suggested shard count**: derive from total object count / target objects-per-shard (aim for ~10k–50k objects per shard for small-file workloads, smaller for large-file). Cap at what the chosen transfer cluster can reasonably host in a job array. If you don't yet know the transfer cluster, give a range and defer to `xfer-manifest-shard`.
48+
4. **Suggested shard count** (`suggested_shard_count`, plus `shard_count_reasoning` and `shard_count_assumptions`). The heuristic:
49+
- If `total_bytes` is below the per-shard cap (default 10 TiB), **1 shard** — don't shard small datasets.
50+
- Otherwise `ceil(total_bytes / cap)` shards, upper-bounded by `4 × array_concurrency` and (if a core budget was supplied) `core_budget // cpus_per_task`.
51+
52+
Quote `shard_count_reasoning` verbatim back to the user so they can see the trade-offs.
3953

4054
## Step 4 — Persist for downstream skills
4155

42-
`run/analyze.json` is the source of truth for flag/shard decisions. `xfer-manifest-shard` and `xfer-slurm-render` both read it. Don't re-derive flags by hand in those skills — point at this file.
56+
`run/analyze.json` is the source of truth for flag/shard decisions. `xfer-manifest-shard` reads `suggested_shard_count` and `xfer-slurm-render` reads `suggested_flags` — point at this file, don't re-derive.
57+
58+
If the user's plan changes (different transfer cluster, different concurrency cap), re-run `xfer manifest analyze` with updated `--assumed-*` flags before calling `xfer-manifest-shard`.
4359

4460
## After this skill
4561

.claude/skills/xfer-manifest-rebase/SKILL.md

Lines changed: 16 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,9 @@ Always write to a new file (don't overwrite `manifest.jsonl`). Keeping the origi
4040

4141
## Step 3 — Re-shard
4242

43-
Sharding is derived from the manifest, so **re-shard after rebasing**:
43+
Sharding is derived from the manifest, so **re-shard after rebasing**. The existing `run/shards/` directory contains pre-rebase paths and must be replaced.
44+
45+
Confirm with the user before removing the old shards (`run/shards` is small but removing it is irreversible locally):
4446

4547
```bash
4648
rm -rf run/shards
@@ -50,16 +52,26 @@ uv run xfer manifest shard \
5052
--num-shards <same-N-as-before>
5153
```
5254

53-
(Or invoke `xfer-manifest-shard`.) Byte balance won't change meaningfully, but the shard files need to carry the rebased paths or workers will try to copy from the wrong URI.
55+
(Or invoke `xfer-manifest-shard` with the rebased manifest as input.) Byte balance won't change meaningfully, but the shard files need to carry the rebased paths or workers will try to copy from the wrong URI.
56+
57+
## Step 4 — Point `xfer slurm render` at the rebased manifest
58+
59+
`xfer slurm render` reads `source_root` and `dest_root` from a manifest file. By default it reads `<run_dir>/manifest.jsonl`, which is intentionally left at the pre-rebase vantage point as an audit record. Pass `--manifest` to read the rebased file instead:
5460

55-
## Step 4 — Point downstream skills at the rebased manifest
61+
```bash
62+
uv run xfer slurm render \
63+
--run-dir run \
64+
--manifest run/manifest.rebased.jsonl \
65+
...
66+
```
5667

57-
When you invoke `xfer-slurm-render` next, pass `--run-dir run` but ensure `config.resolved.json` references the rebased manifest. If the user plans a fresh `xfer slurm render`, that's automatic (render reads from `run/shards/`).
68+
Without `--manifest`, render would use the original roots and every array task would target the wrong URI.
5869

5970
## Safety
6071

6172
- Never delete the original manifest — always keep `run/manifest.jsonl` as an audit trail alongside `run/manifest.rebased.jsonl`.
6273
- Rebase is a remap, not a content migration. It does not move data. It only relabels what each shard points to.
74+
- Confirm before `rm -rf run/shards` — the user may want to move the old shards aside rather than delete them.
6375

6476
## After this skill
6577

.claude/skills/xfer-manifest-shard/SKILL.md

Lines changed: 10 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -13,22 +13,21 @@ Runs **locally on the workstation**. Pure file processing. No Slurm/SSH needed.
1313

1414
## Step 1 — Read the analyze output
1515

16-
Read `run/analyze.json` (from `xfer-manifest-analyze`). Use its `suggested_shard_count` / profile as the starting point. If analyze hasn't been run yet, invoke `xfer-manifest-analyze` first — don't guess shard counts from the raw manifest.
16+
Read `run/analyze.json` (from `xfer-manifest-analyze`) and use `suggested_shard_count` directly as the shard count. The analyzer already factors in the 10 TiB/shard cap, the expected array concurrency, and (if supplied) the core budget.
1717

18-
## Step 2 — Reconcile shard count with cluster resources
18+
If `run/analyze.json` doesn't exist yet, invoke `xfer-manifest-analyze` first — don't guess shard counts from the raw manifest.
1919

20-
The right shard count depends on **both** rclone settings (from analyze) **and** the transfer cluster's available resources. Ask the user:
20+
## Step 2 — Decide whether to override
2121

22-
1. Which cluster will run the transfer? (Same as build host, or different?)
23-
2. What's the target array concurrency — how many shards should run at once? Typical range: 32–256, capped by the partition's `MaxArraySize` and the throughput both S3 endpoints can handle.
24-
3. What's the partition's per-node core/memory budget?
22+
Only override `suggested_shard_count` if one of the inputs that fed it has changed since analyze ran:
2523

26-
Rule of thumb:
27-
- Total shards ≈ max(suggested_shard_count_from_analyze, 4 × array_concurrency). This gives the scheduler enough slack to keep the array fully packed even as slow shards trail.
28-
- For small-files profiles (heavy listing, light bytes), bias toward **more shards** of fewer objects each.
29-
- For large-files profiles (heavy bytes, few objects), bias toward **fewer shards** with more bytes each — byte-balancing matters more than object count.
24+
- The transfer cluster is different from what analyze assumed (different core budget).
25+
- The array concurrency cap is different from what analyze assumed (defaults: `--assumed-array-concurrency=64`).
26+
- The user wants a different per-shard byte cap (default 10 TiB).
3027

31-
State your recommendation with the reasoning, then confirm before running.
28+
In that case, **re-run `xfer-manifest-analyze`** with updated `--assumed-*` flags rather than hand-picking a new number here. Sharing the reasoning/assumptions via `run/analyze.json` is how downstream skills stay coherent.
29+
30+
Show the user `suggested_shard_count` alongside `shard_count_reasoning` from analyze, then confirm before running.
3231

3332
## Step 3 — Run shard
3433

.claude/skills/xfer-slurm-render/SKILL.md

Lines changed: 11 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -57,16 +57,19 @@ Collect from the user (with defaults from `run/analyze.json` and the chosen part
5757
| `--rclone-image` | `rclone/rclone:latest` |
5858
| `--rclone-config` | absolute path to rclone.conf **on the transfer cluster's compute nodes** (see note below) |
5959
| `--rclone-flags` | `suggested_flags` from `run/analyze.json` |
60-
| `--max-attempts` | 3 (default) |
60+
| `--max-attempts` | 5 (default) |
6161
| `--sbatch-extras` | site-specific `--account=...`, `--qos=...`, etc. |
6262
| `--pyxis-extra` | extra `srun --container-*` flags if site requires them |
63+
| `--manifest` | optional; path to a specific manifest (pass `run/manifest.rebased.jsonl` if the rebase skill ran) |
6364

64-
The `--rclone-config` path is baked into `sbatch_array.sh` and resolved **on the transfer cluster at job time**, not on the workstation. It must:
65+
The `--rclone-config` path is baked into `sbatch_array.sh` and resolved **on the transfer cluster at job time**, not at render time. Render itself no longer requires the file to exist on the workstation — it only prints a warning if the local path is missing, since the actual consumer is the compute node. Still:
6566

66-
- Be an absolute path valid on that cluster's compute nodes (home dirs and shared paths differ between sites).
67-
- Exist with `0600` permissions before the job starts.
67+
- The path must be an absolute path valid on the cluster's compute nodes.
68+
- It must exist with `0600` permissions **on the cluster** before the job starts.
6869

69-
If the user doesn't already have the config deployed to this cluster at a known path, **stop and invoke `xfer-rclone-config`** to set it up and record the path. Don't guess the path — a wrong value here means every array task will fail identically at container start.
70+
If the user doesn't already have the config deployed to the cluster at a known path, invoke `xfer-rclone-config` to set it up and record the path. A wrong value here means every array task will fail identically at container start, so double-check the path.
71+
72+
Use `--manifest` whenever the user ran `xfer-manifest-rebase`: render reads `source_root` / `dest_root` from a manifest file, and the default path (`<run_dir>/manifest.jsonl`) is intentionally left at the pre-rebase vantage point as an audit record. Passing `--manifest run/manifest.rebased.jsonl` is how render picks up the rebased roots.
7073

7174
## Step 4 — Render
7275

@@ -83,9 +86,10 @@ uv run xfer slurm render \
8386
--rclone-image <image> \
8487
--rclone-config <path-on-cluster> \
8588
--rclone-flags "<flags-from-analyze>" \
86-
--max-attempts 3 \
89+
--max-attempts 5 \
8790
--sbatch-extras '<multi-line sbatch directives>' \
88-
--pyxis-extra '<extra pyxis flags>'
91+
--pyxis-extra '<extra pyxis flags>' \
92+
[--manifest run/manifest.rebased.jsonl] # only after rebase
8993
```
9094

9195
## Step 5 — Verify the outputs

pyproject.toml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,5 +30,9 @@ select = ["E", "F", "I", "B", "UP"]
3030
dev = [
3131
"black>=26.1.0",
3232
"pre-commit>=4.5.1",
33+
"pytest>=8.0.0",
3334
]
3435

36+
[tool.pytest.ini_options]
37+
testpaths = ["tests"]
38+

src/xfer/cli.py

Lines changed: 48 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -467,6 +467,25 @@ def manifest_analyze(
467467
"--retries 10 --low-level-retries 20 --stats 600s --progress",
468468
help="Base rclone flags to always include",
469469
),
470+
assumed_cpus_per_task: int = typer.Option(
471+
4,
472+
min=1,
473+
help="Cores per shard worker (matches slurm_render default). Used only for shard-count suggestion.",
474+
),
475+
assumed_array_concurrency: int = typer.Option(
476+
64,
477+
min=1,
478+
help="Expected Slurm array concurrency (matches slurm_render default). Used only for shard-count suggestion.",
479+
),
480+
assumed_core_budget: Optional[int] = typer.Option(
481+
None,
482+
help="Total cores the transfer cluster's partition will make available. Used only for shard-count suggestion. If omitted, the core constraint is skipped.",
483+
),
484+
max_shard_bytes_tb: int = typer.Option(
485+
10,
486+
min=1,
487+
help="Per-shard byte cap in TiB (no single shard should exceed this).",
488+
),
470489
) -> None:
471490
"""
472491
Analyze manifest file sizes and suggest optimal rclone flags.
@@ -481,6 +500,7 @@ def manifest_analyze(
481500
format_histogram_data,
482501
human_bytes,
483502
suggest_rclone_flags_from_sizes,
503+
suggest_shard_count,
484504
)
485505

486506
# Read manifest and extract sizes
@@ -505,6 +525,13 @@ def manifest_analyze(
505525
stats = compute_file_size_stats(sizes)
506526
suggestion = suggest_rclone_flags_from_sizes(sizes)
507527
histogram = format_histogram_data(sizes)
528+
shard_suggestion = suggest_shard_count(
529+
stats.total_bytes,
530+
cpus_per_task=assumed_cpus_per_task,
531+
array_concurrency=assumed_array_concurrency,
532+
core_budget=assumed_core_budget,
533+
max_shard_bytes_tb=max_shard_bytes_tb,
534+
)
508535

509536
# Combine base flags with profile-specific flags
510537
combined_flags = f"{suggestion.flags} {base_flags}"
@@ -528,6 +555,9 @@ def manifest_analyze(
528555
"profile": suggestion.profile,
529556
"profile_explanation": suggestion.explanation,
530557
"suggested_flags": combined_flags,
558+
"suggested_shard_count": shard_suggestion.num_shards,
559+
"shard_count_reasoning": shard_suggestion.reasoning,
560+
"shard_count_assumptions": shard_suggestion.assumptions,
531561
"histogram": histogram,
532562
}
533563

@@ -882,9 +912,7 @@ def slurm_render(
882912
rclone_image: str = typer.Option(..., help="Container image containing rclone"),
883913
rclone_config: Path = typer.Option(
884914
...,
885-
exists=True,
886-
dir_okay=False,
887-
help="Host path to rclone.conf",
915+
help="Absolute path to rclone.conf on the transfer cluster's compute nodes. Not required to exist on this host; a warning is emitted if it is missing locally.",
888916
resolve_path=True,
889917
),
890918
rclone_conf_in_container: str = typer.Option(
@@ -911,6 +939,11 @@ def slurm_render(
911939
pyxis_extra: str = typer.Option(
912940
"", help="Extra pyxis flags (string placed after --container-mounts...)"
913941
),
942+
manifest: Optional[Path] = typer.Option(
943+
None,
944+
help="Manifest JSONL to read source/dest_root from. Defaults to <run_dir>/manifest.jsonl. Use this after `xfer manifest rebase` to point render at the rebased file.",
945+
resolve_path=True,
946+
),
914947
) -> None:
915948
"""
916949
Render worker.sh, sbatch_array.sh, and submit.sh under run_dir.
@@ -919,12 +952,19 @@ def slurm_render(
919952
mkdirp(run_dir / "logs")
920953
mkdirp(run_dir / "state")
921954

922-
# If source/dest not provided, try to read first line of manifest.jsonl (if present)
955+
if not rclone_config.exists():
956+
eprint(
957+
f"WARNING: rclone_config {rclone_config} does not exist on this host. "
958+
"That is fine if the path is valid on the transfer cluster's compute nodes. "
959+
"Verify before submitting."
960+
)
961+
962+
# If source/dest not provided, try to read first line of the manifest (if present)
923963
if source_root is None or dest_root is None:
924-
manifest = run_dir / "manifest.jsonl"
925-
if manifest.exists():
964+
manifest_path = manifest or (run_dir / "manifest.jsonl")
965+
if manifest_path.exists():
926966
first = None
927-
for ln in manifest.read_text(encoding="utf-8").splitlines():
967+
for ln in manifest_path.read_text(encoding="utf-8").splitlines():
928968
if ln.strip():
929969
first = json.loads(ln)
930970
break
@@ -934,7 +974,7 @@ def slurm_render(
934974

935975
if not source_root or not dest_root:
936976
raise typer.BadParameter(
937-
"source_root/dest_root not set and could not be read from run_dir/manifest.jsonl"
977+
"source_root/dest_root not set and could not be read from the manifest (pass --manifest, --source-root, and/or --dest-root explicitly)"
938978
)
939979

940980
# Write scripts

src/xfer/est.py

Lines changed: 84 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -569,6 +569,90 @@ def suggest_rclone_flags_from_sizes(sizes: List[int]) -> RcloneFlagsSuggestion:
569569
)
570570

571571

572+
@dataclass
573+
class ShardCountSuggestion:
574+
"""Suggested shard count for `xfer manifest shard` based on bytes, cores, and concurrency."""
575+
576+
num_shards: int
577+
reasoning: str
578+
assumptions: Dict[str, Any]
579+
580+
581+
def suggest_shard_count(
582+
total_bytes: int,
583+
*,
584+
cpus_per_task: int = 4,
585+
array_concurrency: int = 64,
586+
core_budget: Optional[int] = None,
587+
max_shard_bytes_tb: int = 10,
588+
) -> ShardCountSuggestion:
589+
"""
590+
Suggest a shard count for a transfer based on three constraints:
591+
592+
1. Bytes cap: no single shard should carry more than ``max_shard_bytes_tb`` TiB
593+
of data (worker wall-clock dominates the array's long tail otherwise).
594+
2. Concurrency cap: producing more than ``4 * array_concurrency`` shards is
595+
wasteful (the scheduler only needs enough slack to keep the queue packed
596+
as slow shards trail).
597+
3. Core cap (optional): if ``core_budget`` is supplied, the array can't
598+
usefully exceed ``core_budget // cpus_per_task`` concurrent workers, so
599+
producing more shards just lengthens the queue.
600+
601+
Special case: if ``total_bytes`` is below the per-shard cap, return
602+
``num_shards=1`` — sharding isn't helpful.
603+
"""
604+
tib = 1024**4
605+
max_shard_bytes = max_shard_bytes_tb * tib
606+
607+
assumptions: Dict[str, Any] = {
608+
"total_bytes": total_bytes,
609+
"cpus_per_task": cpus_per_task,
610+
"array_concurrency": array_concurrency,
611+
"core_budget": core_budget,
612+
"max_shard_bytes_tb": max_shard_bytes_tb,
613+
}
614+
615+
if total_bytes < max_shard_bytes:
616+
reasoning = (
617+
f"total_bytes ({human_bytes(total_bytes)}) is below the per-shard cap "
618+
f"({max_shard_bytes_tb} TiB); a single shard is sufficient."
619+
)
620+
return ShardCountSuggestion(
621+
num_shards=1, reasoning=reasoning, assumptions=assumptions
622+
)
623+
624+
shards_by_bytes = math.ceil(total_bytes / max_shard_bytes)
625+
shards_by_concurrency = 4 * array_concurrency
626+
shards_by_cores: Optional[int] = None
627+
if core_budget is not None:
628+
shards_by_cores = max(1, core_budget // cpus_per_task)
629+
630+
upper = shards_by_concurrency
631+
if shards_by_cores is not None:
632+
upper = min(upper, shards_by_cores)
633+
634+
num_shards = max(1, min(upper, max(shards_by_bytes, 1)))
635+
636+
reasoning_parts = [
637+
f"shards_by_bytes={shards_by_bytes} "
638+
f"(total {human_bytes(total_bytes)} / {max_shard_bytes_tb} TiB cap)",
639+
f"shards_by_concurrency={shards_by_concurrency} (4 x {array_concurrency})",
640+
]
641+
if shards_by_cores is not None:
642+
reasoning_parts.append(
643+
f"shards_by_cores={shards_by_cores} "
644+
f"({core_budget} cores / {cpus_per_task} cpus_per_task)"
645+
)
646+
reasoning_parts.append(
647+
f"chose max(1, min(upper={upper}, shards_by_bytes)) = {num_shards}"
648+
)
649+
reasoning = "; ".join(reasoning_parts)
650+
651+
return ShardCountSuggestion(
652+
num_shards=num_shards, reasoning=reasoning, assumptions=assumptions
653+
)
654+
655+
572656
def format_histogram_data(
573657
sizes: List[int],
574658
) -> List[Dict[str, Any]]:

0 commit comments

Comments
 (0)