Problem
Each Argo retry spawns a new pod with a fresh ephemeral `/tmp`. The fsspec `simplecache` populated during a failed run is lost, so every retry re-downloads the full source product from EODC HTTPS — wasting time and increasing the chance of hitting the same transient-failure window again.
Proposed fix
Add a workflow-scoped `volumeClaimTemplates` PVC (20 Gi) to `eopf-explorer-convert-v1-s2-template.yaml` and mount it at `/cache/zarr-source`. Set `ZARR_SOURCE_CACHE_DIR=/cache/zarr-source` on the convert pod.
No changes to `scripts/convert_v1_s2.py` — the script already reads `ZARR_SOURCE_CACHE_DIR`.
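The PVC is scoped to one workflow execution, so it survives pod retries, and it is removed by Argo's volume-claim GC when the workflow finishes. Argo's default `volumeClaimGC` strategy (`OnWorkflowSuccess`) keeps the PVC after a failed workflow; set `volumeClaimGC.strategy: OnWorkflowCompletion` if it should be removed in both cases.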
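Before applying, you can confirm that the edit landed in the template with a plain-text check for the three additions described above: the `volumeClaimTemplates` entry, the `/cache/zarr-source` mount, and the `ZARR_SOURCE_CACHE_DIR` variable. This is a minimal sketch. The function name `check_template` is hypothetical, and the function greps for strings rather than parsing YAML. It only proves that the strings are present, not that they sit under the right keys.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a workflow template contains the three
# additions from this fix. Plain-text grep only; it does not parse YAML.
check_template() {
  local file=$1
  local missing=0
  local needle
  for needle in \
    'volumeClaimTemplates' \
    'mountPath: /cache/zarr-source' \
    'ZARR_SOURCE_CACHE_DIR'; do
    # -F: fixed-string match, -q: quiet. Report every missing piece, not just the first.
    if ! grep -qF -- "$needle" "$file"; then
      echo "missing in ${file}: ${needle}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Example: `check_template eopf-explorer-convert-v1-s2-template.yaml && echo "template looks complete"`. You can double-check the script side with `grep -n ZARR_SOURCE_CACHE_DIR scripts/convert_v1_s2.py`.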
Expected outcome
On a transient-failure retry (OOM, network blip, SIGTERM), the new pod finds previously-fetched source chunks in the cache and only re-downloads the chunks it hasn't seen yet — reducing HTTP load on EODC and shortening retry time.
Pre-flight check
Before applying: confirm Argo PVC GC is enabled:
```bash
kubectl get configmap workflow-controller-configmap -n argo -o yaml | grep pvcAutoDelete
```
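PVC cleanup for `volumeClaimTemplates` may also be controlled by the workflow's own `spec.volumeClaimGC.strategy` (`OnWorkflowSuccess` or `OnWorkflowCompletion`), so it can be worth checking the template as well as the controller configmap. The sketch below uses a naive, line-based scan rather than a YAML parser, and `gc_strategy` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the volumeClaimGC strategy declared in a workflow
# template file, or "unset" if none is found (Argo's default then applies).
# Naive line-based scan: it takes the first "strategy:" line that follows
# "volumeClaimGC:", so it is not a substitute for a real YAML parser.
gc_strategy() {
  awk '
    /volumeClaimGC:/     { in_gc = 1; next }
    in_gc && /strategy:/ { print $2; found = 1; exit }
    END                  { if (!found) print "unset" }
  ' "$1"
}
```

Example: `gc_strategy eopf-explorer-convert-v1-s2-template.yaml`. An output of `unset` means the template relies on Argo's default strategy.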
Verification
- Apply the updated workflow YAML
- Trigger a retry by deleting the running convert pod: `kubectl delete pod -n devseed `
- On the new pod, confirm cache is populated: `kubectl exec -n devseed -- ls -lh /cache/zarr-source/`
- After workflow completes, confirm PVC is cleaned up: `kubectl get pvc -n devseed | grep zarr-source-cache`
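If you need to repeat the verification steps above, you can wrap them in a few shell functions. This is a sketch under several assumptions:

- The namespace `devseed` and the `zarr-source-cache` name come from the commands above.
- The caller supplies the workflow name.
- The retried convert pod is the workflow's only Running pod.
- Argo's `workflows.argoproj.io/workflow` pod label and `main` container name are used to find the pod.
- The container image ships `du`.

All function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers wrapping the verification steps.
NS=devseed
CACHE_DIR=/cache/zarr-source

# Count lines mentioning the cache PVC in `kubectl get pvc` output read from stdin.
# grep -c prints 0 and exits 1 when nothing matches, so `|| true` keeps it non-fatal.
count_cache_pvcs() {
  grep -c 'zarr-source-cache' || true
}

# Name of the first Running pod of workflow $1 (assumed to be the convert pod).
running_pod() {
  kubectl get pods -n "$NS" \
    -l "workflows.argoproj.io/workflow=$1" \
    --field-selector=status.phase=Running \
    -o jsonpath='{.items[0].metadata.name}'
}

# Show how much data the current pod finds in the shared cache.
show_cache() {
  local pod
  pod=$(running_pod "$1")
  if [ -z "$pod" ]; then
    echo "no running pod found for workflow $1" >&2
    return 1
  fi
  kubectl exec -n "$NS" "$pod" -c main -- du -sh "$CACHE_DIR"
}

# After the workflow finishes: succeed only if kubectl works and no cache PVC remains.
assert_pvc_gone() {
  local out
  out=$(kubectl get pvc -n "$NS" --no-headers) || return 1
  [ "$(printf '%s\n' "$out" | count_cache_pvcs)" -eq 0 ]
}
```

Example, with `WF` holding the workflow name: run `show_cache "$WF"` after deleting the pod, then `assert_pvc_gone && echo "cache PVC cleaned up"` once the workflow completes. Capturing the `kubectl get pvc` output before counting keeps a failed API call from being mistaken for "no PVC left".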