Skip to content

Warn on deploy prepare storage rewrites#4548

Merged
YuanTingHsieh merged 8 commits into
NVIDIA:mainfrom
YuanTingHsieh:codex/deploy-prepare-storage-warnings-main
May 8, 2026
Merged

Warn on deploy prepare storage rewrites#4548
YuanTingHsieh merged 8 commits into
NVIDIA:mainfrom
YuanTingHsieh:codex/deploy-prepare-storage-warnings-main

Conversation

@YuanTingHsieh
Copy link
Copy Markdown
Collaborator

@YuanTingHsieh YuanTingHsieh commented May 7, 2026

same as #4530

What changed

  • Port the deploy prepare storage relocation and observability warnings from the 2.8 PR to main.
  • Relocate Docker server job history and snapshot storage into the mounted workspace, matching K8s server prepare behavior.
  • Warn when deploy prepare replaces or removes existing resource manager, resource consumer, or launcher configuration.
  • Warn when a present snapshot_persistor cannot be relocated because it does not expose snapshot_persistor.args.storage.args.root_dir.

Why

Containerized deploy prepare rewrites runtime components and relocates server storage. Without these warnings and Docker storage relocation, operator customization can be silently dropped and server state can remain on container-local storage.

@YuanTingHsieh YuanTingHsieh marked this pull request as ready for review May 7, 2026 20:55
Copilot AI review requested due to automatic review settings May 7, 2026 20:55
@YuanTingHsieh YuanTingHsieh changed the title [main] Warn on deploy prepare storage rewrites Warn on deploy prepare storage rewrites May 7, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Ports deploy-prepare warnings and storage relocation behavior to main to make containerized (Docker/K8s) prepared kits safer and more transparent—avoiding silent drops of operator-provided runtime components and ensuring server state is stored on the mounted workspace.

Changes:

  • Relocate Docker server snapshot/job storage paths into the mounted workspace (aligning with existing K8s prepare behavior).
  • Emit human-readable warnings when deploy prepare removes/replaces configured resource manager/consumer or launcher components.
  • Add unit tests covering Docker storage relocation and warning/no-warning scenarios (including unexpected snapshot persistor shapes).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
nvflare/tool/deploy/deploy_commands.py Adds warning logic for replaced/removed components and relocates server storage to the workspace during Docker prepare (plus warning on unrelocatable snapshot persistor shapes).
tests/unit_test/tool/deploy/deploy_commands_test.py Adds tests for Docker server storage relocation and for warning behavior when existing component configs would be discarded.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 7, 2026

Greptile Summary

This PR ports observability improvements from the 2.8 branch to main: it adds warnings when deploy prepare silently drops operator-customised resource manager, resource consumer, or launcher configuration, and relocates Docker server job-history and snapshot storage into the mounted workspace (matching K8s behaviour).

  • _warn_for_replaced_components is injected into _patch_resources and inspects each component before it is filtered out, emitting Warning: lines for any component that carries a custom path or non-empty args.
  • _relocate_server_storage_to_workspace is now also called from the Docker path for server kits, and now warns (instead of silently skipping) when a present snapshot_persistor lacks the expected nested args.storage.args.root_dir key or contains a non-dict at any intermediate level.

Confidence Score: 5/5

The change is additive — it adds warnings before existing filters run and extends Docker server behaviour to match the already-stable K8s path. No existing data paths are modified in a breaking way.

All new code paths have corresponding tests. The elif chains operate on disjoint ID sets, the TypeError catch correctly handles non-dict intermediates in the nested key traversal, and the Docker server storage relocation mirrors the already-tested K8s path. No logic regressions were found.

No files require special attention.

Important Files Changed

Filename Overview
nvflare/tool/deploy/deploy_commands.py Adds warning helpers and Docker-server storage relocation; logic is correct, elif chains over disjoint ID sets are safe, and TypeError is now caught alongside KeyError in the nested-key traversal.
tests/unit_test/tool/deploy/deploy_commands_test.py New tests cover Docker storage relocation, both runtimes for the no-persistor and unexpected-shape warning paths, and the custom-vs-default component warning logic; only the TypeError code path in _relocate_server_storage_to_workspace lacks a dedicated case.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[prepare_deployment] --> B{runtime?}
    B -->|docker| C[_prepare_docker]
    B -->|k8s| D[_prepare_k8s]

    C --> E[_patch_resources\ndocker_launcher]
    D --> F[_patch_resources\nk8s_launcher]

    E --> G[_warn_for_replaced_components]
    F --> G

    G --> G1{resource_consumer\ncustom config?}
    G1 -->|yes| G2[Warning: removes component]
    G --> G3{resource_manager\ncustom config?}
    G3 -->|yes| G4[Warning: replaces resource_manager]
    G --> G5{launcher\ncustom config?}
    G5 -->|yes| G6[Warning: replaces launcher]

    E --> H{role == SERVER?}
    H -->|yes| I[_relocate_server_storage_to_workspace]
    F --> J{role == SERVER?}
    J -->|yes| K[_relocate_server_storage_to_workspace]

    I --> L{snapshot_persistor\nin resources?}
    K --> L
    L -->|yes| M[try set root_dir]
    M -->|KeyError/TypeError| N[Warning: cannot relocate]
    M -->|success| O[root_dir = workspace/snapshot-storage]
    L -->|no| P[skip]

    I --> Q[job_manager.uri_root = workspace/jobs-storage]
    K --> Q
    Q --> R[_write_resources]
Loading

Reviews (4): Last reviewed commit: "Merge remote-tracking branch 'upstream/m..." | Re-trigger Greptile

Comment thread nvflare/tool/deploy/deploy_commands.py
@pcnudde
Copy link
Copy Markdown
Collaborator

pcnudde commented May 8, 2026

/build

1 similar comment
@YuanTingHsieh
Copy link
Copy Markdown
Collaborator Author

/build

pcnudde
pcnudde previously approved these changes May 8, 2026
…e-storage-warnings-main

# Conflicts:
#	tests/unit_test/tool/deploy/deploy_commands_test.py
@YuanTingHsieh YuanTingHsieh merged commit b759c8c into NVIDIA:main May 8, 2026
24 checks passed
@YuanTingHsieh YuanTingHsieh deleted the codex/deploy-prepare-storage-warnings-main branch May 8, 2026 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants