feat: Support more Data Designer seed sources#413
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (22)
💤 Files with no reviewable changes (4)
✅ Files skipped from review due to trivial changes (3)
🚧 Files skipped from review as they are similar to previous changes (15)
📝 WalkthroughWalkthroughSeed validation now distinguishes local and fileset-backed sources, caches validated filesystem roots in contexts, and routes seed readers through scoped filesystem providers. Plugin request paths now build configs directly, and model provider fixtures and tests use explicit providers. ChangesFilesystem Seed Support
Model provider behavior and fixture alignment
Sequence Diagram(s)sequenceDiagram
participant validate_seed
participant _validate_seed_from_files_service
participant "sdk.files.filesets.retrieve"
participant "sdk.files.list"
validate_seed->>_validate_seed_from_files_service: validate_seed(..., is_local=False)
_validate_seed_from_files_service->>"sdk.files.filesets.retrieve": retrieve(name, workspace)
_validate_seed_from_files_service->>"sdk.files.list": list(remote_path, fileset, workspace)
_validate_seed_from_files_service-->>validate_seed: validated_root
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 OpenGrep (1.23.0)plugins/nemo-data-designer/src/nemo_data_designer_plugin/testing/utils.py┌──────────────┐ �[32m✔�[39m �[1mOpengrep OSS�[0m [00.28][ERROR]: unable to find a config; path plugins/nemo-data-designer/tests/unit/test_preview_function.py┌──────────────┐ �[32m✔�[39m �[1mOpengrep OSS�[0m [00.13][ERROR]: unable to find a config; path plugins/nemo-data-designer/tests/unit/test_context.py┌──────────────┐ �[32m✔�[39m �[1mOpengrep OSS�[0m [00.15][ERROR]: unable to find a config; path
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.py (1)
4-4: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winRemove postponed annotations import in this test module.
Line 4 enables postponed/string-based annotations; keep runtime-resolved concrete hints instead.
Suggested change
-from __future__ import annotations - from typing import AnyAs per coding guidelines "Always prefer concrete type hints over string-based ones in Python code; do not import types under TYPE_CHECKING, instead import types as regular imports when possible".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.py` at line 4, Remove the `from __future__ import annotations` import statement from the top of the test_remote_filesystem_seeds.py module. This import enables postponed evaluation of annotations (string-based type hints), but the coding guidelines require using concrete type hints instead of string-based ones. Delete this single import line and ensure any type annotations in the module use direct type references rather than string representations.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.py`:
- Line 4: Remove the `from __future__ import annotations` import statement from
the top of the test_remote_filesystem_seeds.py module. This import enables
postponed evaluation of annotations (string-based type hints), but the coding
guidelines require using concrete type hints instead of string-based ones.
Delete this single import line and ensure any type annotations in the module use
direct type references rather than string representations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e07ae8ef-d677-42f9-9166-d1e26b92627d
📒 Files selected for processing (15)
packages/data_designer_nemo/src/data_designer_nemo/context.pypackages/data_designer_nemo/src/data_designer_nemo/fileset_filesystem_provider.pypackages/data_designer_nemo/src/data_designer_nemo/model_provider.pypackages/data_designer_nemo/src/data_designer_nemo/seed.pypackages/data_designer_nemo/src/data_designer_nemo/unsupported_features.pypackages/data_designer_nemo/tests/unit/test_fileset_filesystem_provider.pypackages/data_designer_nemo/tests/unit/test_model_configs.pypackages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.pyplugins/nemo-data-designer/src/nemo_data_designer_plugin/testing/utils.pyplugins/nemo-data-designer/tests/integration/test_preview_remote_sdk.pyplugins/nemo-data-designer/tests/integration/test_remote_validation_errors.pyplugins/nemo-data-designer/tests/integration/test_validate_sdk.pyplugins/nemo-data-designer/tests/unit/test_model_provider.pyplugins/nemo-data-designer/tests/unit/test_preview_function.pyplugins/nemo-data-designer/tests/unit/test_sdk_resources.py
💤 Files with no reviewable changes (3)
- plugins/nemo-data-designer/tests/integration/test_remote_validation_errors.py
- plugins/nemo-data-designer/tests/unit/test_model_provider.py
- packages/data_designer_nemo/src/data_designer_nemo/model_provider.py
|
89e0539 to
35e1253
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.py (1)
16-20: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winThis test does not verify the new remote provider wiring.
These assertions still pass if
RemoteDataDesignerContext.get_seed_readers()returns plainDirectorySeedReader()/FileContentsSeedReader()without theFilesetFileSystemProvider. Since this PR’s behavior change is the injected provider, mirror the local-context test and assert the readers’_fs_providertype.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.py` around lines 16 - 20, The test only checks that get_seed_readers() returns DirectorySeedReader and FileContentsSeedReader, but it does not verify the new remote provider wiring. Update test_remote_context_includes_filesystem_seed_readers to mirror the local-context test by asserting the _fs_provider on each reader returned by RemoteDataDesignerContext.get_seed_readers() is a FilesetFileSystemProvider, so the test fails if plain readers are returned without the injected provider.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/data_designer_nemo/src/data_designer_nemo/seed.py`:
- Around line 89-106: The existence check in the seed path validation is using
files.list() and then treating an empty response as missing, which incorrectly
flags valid but empty directories as invalid. Update the logic in seed.py around
the files.list() call to use an explicit existence/stat-style API, matching the
approach used by FilesetFileSystemProvider.ensure_root_exists(), and reserve
NDDInvalidConfigError only for true missing paths rather than empty directories.
---
Nitpick comments:
In `@packages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.py`:
- Around line 16-20: The test only checks that get_seed_readers() returns
DirectorySeedReader and FileContentsSeedReader, but it does not verify the new
remote provider wiring. Update
test_remote_context_includes_filesystem_seed_readers to mirror the local-context
test by asserting the _fs_provider on each reader returned by
RemoteDataDesignerContext.get_seed_readers() is a FilesetFileSystemProvider, so
the test fails if plain readers are returned without the injected provider.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 134d8172-a0ef-470a-8469-60d02e1d9234
📒 Files selected for processing (21)
packages/data_designer_nemo/src/data_designer_nemo/context.pypackages/data_designer_nemo/src/data_designer_nemo/fileset_filesystem_provider.pypackages/data_designer_nemo/src/data_designer_nemo/model_provider.pypackages/data_designer_nemo/src/data_designer_nemo/seed.pypackages/data_designer_nemo/src/data_designer_nemo/tool_configs.pypackages/data_designer_nemo/src/data_designer_nemo/unsupported_features.pypackages/data_designer_nemo/tests/unit/test_fileset_filesystem_provider.pypackages/data_designer_nemo/tests/unit/test_local_filesystem_seeds.pypackages/data_designer_nemo/tests/unit/test_model_configs.pypackages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.pyplugins/nemo-data-designer/src/nemo_data_designer_plugin/functions/_types.pyplugins/nemo-data-designer/src/nemo_data_designer_plugin/jobs/spec.pyplugins/nemo-data-designer/src/nemo_data_designer_plugin/sdk/resources.pyplugins/nemo-data-designer/src/nemo_data_designer_plugin/testing/utils.pyplugins/nemo-data-designer/tests/integration/test_preview_remote_sdk.pyplugins/nemo-data-designer/tests/integration/test_remote_validation_errors.pyplugins/nemo-data-designer/tests/integration/test_validate_sdk.pyplugins/nemo-data-designer/tests/unit/test_context.pyplugins/nemo-data-designer/tests/unit/test_model_provider.pyplugins/nemo-data-designer/tests/unit/test_preview_function.pyplugins/nemo-data-designer/tests/unit/test_sdk_resources.py
💤 Files with no reviewable changes (13)
- plugins/nemo-data-designer/src/nemo_data_designer_plugin/functions/_types.py
- plugins/nemo-data-designer/tests/unit/test_context.py
- plugins/nemo-data-designer/src/nemo_data_designer_plugin/jobs/spec.py
- plugins/nemo-data-designer/src/nemo_data_designer_plugin/testing/utils.py
- packages/data_designer_nemo/src/data_designer_nemo/model_provider.py
- plugins/nemo-data-designer/tests/integration/test_preview_remote_sdk.py
- plugins/nemo-data-designer/tests/integration/test_remote_validation_errors.py
- plugins/nemo-data-designer/tests/unit/test_sdk_resources.py
- packages/data_designer_nemo/src/data_designer_nemo/unsupported_features.py
- plugins/nemo-data-designer/tests/integration/test_validate_sdk.py
- plugins/nemo-data-designer/tests/unit/test_preview_function.py
- plugins/nemo-data-designer/tests/unit/test_model_provider.py
- plugins/nemo-data-designer/src/nemo_data_designer_plugin/sdk/resources.py
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/data_designer_nemo/tests/unit/test_model_configs.py
matthewgrossman
left a comment
There was a problem hiding this comment.
lgtm from a files perspective
098248b to
52613dd
Compare
|
🌿 Preview your docs: https://nvidia-preview-remote-seeds-mknepper.docs.buildwithfern.com/nemo-platform |
There was a problem hiding this comment.
Actionable comments posted: 16
🧹 Nitpick comments (6)
script/submit-hello-world-job.sh (1)
33-61: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winJSON built via raw interpolation breaks on special chars.
MESSAGE/JOB_NAME/WORKSPACEare spliced unescaped into the heredoc. A", backslash, or newline produces invalid JSON. You already shell out topython3below—build the whole payload there instead.♻️ Build payload with python3
payload_file="$(mktemp)" trap 'rm -f "${payload_file}"' EXIT WORKSPACE="${WORKSPACE}" JOB_NAME="${JOB_NAME}" MESSAGE="${MESSAGE}" \ IMAGE="${IMAGE}" EXECUTION_PROFILE="${EXECUTION_PROFILE}" PROJECT="${PROJECT}" \ python3 - "${payload_file}" <<'PY' import json, os, sys payload = { "workspace": os.environ["WORKSPACE"], "name": os.environ["JOB_NAME"], "source": "hello-world", "spec": {"message": os.environ["MESSAGE"]}, "platform_spec": {"steps": [{ "name": "hello-world", "executor": { "provider": "cpu", "profile": os.environ["EXECUTION_PROFILE"], "container": { "image": os.environ["IMAGE"], "entrypoint": ["nemo-platform"], "command": ["run", "task", "--task", "nmp.hello_world.tasks.hello_world"], }, }, "config": {"message": os.environ["MESSAGE"]}, }]}, } if os.environ.get("PROJECT"): payload["project"] = os.environ["PROJECT"] with open(sys.argv[1], "w", encoding="utf-8") as f: json.dump(payload, f) PY🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@script/submit-hello-world-job.sh` around lines 33 - 61, The heredoc in submit-hello-world-job.sh builds JSON by raw interpolation, so special characters in MESSAGE, JOB_NAME, or WORKSPACE can break the payload. Replace the shell heredoc with the existing python3-based payload निर्माण so the JSON is serialized safely, and ensure the payload assembly logic uses the same fields currently written in the heredoc (workspace, name, spec.message, and platform_spec.steps[].config.message) via json.dump.spec/brian-auth-call.md (1)
1-1: 📐 Maintainability & Code Quality | 🔵 Trivial | 🏗️ Heavy liftSplit this transcript into a real docs page.
This reads like raw meeting notes, not a single Diataxis page. If it’s meant to ship, add a proper heading/summary and move the verbatim transcript out of the published docs tree.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@spec/brian-auth-call.md` at line 1, This transcript should not be published as-is; turn it into a proper docs page with a clear title, short summary, and structured sections, and move the raw verbatim conversation out of the published docs path. Use the existing spec/doc entry as the source to extract the actual auth story, then rewrite it into a Diataxis-style page instead of keeping meeting notes in place.Sources: Coding guidelines, Linters/SAST tools
e2e/services_pool.py (1)
6-6: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winRemove postponed annotations.
Line 6 makes annotations string-backed; this file imports the referenced types directly.
As per coding guidelines,
**/*.py: “Always prefer concrete type hints over string-based ones in Python code.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@e2e/services_pool.py` at line 6, Remove the postponed annotations import from the module so type hints are evaluated normally instead of being string-backed. Update the top-level imports in services_pool.py and keep the existing concrete type references unchanged, since the referenced types are already imported directly and should be used as real annotations.Source: Coding guidelines
spec/jobs-local-remote-unification-spec.md (2)
1-1880: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd the required doc scaffolding.
This page still needs a short prerequisites section near the top and a Next Steps section with cross-links at the end. As per coding guidelines, documentation pages must list prerequisites before other content and include Next Steps at the end.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@spec/jobs-local-remote-unification-spec.md` around lines 1 - 1880, Add the missing documentation scaffolding: insert a short Prerequisites section near the top before the main spec content, and add a Next Steps section at the end with cross-links. Update the Jobs Local/Remote Unification Spec so it follows the standard doc order and includes the required navigation references.Source: Coding guidelines
508-670: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winRename the duplicated section headings.
The repeated headings here trigger the markdown-lint duplicate-heading warning and make anchor IDs ambiguous. Unique section titles will keep cross-links stable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@spec/jobs-local-remote-unification-spec.md` around lines 508 - 670, The markdown spec has duplicate section headings that trigger lint warnings and create ambiguous anchors. Update the repeated headings in this diff to unique titles while preserving the meaning of each section, and make sure the nearby section structure around the working-directory and retention discussions remains clear and consistently named.Source: Linters/SAST tools
public-images.md (1)
13-76: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winMake the registry comparison neutral and source-backed.
Several registry-limit claims are vendor-policy details that can change, and phrases like “definitive home” and “highly resilient” read as marketing. Please verify the limits against current vendor docs and rewrite the comparisons around concrete criteria. As per coding guidelines, docs should stay concrete and avoid marketing language.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@public-images.md` around lines 13 - 76, The registry comparison in public-images.md should be rewritten to remove marketing-style language and unverified policy claims. Update the affected sections and the summary table to use neutral, source-backed wording, and ensure any size limits, pull quotas, and auth requirements are stated only if they can be verified from current vendor documentation. Keep the focus on concrete criteria in the GHCR, Docker Hub, Hugging Face, Quay.io, and NVCR/NGC sections, and avoid subjective phrases like “definitive home” or “highly resilient.”Sources: Coding guidelines, Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@e2e/conftest.py`:
- Line 63: The log assertion helper in conftest is reading pooled service logs
with read_text().splitlines(), which loads the entire file on every failure.
Update the log-reading logic used by the affected test fixture/helper to tail
only the needed end of the file instead of reading everything, and apply the
same change wherever the same pattern is used in the related sections noted by
the review. Use the existing log-path handling in conftest to keep the fix
localized and avoid full-file reads for large logs.
In `@e2e/services_pool.py`:
- Around line 399-402: The one-shot free-port selection in _find_free_port is
racy because the socket closes before nemo services run binds it, so update the
startup flow to retry with a newly selected port when bind/startup fails. Adjust
the services startup logic around the call sites noted in the comment (including
the related block later in the file) so that a failed launch due to port
conflict triggers a fresh _find_free_port attempt and restart rather than
waiting for the full timeout.
- Around line 191-193: The generated config reuse logic in services_pool.py
should not skip writing when platform-{state.key.config_hash}.yaml already
exists, because stale files can survive after rendering changes. Update the
config generation path in the relevant method to always rewrite the file with
the freshly rendered output instead of guarding on config_path.exists(), so each
run refreshes the generated config even when the hash matches.
In `@plans/2026-06-03-helm-chart-kube-e2e-plan.md`:
- Around line 6-29: The plan still uses workstation-specific absolute links,
which are not portable. Replace every `/Users/rsadler/src/...` reference in this
document with repo-relative links so the paths work anywhere; update the links
pointing to Makefile, e2e/conftest.py, docker-bake.hcl, and any other mentioned
files to use the repository root relative locations instead.
In `@plans/2026-06-03-jobs-pause-resume-plan.md`:
- Around line 15-29: The plan contains non-portable workstation links that
should be replaced with repo-relative references. Update the affected references
in the plan text so the linked files for e2e/test_jobs.py, docker.py,
dispatcher.py, test_pause_resume.py, test_docker_backend.py, and
test_jobs_pause_resume_docker.py use paths relative to the repository root,
keeping the same file targets but removing any /Users/rsadler/src/... prefixes.
In `@plans/2026-06-03-jobs-persistent-storage-plan.md`:
- Around line 15-29: The plan still uses absolute workstation paths for the
affected files, which makes the links brittle and non-portable. Update the
references in the planning doc to use repo-relative paths for the same targets
(e2e/test_jobs.py, docker.py, endpoints.py, schemas.py, test_docker_backend.py,
and test_jobs_persistent_storage.py) so the links work across environments.
In `@process.md`:
- Around line 1-9: The document opens with email-style context instead of the
actual proposal, so update the start of the content to lead with the
release-process framework. Add a clear top-level title at the beginning of
process.md and move the core proposal summary to the opening paragraph, using
the existing release-process draft content to orient readers immediately.
In `@public-images.md`:
- Line 1: The document starts with a second-level heading instead of the
canonical top-level title. Update the opening heading in the markdown doc to use
an H1 so the page has a proper title and complies with markdownlint; locate the
current title at the top of the file and promote it from the existing heading
level to the top-level heading.
In `@services/core/jobs/src/nmp/core/jobs/controllers/backends/subprocess.py`:
- Line 374: The latest_task selection in subprocess.py can fail when mixing
tz-aware task timestamps with the naive datetime.datetime.min fallback. Update
the max() key in the task comparison so the fallback uses a timezone-aware
minimum datetime consistent with updated_at/created_at values, keeping the logic
in latest_task and the lambda key fully comparable.
In `@spec/jobs-backed-local-run-ux-and-progressive-start-spec.md`:
- Line 14: The reference to run_local(...) in the scheduler.py note uses a
machine-specific absolute path that is not portable. Replace that
/Users/rsadler/... reference with a repo-relative path or a plain code symbol
reference for nemo_platform_plugin.scheduler.run_local so the spec stays valid
across environments.
In `@spec/jobs-local-remote-unification-spec.md`:
- Around line 1651-1666: The jobs selection rules are inconsistent: the spec
says `nemo jobs` should only use remote mode when `--cluster` or `--base-url` is
explicitly provided, but the earlier `cluster use` flow implies persisted active
context should still steer `jobs run`. Update the jobs
submission/target-resolution behavior to follow one rule consistently, and
adjust the relevant jobs CLI flow/spec text around the submit path and active
context handling so the behavior matches the chosen `--cluster`/`--base-url`
default.
In `@spec/jobs-runtime-availability-and-capabilities-spec.md`:
- Around line 30-34: Remove the machine-specific absolute paths from the
referenced bullets in the jobs availability/capabilities spec and replace them
with repo-relative links or plain code references. Update the entries that
mention validate_docker_available, the platform config downgrade logic, the Jobs
profiles builder, the default Jobs backend config selector, and the
customization jobs contributor so they point to symbols only, not local
filesystem paths.
In `@spec/nvapi-authentication-gateway-spec.md`:
- Around line 450-463: The second Recommendation heading is duplicated and makes
the section anchor ambiguous; rename the later heading in the document to a
distinct title such as Final Recommendation so cross-links resolve cleanly.
Update the heading text in the recommendation section while keeping the
gateway/ext_authz translator guidance unchanged.
In `@spec/options-for-multi-idp.md`:
- Around line 31-84: The review note is pointing out that the documented
references use workstation-specific absolute paths, which will break for other
contributors. Update the affected text in this spec to use repo-relative links
or plain filenames instead of /Users/rsadler/src/nemo-platform/... so the
references remain portable, including the body and any repo-touchpoints entries
that mention base.py, jwt.py, endpoints.py, environment.ts, useAuthLogin.ts,
useAuthProfile.ts, and the related docs.
In `@spec/plugin-service-authz-spec.md`:
- Around line 218-231: The path-rule validation is missing a guard for empty
callers, which allows a rule with callers=[] to validate even though it matches
nobody. Update the endpoint/rule validation logic for plugin-owned routes so
each path rule explicitly requires at least one caller before merge/startup, and
ensure the check is applied per rule alongside the existing decorator validation
rules in the plugin authz spec.
In `@spec/subprocess-first-class-execution-resolution-spec.md`:
- Around line 33-42: Replace the machine-local path references in the spec with
repo-relative links or plain code symbols so they resolve everywhere and do not
expose local filesystem paths. Update the Jobs API ingress reference to point to
the `endpoints.py` location using a repository-relative target, and keep any
mentions of `run_local(...)` and `submit_remote(...)` as code references rather
than absolute paths.
---
Nitpick comments:
In `@e2e/services_pool.py`:
- Line 6: Remove the postponed annotations import from the module so type hints
are evaluated normally instead of being string-backed. Update the top-level
imports in services_pool.py and keep the existing concrete type references
unchanged, since the referenced types are already imported directly and should
be used as real annotations.
In `@public-images.md`:
- Around line 13-76: The registry comparison in public-images.md should be
rewritten to remove marketing-style language and unverified policy claims.
Update the affected sections and the summary table to use neutral, source-backed
wording, and ensure any size limits, pull quotas, and auth requirements are
stated only if they can be verified from current vendor documentation. Keep the
focus on concrete criteria in the GHCR, Docker Hub, Hugging Face, Quay.io, and
NVCR/NGC sections, and avoid subjective phrases like “definitive home” or
“highly resilient.”
In `@script/submit-hello-world-job.sh`:
- Around line 33-61: The heredoc in submit-hello-world-job.sh builds JSON by raw
interpolation, so special characters in MESSAGE, JOB_NAME, or WORKSPACE can
break the payload. Replace the shell heredoc with the existing python3-based
payload निर्माण so the JSON is serialized safely, and ensure the payload
assembly logic uses the same fields currently written in the heredoc (workspace,
name, spec.message, and platform_spec.steps[].config.message) via json.dump.
In `@spec/brian-auth-call.md`:
- Line 1: This transcript should not be published as-is; turn it into a proper
docs page with a clear title, short summary, and structured sections, and move
the raw verbatim conversation out of the published docs path. Use the existing
spec/doc entry as the source to extract the actual auth story, then rewrite it
into a Diataxis-style page instead of keeping meeting notes in place.
In `@spec/jobs-local-remote-unification-spec.md`:
- Around line 1-1880: Add the missing documentation scaffolding: insert a short
Prerequisites section near the top before the main spec content, and add a Next
Steps section at the end with cross-links. Update the Jobs Local/Remote
Unification Spec so it follows the standard doc order and includes the required
navigation references.
- Around line 508-670: The markdown spec has duplicate section headings that
trigger lint warnings and create ambiguous anchors. Update the repeated headings
in this diff to unique titles while preserving the meaning of each section, and
make sure the nearby section structure around the working-directory and
retention discussions remains clear and consistently named.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: aa1f14c6-afc3-4cb8-8003-81a2ceed6c2c
📒 Files selected for processing (81)
commit.txtcontainer.mddocs/set-up/config-reference.mdxe2e/configs/local-subprocess.yamle2e/conftest.pye2e/services_pool.pye2e/test_data_designer.pye2e/test_jobs.pye2e/test_jobs_auth.pyghcr-mirroring-issue.mdpackages/data_designer_nemo/src/data_designer_nemo/context.pypackages/data_designer_nemo/src/data_designer_nemo/fileset_filesystem_provider.pypackages/data_designer_nemo/src/data_designer_nemo/model_provider.pypackages/data_designer_nemo/src/data_designer_nemo/seed.pypackages/data_designer_nemo/src/data_designer_nemo/tool_configs.pypackages/data_designer_nemo/src/data_designer_nemo/unsupported_features.pypackages/data_designer_nemo/tests/unit/test_fileset_filesystem_provider.pypackages/data_designer_nemo/tests/unit/test_local_filesystem_seeds.pypackages/data_designer_nemo/tests/unit/test_model_configs.pypackages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.pypackages/nmp_testing/src/nmp/testing/__init__.pypackages/nmp_testing/src/nmp/testing/utils.pypackages/nmp_testing/tests/unit/test_e2e_harness.pyplans/2026-06-03-helm-chart-kube-e2e-plan.mdplans/2026-06-03-jobs-pause-resume-plan.mdplans/2026-06-03-jobs-persistent-storage-plan.mdplugins/nemo-data-designer/src/nemo_data_designer_plugin/functions/_types.pyplugins/nemo-data-designer/src/nemo_data_designer_plugin/jobs/spec.pyplugins/nemo-data-designer/src/nemo_data_designer_plugin/sdk/resources.pyplugins/nemo-data-designer/src/nemo_data_designer_plugin/testing/utils.pyplugins/nemo-data-designer/tests/integration/test_preview_local_cli.pyplugins/nemo-data-designer/tests/integration/test_preview_remote_sdk.pyplugins/nemo-data-designer/tests/integration/test_remote_validation_errors.pyplugins/nemo-data-designer/tests/integration/test_validate_sdk.pyplugins/nemo-data-designer/tests/unit/test_context.pyplugins/nemo-data-designer/tests/unit/test_model_provider.pyplugins/nemo-data-designer/tests/unit/test_preview_function.pyplugins/nemo-data-designer/tests/unit/test_sdk_resources.pyprocess.mdpublic-images.mdpytest.iniscript/Untitledscript/run-e2e-linux.shscript/run-hello-world-jobs.shscript/submit-hello-world-job.shservices/core/jobs/src/nmp/core/jobs/api/v2/jobs/endpoints.pyservices/core/jobs/src/nmp/core/jobs/app/dispatcher.pyservices/core/jobs/src/nmp/core/jobs/config.pyservices/core/jobs/src/nmp/core/jobs/controllers/backends/subprocess.pyservices/core/jobs/src/nmp/core/jobs/controllers/diagnostics.pyservices/core/jobs/src/nmp/core/jobs/controllers/reconciler.pyservices/core/jobs/src/nmp/core/jobs/controllers/scheduler.pyservices/core/jobs/tests/controllers/test_diagnostics.pyservices/core/jobs/tests/controllers/test_reconciler.pyservices/core/jobs/tests/controllers/test_scheduler.pyservices/core/jobs/tests/controllers/test_subprocess_backend.pyservices/core/jobs/tests/integration/test_task_auth_runtime.pyservices/core/jobs/tests/test_dispatcher_cross_workspace.pyservices/core/jobs/tests/test_timestamp_contracts.pyspec/brian-auth-call-summary.mdspec/brian-auth-call.mdspec/caller-execution-hints-and-profile-plumbing-spec.mdspec/capability-vs-provider-execution-model-spec.mdspec/core-role-default-grants-spec.mdspec/daemon-group-local-jobs-follow-on-spec.mdspec/first-class-subprocess-provider-slack-draft.mdspec/jobs-backed-local-run-ux-and-progressive-start-spec.mdspec/jobs-local-remote-unification-spec.mdspec/jobs-runtime-availability-and-capabilities-spec.mdspec/machine-auth-authentication-and-authorization-spec.mdspec/nvapi-authentication-gateway-spec.mdspec/oidc-scope-and-claim-mapping-spec.mdspec/options-for-multi-idp.mdspec/permissionless-authenticated-routes-spec.mdspec/plugin-defined-scopes-spec.mdspec/plugin-service-authz-spec.mdspec/plugin-service-authz-ticket-description.mdspec/provider-backend-extensibility-spec.mdspec/service-role-visibility-and-bindability-spec.mdspec/subprocess-first-class-execution-resolution-spec.mdspec/trusted-probes-and-endpoints-spec.md
💤 Files with no reviewable changes (4)
- packages/data_designer_nemo/src/data_designer_nemo/model_provider.py
- plugins/nemo-data-designer/tests/integration/test_remote_validation_errors.py
- plugins/nemo-data-designer/tests/unit/test_model_provider.py
- packages/data_designer_nemo/src/data_designer_nemo/unsupported_features.py
✅ Files skipped from review due to trivial changes (12)
- spec/first-class-subprocess-provider-slack-draft.md
- spec/provider-backend-extensibility-spec.md
- spec/service-role-visibility-and-bindability-spec.md
- ghcr-mirroring-issue.md
- e2e/configs/local-subprocess.yaml
- spec/trusted-probes-and-endpoints-spec.md
- commit.txt
- packages/nmp_testing/src/nmp/testing/init.py
- pytest.ini
- spec/capability-vs-provider-execution-model-spec.md
- spec/permissionless-authenticated-routes-spec.md
- plugins/nemo-data-designer/tests/integration/test_validate_sdk.py
🚧 Files skipped from review as they are similar to previous changes (17)
- packages/data_designer_nemo/tests/unit/test_model_configs.py
- plugins/nemo-data-designer/src/nemo_data_designer_plugin/functions/_types.py
- plugins/nemo-data-designer/src/nemo_data_designer_plugin/jobs/spec.py
- packages/data_designer_nemo/src/data_designer_nemo/tool_configs.py
- packages/data_designer_nemo/tests/unit/test_local_filesystem_seeds.py
- plugins/nemo-data-designer/tests/integration/test_preview_local_cli.py
- plugins/nemo-data-designer/tests/unit/test_context.py
- packages/data_designer_nemo/tests/unit/test_fileset_filesystem_provider.py
- plugins/nemo-data-designer/tests/unit/test_preview_function.py
- plugins/nemo-data-designer/tests/integration/test_preview_remote_sdk.py
- packages/data_designer_nemo/src/data_designer_nemo/context.py
- plugins/nemo-data-designer/src/nemo_data_designer_plugin/testing/utils.py
- plugins/nemo-data-designer/tests/unit/test_sdk_resources.py
- plugins/nemo-data-designer/src/nemo_data_designer_plugin/sdk/resources.py
- packages/data_designer_nemo/src/data_designer_nemo/fileset_filesystem_provider.py
- packages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.py
- packages/data_designer_nemo/src/data_designer_nemo/seed.py
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
…rted Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
…d sources Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
52613dd to
698445b
Compare
Adds support for
DirectorySeedSourceandFileContentsSeedSourceseed datasets in remote contexts. Depends on NVIDIA-NeMo/DataDesigner#765 getting merged and a new release of the library getting published.Currently failing CI because we're still pinned to DD
0.6.1. Once the library PR is merged and a new version is released, we'll update the library version in this PR.Note: the library has also deprecated
ModelProviderRegistry.defaultand madeModelConfig.providerrequired. Those updates are included here.Also does some refactoring of seed-related validation logic that was kind of sprawling.
Summary by CodeRabbit
New Features
Bug Fixes
Tests