feat(files): migrate filesets package to NemoClient typed HTTP client by matthewgrossman · Pull Request #429 · NVIDIA-NeMo/nemo-platform

matthewgrossman · 2026-06-24T01:28:14Z

Summary

Migrates packages/filesets/ (the Files/Filesets client layer) from the Stainless-generated NeMoPlatform SDK to the new NemoClient typed HTTP client introduced in #397. This is tracked by AIRCORE-839 (sub-issue of AIRCORE-827).

What changed

Shared types — FilesetPurpose, CacheStatus, StorageConfig, FilesetMetadata, and request/response DTOs (FilesetOutput, FilesetFileOutput, CreateFilesetRequest, etc.) now live in nemo_platform_plugin/files/ as the single source of truth. StorageConfig and FilesetMetadata moved from nmp_common (which now re-exports). SecretRef deduplicated — nmp_common re-exports from the plugin.
Typed endpoints — 9 endpoint functions (create_fileset, get_fileset, upload_file, download_file, etc.) defined using the @get/@post/@put/@patch/@delete + @abstractmethod pattern.
FilesetFileSystem migration — All self._sdk.files._upload_file()/_download_file()/etc. calls replaced with self._client.send(endpoints.X(...)). Constructor takes client: AsyncNemoClient with sdk= backward compat for the vendored SDK path.
FilesResource migration — No longer inherits from Stainless BaseFilesResource. Standalone class backed by NemoClient. Adds FilesetsSubResource for CRUD operations with explicit typed parameters (name, description, project, purpose, metadata, storage, custom_fields, cache, timeout).
Exception hierarchy — NemoHTTPError and status-code-specific subclasses (NotFoundError, ConflictError, PermissionDeniedError, etc.) in client/errors.py. Subclass the corresponding Stainless exceptions for backward compat so existing except ConflictError blocks continue to work.
send() raises on non-2xx — NemoClient.send() now raises NemoHTTPError subclasses immediately for non-2xx JSON responses. No more silent failures from forgetting .data(). NemoResponse.data() is now just a body accessor.
Binary response ergonomics — NemoBinaryResponse.read() does a plain httpx.request() (no streaming overhead). resp.stream() returns a context manager for chunk-by-chunk iteration.
Path encoding — Path parameters are percent-encoded in _resolve_path() so reserved characters (#, ?) in file paths don't break URLs.
Body serialization — exclude_unset=True on Pydantic body serialization so partial updates don't send null for unset fields.
Query param serialization — Dict/list query params are JSON-serialized at the NemoClient level.

What stays the same

Public API of FilesResource (.download(), .upload(), .upload_content(), .download_content(), .list(), .delete(), .filesets.*, .fsspec)
The vendored SDK copy still works via backward compat (FilesResource(NeMoPlatform) auto-converts)
All existing docstrings and examples preserved

Backward compatibility

FilesetFileSystem accepts both client= (new) and sdk= (legacy NeMoPlatform)
FilesResource/AsyncFilesResource accept both NemoClient and NeMoPlatform
Our exceptions subclass the Stainless exceptions so except ConflictError (from nemo_platform) catches them
nmp_common re-exports StorageConfig, FilesetMetadata, SecretRef from the plugin

Known limitations

Auth identity does not propagate through the fsspec upload path when using the sdk= backward compat conversion — tracked by AIRCORE-828. 1 unit test skipped, integration tests with auth fail.
Data designer FilesetFileSystem event loop issue (Loop is not running) — pre-existing, same root cause as auth propagation.

Test changes

Deleted TestSDKResourceImports (tested Stainless .with_raw_response/.with_streaming_response)
Replaced isinstance(result, Fileset) with attribute checks
Replaced isinstance(storage, LocalStorageConfig) with .type == "local"
Changed metadata validation test to accept ValidationError (client-side) in addition to APIStatusError
Replaced _upload_file() internal method calls with upload_content()
Skipped 1 auth propagation test (AIRCORE-828)
Added 5 regression tests: send() raises on non-2xx, path encoding, JSON query param serialization, project field preservation, exclude_unset body serialization

CI status

Unit tests: pass
Lint: pass
All wheel builds + tests: pass
Integration tests: 20 failures (all auth propagation / data designer event loop — documented in AIRCORE-828)
E2e tests: pre-existing failures (data designer, job logs)

Test plan

17 new tests (12 endpoint + 5 regression)
2333 local tests pass, 52 skipped, 0 failed
make lint-fix passes
make update-sdk / make vendor output is stable

🤖 Generated with Claude Code

coderabbitai · 2026-06-24T01:38:33Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Moves fileset contracts and HTTP wrappers into nemo_platform_plugin, centralizes shared metadata and storage config models there, refactors server fileset schema/cache code to use shared types, migrates fileset filesystem/resources to the plugin client, and updates integration tests and package dependencies.

Changes

Files service plugin migration

Layer / File(s)	Summary
Shared files contracts `packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py`, `packages/nemo_platform_plugin/src/nemo_platform_plugin/files/endpoints.py`, `packages/nemo_platform_plugin/tests/files/test_endpoints.py`, `openapi/.yaml`, `packages//pyproject.toml`, `packages/nemo_platform_plugin/src/nemo_platform_plugin/client/endpoint.py`	Defines shared fileset enums, DTOs, request models, query types, typed endpoint contracts, endpoint tests, OpenAPI schema descriptions, dependency entries, and BaseModel JSON serialization with unset fields excluded.
HTTP request and response wrappers `packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py`, `packages/nemo_platform_plugin/src/nemo_platform_plugin/client/response.py`	Changes query param resolution, BaseModel request serialization, and refactors sync/async send and response wrappers to own stream opening, parsing, and HTTP error mapping.
Shared metadata and storage models `packages/nemo_platform_plugin/src/nemo_platform_plugin/files/metadata.py`, `packages/nemo_platform_plugin/src/nemo_platform_plugin/files/storage_config.py`, `packages/nmp_common/src/nmp/common/api/common.py`, `packages/nmp_common/src/nmp/common/files/{metadata.py,storage_config.py}`	Adds shared metadata/storage config models in the plugin package and turns the nmp_common metadata, storage config, and SecretRef definitions into re-export shims.
Server fileset schema and cache updates `services/core/files/src/nmp/core/files/api/v2/filesets/{schemas.py,endpoints.py}`, `services/core/files/src/nmp/core/files/app/cache.py`, `services/core/files/src/nmp/core/files/entities.py`	Moves fileset response conversion into standalone helpers using shared plugin types, updates endpoint returns to use those helpers, switches cache warming to the shared CacheStatus and limiter, and imports FilesetPurpose from the shared package.
Filesystem and resource client migration `packages/filesets/src/filesets/filesystem/filesystem.py`, `packages/filesets/src/filesets/resources.py`	Replaces SDK-backed fileset filesystem and resources with plugin client calls, adds typed fileset subresources, updates list/upload/download flows, and switches return types to shared fileset output models.
Integration test updates `services/core/files/tests/integration/test_files_basic.py`, `services/core/files/tests/integration/test_files_sdk.py`, `services/core/files/tests/integration/test_fileset_filesystem.py`	Adjusts integration assertions for shared validation and storage typing, switches one upload path to the public API, and removes the SDK import-access test block.

Possibly related PRs

NVIDIA-NeMo/nemo-platform#63: Updates SecretRef schema handling in the OpenAPI contract.
NVIDIA-NeMo/nemo-platform#370: Introduces the typed client and response infrastructure used by this migration.

Suggested reviewers

maxdubrinsky

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.04% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: migrating the filesets package to the typed NemoClient HTTP client.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch mgrossman/aircore-827-migrate-first-plugin-to-nemoclient-typed-http-client-files

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py (1)

146-151: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Docstring example no longer valid.

The new wrappers don't implement __enter__/__iter__; with client.send(...) as resp: for chunk in resp: will fail. Update to the resp.stream() API.

Proposed doc fix

-        For binary and streaming endpoints, the caller should use the
-        response as a context manager to ensure the connection is closed::
-
-            with client.send(endpoints.download(name="file.csv")) as resp:
-                for chunk in resp:
-                    f.write(chunk)
+        For binary and streaming endpoints, use ``stream()`` to ensure the
+        connection is closed::
+
+            with client.send(endpoints.download(name="file.csv")).stream() as resp:
+                for chunk in resp.iter_bytes():
+                    f.write(chunk)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py`
around lines 146 - 151, The docstring example for the send method (or similar
client method) showing binary and streaming endpoint usage is outdated and no
longer matches the actual API. The example demonstrates using with
client.send(...) as resp: for chunk in resp: which relies on __enter__ and
__iter__ methods that the new wrappers no longer implement. Update the docstring
example to use the new resp.stream() API instead of the context manager and
direct iteration pattern, ensuring the example accurately reflects how the
current implementation should be used.

services/core/files/src/nmp/core/files/app/cache.py (1)

154-156: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Bound task creation, not just cache work.

The limiter is acquired inside cache_file_directly, so this still creates one parked task per file. Large filesets can exhaust memory before throttling helps.

Proposed fix

-    async with anyio.create_task_group() as tg:
-        for file_info in files:
-            tg.start_soon(cache_with_logging, file_info.path)
+    batch_size = max(1, int(_get_background_cache_limiter().total_tokens))
+    for start in range(0, len(files), batch_size):
+        async with anyio.create_task_group() as tg:
+            for file_info in files[start : start + batch_size]:
+                tg.start_soon(cache_with_logging, file_info.path)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/core/files/src/nmp/core/files/app/cache.py` around lines 154 - 156,
The task creation loop in the anyio task group creates one task for each file
upfront with tg.start_soon, and the limiter is only acquired inside the
cache_file_directly function, causing all tasks to be created and parked in
memory. To fix this, acquire the limiter before creating each task by using a
semaphore in the task creation loop itself. Wrap the cache_with_logging call in
a helper function that acquires the limiter first, then releases it after the
caching operation completes, so that only a bounded number of tasks are created
at any given time rather than creating all tasks upfront.

🧹 Nitpick comments (3)

packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py (1)

159-179: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Identical kwargs dict built in both branches. Hoist it once before the _is_binary check (same applies to the async send at lines 236-256).

Proposed refactor

-        if self._is_binary(request):
-            kwargs = {
-                "method": request.method,
-                "url": url,
-                "content": request.content,
-                "headers": req_headers,
-                "params": params,
-            }
-            return NemoBinaryResponse(self._http, kwargs, request)
-
-        if self._is_stream(request):
-            assert request.response_type is not None
-            kwargs = {
-                "method": request.method,
-                "url": url,
-                "content": request.content,
-                "headers": req_headers,
-                "params": params,
-            }
-            model_type = _get_stream_model_type(request.response_type)
-            return NemoStreamResponse(self._http, kwargs, model_type, request)
+        kwargs = {
+            "method": request.method,
+            "url": url,
+            "content": request.content,
+            "headers": req_headers,
+            "params": params,
+        }
+        if self._is_binary(request):
+            return NemoBinaryResponse(self._http, kwargs, request)
+
+        if self._is_stream(request):
+            assert request.response_type is not None
+            model_type = _get_stream_model_type(request.response_type)
+            return NemoStreamResponse(self._http, kwargs, model_type, request)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py`
around lines 159 - 179, The kwargs dictionary containing method, url, content,
headers, and params is being constructed identically in both the _is_binary and
_is_stream conditional branches, causing unnecessary code duplication. Extract
this dictionary construction and assign it to kwargs before the _is_binary
conditional check, then reuse the same kwargs variable in both the
NemoBinaryResponse and NemoStreamResponse instantiations. Apply the same
refactoring pattern to the async send method where the identical duplication
occurs with _is_binary and _is_stream checks as well.

packages/nemo_platform_plugin/tests/files/test_endpoints.py (1)

98-115: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add endpoint tests for streaming upload variants and partial download headers.

Current coverage checks bytes upload and full download only. Add cases for Iterable[bytes]/AsyncIterable[bytes] upload content and, if supported by request plumbing, Range header preparation for download.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nemo_platform_plugin/tests/files/test_endpoints.py` around lines 98
- 115, The test file currently covers only bytes upload content and full file
download. Add additional test cases after the existing test_upload_file() and
test_download_file() functions to cover streaming upload variants and partial
download with Range headers. Create new test functions to verify that
endpoints.upload_file() correctly handles Iterable[bytes] and
AsyncIterable[bytes] content types (validating the method, path_params, content
type handling), and add a test for endpoints.download_file() that verifies Range
header preparation for partial downloads if the request plumbing supports it.

packages/filesets/src/filesets/resources.py (1)

604-617: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Use the static glob prefix for server-side filtering.

data/*.jsonl currently sends no path, forcing a full-fileset scan before client filtering; with include_cache_status=True, cache status is computed for unrelated files too. Derive the non-magic directory prefix and pass it as path.

♻️ Suggested shape

-        api_path = None if has_magic(path) else (path or None)
+        api_path = _glob_static_prefix(path) if has_magic(path) else (path or None)

def _glob_static_prefix(path: str) -> str | None:
    if not has_magic(path):
        return path or None
    prefix = path.rsplit("/", 1)[0] if "/" in path else ""
    while prefix and has_magic(prefix):
        prefix = prefix.rsplit("/", 1)[0] if "/" in prefix else ""
    return f"{prefix.rstrip('/')}/" if prefix else None

Also applies to: 1097-1110

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/filesets/src/filesets/resources.py` around lines 604 - 617, The
current implementation in the code block around the list_files call sends no
path parameter for glob patterns, forcing full-fileset scans. Implement a helper
function to extract the static (non-magic) directory prefix from glob patterns,
then use this prefix as the api_path for server-side filtering. Replace the
simple has_magic check on the path variable with a call to a new function that
returns the static prefix by repeatedly removing path segments until a non-magic
prefix is found, then pass this prefix to the query_params instead of the full
path. Apply this same pattern to the other similar code block mentioned at lines
1097-1110 to ensure consistent server-side filtering across all list_files
calls.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/filesets/src/filesets/filesystem/filesystem.py`:
- Around line 361-399: The `AsyncNemoClient` class stores the `http_client`
passed to it in the `_client_from_sdk` method but has no cleanup mechanism to
close the connection, causing resource leaks when instances are destroyed. Add
async cleanup methods such as `__aexit__` or `aclose` to the `AsyncNemoClient`
class to properly close the `http_client` when the instance is destroyed, or
alternatively, have `FilesetFileSystem` track and manage the lifecycle of the
`AsyncNemoClient` instances it creates by calling close on them when they are no
longer needed.
- Around line 612-626: The `send()` method does not raise exceptions on non-2xx
HTTP responses; only calling `.data()` on the response does. The `_rm_file`
method and `_pipe_file` method are discarding the response from `send()` without
calling `.data()`, which silently swallows errors like 403, 408, 507, or 500. To
fix this, capture the result of each `await self._client.send()` call and
immediately call `.data()` on it to ensure HTTP errors are surfaced before the
parent cache is invalidated. Apply this same fix pattern to the `_pipe_stream`
and `_put_file` methods as mentioned in the comment.

In `@packages/filesets/src/filesets/resources.py`:
- Around line 145-156: The fsspec property is creating an AsyncNemoClient but
only passing base_url and workspace parameters, which discards important HTTP
configuration from the original sync client self._client. To fix this, extract
the HTTP configuration (including auth headers, custom headers for service
principals/identity markers, and timeout settings) from self._client and pass
them to AsyncNemoClient either by extracting headers from
self._client._http.headers and passing them as the default_headers parameter, or
by following the existing pattern used in FilesetFileSystem._client_from_sdk to
properly construct an http_client parameter that preserves the full
configuration. This ensures the async client has the same authentication and
header setup as the original sync client.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py`:
- Around line 79-88: The NAME_PATTERN regex uses \w which matches Unicode word
characters, but the CreateFilesetRequest name field documentation explicitly
limits allowed characters to ASCII letters (a-z, A-Z), digits (0-9),
underscores, hyphens, and dots. Update the NAME_PATTERN regex definition to use
character classes that match only the documented ASCII character set instead of
the broader \w pattern, ensuring the regex restriction aligns with the field
description.

In `@services/core/files/src/nmp/core/files/api/v2/filesets/schemas.py`:
- Line 62: The file_url construction is directly interpolating file_info.path
without URL encoding, which breaks URLs containing special characters like
spaces, question marks, hashes, or percent signs. In the file_url assignment
where file_info.path is included in the f-string, apply URL encoding to the path
component using urllib.parse.quote() to ensure special characters are properly
escaped before being inserted into the URL path.

---

Outside diff comments:
In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py`:
- Around line 146-151: The docstring example for the send method (or similar
client method) showing binary and streaming endpoint usage is outdated and no
longer matches the actual API. The example demonstrates using with
client.send(...) as resp: for chunk in resp: which relies on __enter__ and
__iter__ methods that the new wrappers no longer implement. Update the docstring
example to use the new resp.stream() API instead of the context manager and
direct iteration pattern, ensuring the example accurately reflects how the
current implementation should be used.

In `@services/core/files/src/nmp/core/files/app/cache.py`:
- Around line 154-156: The task creation loop in the anyio task group creates
one task for each file upfront with tg.start_soon, and the limiter is only
acquired inside the cache_file_directly function, causing all tasks to be
created and parked in memory. To fix this, acquire the limiter before creating
each task by using a semaphore in the task creation loop itself. Wrap the
cache_with_logging call in a helper function that acquires the limiter first,
then releases it after the caching operation completes, so that only a bounded
number of tasks are created at any given time rather than creating all tasks
upfront.

---

Nitpick comments:
In `@packages/filesets/src/filesets/resources.py`:
- Around line 604-617: The current implementation in the code block around the
list_files call sends no path parameter for glob patterns, forcing full-fileset
scans. Implement a helper function to extract the static (non-magic) directory
prefix from glob patterns, then use this prefix as the api_path for server-side
filtering. Replace the simple has_magic check on the path variable with a call
to a new function that returns the static prefix by repeatedly removing path
segments until a non-magic prefix is found, then pass this prefix to the
query_params instead of the full path. Apply this same pattern to the other
similar code block mentioned at lines 1097-1110 to ensure consistent server-side
filtering across all list_files calls.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py`:
- Around line 159-179: The kwargs dictionary containing method, url, content,
headers, and params is being constructed identically in both the _is_binary and
_is_stream conditional branches, causing unnecessary code duplication. Extract
this dictionary construction and assign it to kwargs before the _is_binary
conditional check, then reuse the same kwargs variable in both the
NemoBinaryResponse and NemoStreamResponse instantiations. Apply the same
refactoring pattern to the async send method where the identical duplication
occurs with _is_binary and _is_stream checks as well.

In `@packages/nemo_platform_plugin/tests/files/test_endpoints.py`:
- Around line 98-115: The test file currently covers only bytes upload content
and full file download. Add additional test cases after the existing
test_upload_file() and test_download_file() functions to cover streaming upload
variants and partial download with Range headers. Create new test functions to
verify that endpoints.upload_file() correctly handles Iterable[bytes] and
AsyncIterable[bytes] content types (validating the method, path_params, content
type handling), and add a test for endpoints.download_file() that verifies Range
header preparation for partial downloads if the request plumbing supports it.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8694569f-46e1-4fc9-98c1-41c895bbff96

📥 Commits

Reviewing files that changed from the base of the PR and between d8ea9e2 and 7e65bf1.

⛔ Files ignored due to path filters (4)

sdk/python/nemo-platform/.nmpcontext/openapi.yaml is excluded by !sdk/**
sdk/python/nemo-platform/src/nemo_platform/filesets/filesystem/filesystem.py is excluded by !sdk/**
sdk/python/nemo-platform/src/nemo_platform/filesets/resources.py is excluded by !sdk/**
sdk/python/nemo-platform/src/nemo_platform/types/files/fileset_file.py is excluded by !sdk/**

📒 Files selected for processing (14)

openapi/ga/individual/platform.openapi.yaml
openapi/ga/openapi.yaml
openapi/openapi.yaml
packages/filesets/src/filesets/filesystem/filesystem.py
packages/filesets/src/filesets/resources.py
packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py
packages/nemo_platform_plugin/src/nemo_platform_plugin/client/response.py
packages/nemo_platform_plugin/src/nemo_platform_plugin/files/endpoints.py
packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py
packages/nemo_platform_plugin/tests/files/test_endpoints.py
services/core/files/src/nmp/core/files/api/v2/filesets/endpoints.py
services/core/files/src/nmp/core/files/api/v2/filesets/schemas.py
services/core/files/src/nmp/core/files/app/cache.py
services/core/files/src/nmp/core/files/entities.py

coderabbitai · 2026-06-24T01:38:36Z

+    @staticmethod
+    def _client_from_sdk(sdk: Any) -> AsyncNemoClient:
+        """Convert a NeMoPlatform SDK instance to an AsyncNemoClient.
+
+        Handles both sync and async SDK instances. For sync SDKs, creates
+        a new AsyncNemoClient with a fresh httpx.AsyncClient.
+        """
+        import httpx
+        from nemo_platform import AsyncNeMoPlatform, NeMoPlatform

-    def _get_sdk(
-        self,
-        sdk: NeMoPlatform | AsyncNeMoPlatform,
-    ) -> AsyncNeMoPlatform:
-        # If already an async SDK, use it as-is to preserve custom transports (e.g., test clients)
        if isinstance(sdk, AsyncNeMoPlatform):
-            return sdk
+            return AsyncNemoClient(
+                base_url=str(sdk.base_url).rstrip("/"),
+                workspace=sdk.workspace,
+                http_client=sdk._client,
+            )

-        # Convert sync SDK to async SDK
+        if not isinstance(sdk, NeMoPlatform):
+            raise TypeError(f"Expected NeMoPlatform or AsyncNeMoPlatform, got {type(sdk).__name__}")
+
+        # Convert sync SDK to async client with a fresh httpx.AsyncClient
        transport: httpx.AsyncBaseTransport | None = None
-        if TestClient is not None and isinstance(sdk._client, TestClient):
-            # If using a synchronous test client, we should use the ASGITransport
-            transport = httpx.ASGITransport(app=sdk._client.app)
+        try:
+            from starlette.testclient import TestClient

-        return AsyncNeMoPlatform(
+            if isinstance(sdk._client, TestClient):
+                transport = httpx.ASGITransport(app=sdk._client.app)
+        except ImportError:
+            pass
+
+        return AsyncNemoClient(
+            base_url=str(sdk.base_url).rstrip("/"),
            workspace=sdk.workspace,
-            base_url=sdk.base_url,
-            timeout=sdk.timeout,
-            max_retries=sdk.max_retries,
-            default_headers=sdk._custom_headers,
-            default_query=sdk.default_query,
            http_client=httpx.AsyncClient(
                transport=transport,
-                base_url=sdk.base_url,
+                base_url=str(sdk.base_url).rstrip("/"),
                headers=sdk._custom_headers,
            ),
        )


🩺 Stability & Availability | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash rg -nP -C4 '(http_client|aclose|__aexit__|async with|def close)' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 1176

🏁 Script executed:

ast-grep outline packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py --view expanded

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 1383

🏁 Script executed:

rg -nP 'class AsyncNemoClient' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py -A 150 | head -200

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 3388

🏁 Script executed:

rg -nP 'class AsyncNemoClient' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py -A 500 | tail -300

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 3388

🏁 Script executed:

rg -nP '(__aexit__|aclose|__del__)' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 163

🏁 Script executed:

rg -nP 'AsyncNemoClient' packages/filesets/src/filesets/filesystem/filesystem.py -B 2 -A 5

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 2433

🏁 Script executed:

wc -l packages/nemo_platform_plugin/src/nemo_platform_plugin/client/client.py

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 239

AsyncNemoClient doesn't close the http_client it receives or creates.

The freshly created httpx.AsyncClient in the sync-SDK branch (lines 394–398) is stored in AsyncNemoClient._http but never closed. AsyncNemoClient has no cleanup methods (__aexit__, aclose, or __del__), so the connection pool leaks when the instance is dropped. Either add a cleanup mechanism to AsyncNemoClient or track and close the client in FilesetFileSystem.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/filesets/src/filesets/filesystem/filesystem.py` around lines 361 - 399, The `AsyncNemoClient` class stores the `http_client` passed to it in the `_client_from_sdk` method but has no cleanup mechanism to close the connection, causing resource leaks when instances are destroyed. Add async cleanup methods such as `__aexit__` or `aclose` to the `AsyncNemoClient` class to properly close the `http_client` when the instance is destroyed, or alternatively, have `FilesetFileSystem` track and manage the lifecycle of the `AsyncNemoClient` instances it creates by calling close on them when they are no longer needed.

coderabbitai · 2026-06-24T01:38:36Z

+NAME_PATTERN = r"^[\w\-.]+$"
+MAX_LENGTH = 255
+
+
+class CreateFilesetRequest(BaseModel):
+    name: str = Field(
+        description="The name of the fileset. Allowed characters: letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots.",
+        max_length=MAX_LENGTH,
+        pattern=NAME_PATTERN,
+        examples=["training-data-v1", "llama-checkpoint"],


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

NAME_PATTERN is broader than the documented character set.

Line 79 uses \w, which accepts Unicode word characters; the field description says ASCII letters/digits/underscore/hyphen/dot only.

Proposed fix

-NAME_PATTERN = r"^[\w\-.]+$" +NAME_PATTERN = r"^[A-Za-z0-9_.-]+$"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py` around lines 79 - 88, The NAME_PATTERN regex uses \w which matches Unicode word characters, but the CreateFilesetRequest name field documentation explicitly limits allowed characters to ASCII letters (a-z, A-Z), digits (0-9), underscores, hyphens, and dots. Update the NAME_PATTERN regex definition to use character classes that match only the documented ASCII character set instead of the broader \w pattern, ensuring the regex restriction aligns with the field description.

coderabbitai · 2026-06-24T01:38:36Z

+) -> FilesetFileOutput:
+    """Convert a FileInfo to a FilesetFileOutput response DTO."""
+    return FilesetFileOutput(
+        file_url=f"/apis/files/v2/workspaces/{workspace}/filesets/{name}/-/{file_info.path}",


🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Encode file_url path components.

Raw file_info.path breaks returned URLs when paths contain ?, #, spaces, or %.

Proposed fix

from typing import Annotated, Optional +from urllib.parse import quote

def fileset_file_output_from_info( workspace: str, name: str, file_info: FileInfo, cache_status: CacheStatus | None = None, ) -> FilesetFileOutput: """Convert a FileInfo to a FilesetFileOutput response DTO.""" + encoded_workspace = quote(workspace, safe="") + encoded_name = quote(name, safe="") + encoded_path = quote(file_info.path, safe="/") return FilesetFileOutput( - file_url=f"/apis/files/v2/workspaces/{workspace}/filesets/{name}/-/{file_info.path}", + file_url=f"/apis/files/v2/workspaces/{encoded_workspace}/filesets/{encoded_name}/-/{encoded_path}", file_ref=f"{workspace}/{name}#{file_info.path}",

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/core/files/src/nmp/core/files/api/v2/filesets/schemas.py` at line 62, The file_url construction is directly interpolating file_info.path without URL encoding, which breaks URLs containing special characters like spaces, question marks, hashes, or percent signs. In the file_url assignment where file_info.path is included in the f-string, apply URL encoding to the path component using urllib.parse.quote() to ensure special characters are properly escaped before being inserted into the URL path.

github-actions · 2026-06-24T01:38:42Z

Suite	Lines Covered	Line Rate	Branch Rate
Unit Tests	21020/27580	76.2%	61.3%
Integration Tests	12304/26349	46.7%	20.0%

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/filesets/src/filesets/resources.py`:
- Around line 234-239: The fsspec property in FilesResource class is using
sdk=self._raw_client to instantiate FilesetFileSystem, which fails when
FilesResource is initialized with NemoClient because _client_from_sdk only
accepts NeMoPlatform or AsyncNeMoPlatform. Fix this by changing the
FilesetFileSystem instantiation in the fsspec cached_property method to use
client=self._client instead of sdk=self._raw_client, matching the pattern
already used in the async variant at line 762.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: eb0652c3-d9d5-4b66-a12f-e59bd051336d

📥 Commits

Reviewing files that changed from the base of the PR and between 7e65bf1 and a31d202.

⛔ Files ignored due to path filters (2)

sdk/python/nemo-platform/src/nemo_platform/filesets/filesystem/filesystem.py is excluded by !sdk/**
sdk/python/nemo-platform/src/nemo_platform/filesets/resources.py is excluded by !sdk/**

📒 Files selected for processing (2)

packages/filesets/src/filesets/filesystem/filesystem.py
packages/filesets/src/filesets/resources.py

🚧 Files skipped from review as they are similar to previous changes (1)

packages/filesets/src/filesets/filesystem/filesystem.py

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

packages/filesets/src/filesets/resources.py (1)
213-214: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Parameterize the filter dict type.

Use dict[str, Any] instead of bare dict in both sync and async signatures.
Proposed fix
-        filter: str | dict | None = None,
+        filter: str | dict[str, Any] | None = None,
...
-        filter: str | dict | None = None,
+        filter: str | dict[str, Any] | None = None,
As per coding guidelines, "**/*.py: Always prefer concrete type hints over string-based ones in Python code."

Also applies to: 305-306
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/filesets/src/filesets/resources.py` around lines 213 - 214, The
Fileset API signatures still use an unparameterized filter dict type, so update
both the sync and async method signatures in the Fileset resource class to use
dict[str, Any] instead of bare dict. Make the same change wherever the paired
signatures appear (including the other occurrence referenced by this review) and
keep the existing union with str | None intact.
Source: Coding guidelines
packages/nemo_platform_plugin/src/nemo_platform_plugin/files/metadata.py (1)
41-48: 📐 Maintainability & Code Quality | 🔵 Trivial

Replace string return annotations with Self. Python target is 3.11+, so typing.Self is available; update both validators to return Self instead of "DatasetMetadataContent".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/files/metadata.py`
around lines 41 - 48, The two validator methods still use a stringified return
annotation for the model type, which should be updated to use typing.Self now
that the runtime target is Python 3.11+. In DatasetMetadataContent, change the
return annotations on validate_schema_refs and validate_json_schemas from the
quoted class name to Self, keeping the validator behavior unchanged and ensuring
the type hints match the current language support.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/client/endpoint.py`:
- Around line 74-78: The body serialization in the endpoint handling for "body"
is dropping default-valued fields because `model_dump_json(exclude_unset=True)`
omits them, which affects requests like `create_fileset()` using
`CreateFilesetRequest`. Update the serialization in the `endpoint.py` body
branch to include default values so request payloads preserve fields such as
`purpose`, `metadata`, `custom_fields`, and `cache` unless there is an explicit
reason to omit them.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/files/metadata.py`:
- Around line 21-27: The DatasetMetadataContent model currently serializes with
the schema alias but does not accept the field name schema_, so constructor
calls using schema_=... are ignored. Update the model_config in
DatasetMetadataContent to allow population by field name as well as alias, so
both schema_ and schema map to the same schema field while keeping
serialize_by_alias enabled.

In
`@packages/nemo_platform_plugin/src/nemo_platform_plugin/files/storage_config.py`:
- Line 9: Drop the unnecessary future import in storage_config.py by removing
from __future__ import annotations at the top of the module. This file can use
concrete type hints directly because StorageConfig is defined before it is
referenced, so keep the existing annotations as-is and ensure no string-based
hints are introduced in the module.

In `@services/core/files/tests/integration/test_files_sdk.py`:
- Around line 1107-1112: The re-enabled upload_content test is too broad because
pytest.raises(Exception) can hide unrelated failures; update the assertion in
test_files_sdk.py around sdk_user.files.upload_content to expect the specific
auth/API exception and verify the failure status returned by the SDK. Use the
upload_content call and the surrounding auth-failure setup to pinpoint the exact
exception type/value rather than accepting any Exception.

---

Nitpick comments:
In `@packages/filesets/src/filesets/resources.py`:
- Around line 213-214: The Fileset API signatures still use an unparameterized
filter dict type, so update both the sync and async method signatures in the
Fileset resource class to use dict[str, Any] instead of bare dict. Make the same
change wherever the paired signatures appear (including the other occurrence
referenced by this review) and keep the existing union with str | None intact.

In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/files/metadata.py`:
- Around line 41-48: The two validator methods still use a stringified return
annotation for the model type, which should be updated to use typing.Self now
that the runtime target is Python 3.11+. In DatasetMetadataContent, change the
return annotations on validate_schema_refs and validate_json_schemas from the
quoted class name to Self, keeping the validator behavior unchanged and ensuring
the type hints match the current language support.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4a5e4daf-0faf-4b86-b531-92b4805d5d6d

📥 Commits

Reviewing files that changed from the base of the PR and between a31d202 and d25a20c.

⛔ Files ignored due to path filters (3)

sdk/python/nemo-platform/src/nemo_platform/filesets/filesystem/filesystem.py is excluded by !sdk/**
sdk/python/nemo-platform/src/nemo_platform/filesets/resources.py is excluded by !sdk/**
uv.lock is excluded by !**/*.lock

📒 Files selected for processing (18)

packages/filesets/src/filesets/filesystem/filesystem.py
packages/filesets/src/filesets/resources.py
packages/nemo_platform/pyproject.toml
packages/nemo_platform_plugin/pyproject.toml
packages/nemo_platform_plugin/src/nemo_platform_plugin/client/endpoint.py
packages/nemo_platform_plugin/src/nemo_platform_plugin/client/response.py
packages/nemo_platform_plugin/src/nemo_platform_plugin/files/metadata.py
packages/nemo_platform_plugin/src/nemo_platform_plugin/files/storage_config.py
packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py
packages/nemo_platform_plugin/tests/files/test_endpoints.py
packages/nmp_common/src/nmp/common/api/common.py
packages/nmp_common/src/nmp/common/files/metadata.py
packages/nmp_common/src/nmp/common/files/storage_config.py
plugins/nemo-data-designer/pyproject.toml
plugins/nemo-safe-synthesizer/pyproject.toml
services/core/files/tests/integration/test_files_basic.py
services/core/files/tests/integration/test_files_sdk.py
services/core/files/tests/integration/test_fileset_filesystem.py

💤 Files with no reviewable changes (1)

services/core/files/tests/integration/test_fileset_filesystem.py

🚧 Files skipped from review as they are similar to previous changes (4)

packages/nemo_platform_plugin/tests/files/test_endpoints.py
packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py
packages/nemo_platform_plugin/src/nemo_platform_plugin/client/response.py
packages/filesets/src/filesets/filesystem/filesystem.py

coderabbitai · 2026-06-24T16:17:25Z

        elif name == "body":
            if not isinstance(value, BaseModel):
                raise TypeError(f"body must be a BaseModel instance, got {type(value).__name__}")
-            content = value.model_dump_json().encode()
+            content = value.model_dump_json(exclude_unset=True).encode()
            content_type = "application/json"


🗄️ Data Integrity & Integration | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Find all BaseModel request types passed as `body` and inspect their default-valued fields. rg -nP '@(post|put)\b' packages/nemo_platform_plugin/src/nemo_platform_plugin/files/endpoints.py -A6 rg -nP 'class \w+Request\b' packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py -A20

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 2453

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/endpoint.py printf '\n--- types.py excerpt ---\n' sed -n '90,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py printf '\n--- endpoint usages ---\n' rg -n "body:\s*\w+Request|model_dump_json$exclude_unset=True$" packages/nemo_platform_plugin/src/nemo_platform_plugin -n

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 10147

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/endpoint.py printf '\n--- types.py excerpt ---\n' sed -n '90,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py printf '\n--- endpoint usages ---\n' rg -n "body:\s*\w+Request|model_dump_json$exclude_unset=True$" packages/nemo_platform_plugin/src/nemo_platform_plugin -n

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 10147

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/endpoint.py printf '\n--- types.py excerpt ---\n' sed -n '90,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py printf '\n--- endpoint usages ---\n' rg -n "body:\s*\w+Request|model_dump_json$exclude_unset=True$" packages/nemo_platform_plugin/src/nemo_platform_plugin -n

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 10147

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/endpoint.py printf '\n--- types.py excerpt ---\n' sed -n '90,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py printf '\n--- endpoint usages ---\n' rg -n "body:\s*\w+Request|model_dump_json$exclude_unset=True$" packages/nemo_platform_plugin/src/nemo_platform_plugin -n

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 10147

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/endpoint.py printf '\n--- types.py excerpt ---\n' sed -n '90,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py printf '\n--- endpoint usages ---\n' rg -n "body:\s*\w+Request|model_dump_json$exclude_unset=True$" packages/nemo_platform_plugin/src/nemo_platform_plugin -n

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 10147

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/endpoint.py printf '\n--- types.py excerpt ---\n' sed -n '90,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py printf '\n--- endpoint usages ---\n' rg -n "body:\s*\w+Request|model_dump_json$exclude_unset=True$" packages/nemo_platform_plugin/src/nemo_platform_plugin -n

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 10147

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/endpoint.py printf '\n--- types.py excerpt ---\n' sed -n '90,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py printf '\n--- endpoint usages ---\n' rg -n "body:\s*\w+Request|model_dump_json$exclude_unset=True$" packages/nemo_platform_plugin/src/nemo_platform_plugin -n

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 10147

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/client/endpoint.py printf '\n--- types.py excerpt ---\n' sed -n '90,220p' packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py printf '\n--- endpoint usages ---\n' rg -n "body:\s*\w+Request|model_dump_json$exclude_unset=True$" packages/nemo_platform_plugin/src/nemo_platform_plugin -n

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 10147

🏁 Script executed:

#!/bin/bash set -euo pipefail cat -n packages/nemo_platform_plugin/src/nemo_platform_plugin/files/types.py | sed -n '95,190p'

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 3498

exclude_unset=True drops default-valued request fields.

CreateFilesetRequest has defaults (purpose, metadata, custom_fields, cache), so create_fileset() will omit them unless callers set them explicitly. If the server doesn’t reapply those defaults, POST/PUT bodies can lose data.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/client/endpoint.py` around lines 74 - 78, The body serialization in the endpoint handling for "body" is dropping default-valued fields because `model_dump_json(exclude_unset=True)` omits them, which affects requests like `create_fileset()` using `CreateFilesetRequest`. Update the serialization in the `endpoint.py` body branch to include default values so request payloads preserve fields such as `purpose`, `metadata`, `custom_fields`, and `cache` unless there is an explicit reason to omit them.

coderabbitai · 2026-06-24T16:17:25Z

+    model_config = ConfigDict(serialize_by_alias=True)
+
+    schema_: dict | str | None = Field(
+        default=None,
+        alias="schema",
+        description="Default row schema for files in this fileset, either inline JSON Schema or a schema_defs key.",
+    )


🗄️ Data Integrity & Integration | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash python - <<'PY' from pydantic import BaseModel, ConfigDict, Field class M(BaseModel): model_config = ConfigDict(serialize_by_alias=True) schema_: dict | str | None = Field(default=None, alias="schema") print("schema_ input:", M(schema_={"type": "object"}).schema_) print("schema alias input:", M(schema={"type": "object"}).schema_) PY

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 222

🏁 Script executed:

#!/bin/bash set -euo pipefail file='packages/nemo_platform_plugin/src/nemo_platform_plugin/files/metadata.py' wc -l "$file" sed -n '1,220p' "$file"

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 4535

🏁 Script executed:

#!/bin/bash set -euo pipefail file='packages/nemo_platform_plugin/src/nemo_platform_plugin/files/metadata.py' wc -l "$file" sed -n '1,220p' "$file"

Repository: NVIDIA-NeMo/nemo-platform

Length of output: 4535

Accept schema_= as well as schema.

With alias="schema" and no populate_by_name=True, DatasetMetadataContent(schema_=...) is silently ignored. Add name population.

Fix

- model_config = ConfigDict(serialize_by_alias=True) + model_config = ConfigDict(populate_by_name=True, serialize_by_alias=True)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

model_config = ConfigDict(serialize_by_alias=True)

schema_: dict | str | None = Field(

default=None,

alias="schema",

description="Default row schema for files in this fileset, either inline JSON Schema or a schema_defs key.",

)

model_config = ConfigDict(populate_by_name=True, serialize_by_alias=True)

schema_: dict | str | None = Field(

default=None,

alias="schema",

description="Default row schema for files in this fileset, either inline JSON Schema or a schema_defs key.",

)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/files/metadata.py` around lines 21 - 27, The DatasetMetadataContent model currently serializes with the schema alias but does not accept the field name schema_, so constructor calls using schema_=... are ignored. Update the model_config in DatasetMetadataContent to allow population by field name as well as alias, so both schema_ and schema map to the same schema field while keeping serialize_by_alias enabled.

coderabbitai · 2026-06-24T16:17:25Z

+These configs can be used by any service that needs to interact with storage backends.
+"""
+
+from __future__ import annotations


📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Drop from __future__ import annotations.

It converts every annotation to a string. As per coding guidelines: "Always prefer concrete type hints over string-based ones in Python code". The union (StorageConfig) is defined before its use, so removal is safe here.

♻️ Proposed change

-from __future__ import annotations - import os

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from __future__ import annotations

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/nemo_platform_plugin/src/nemo_platform_plugin/files/storage_config.py` at line 9, Drop the unnecessary future import in storage_config.py by removing from __future__ import annotations at the top of the module. This file can use concrete type hints directly because StorageConfig is defined before it is referenced, so keep the existing annotations as-is and ensure no string-based hints are introduced in the module.

Source: Coding guidelines

coderabbitai · 2026-06-24T16:17:25Z

        with pytest.raises(Exception):
-            sdk_user.files._upload_file(
-                "user.txt",
-                b"from user",
+            sdk_user.files.upload_content(
+                content=b"from user",
+                remote_path="user.txt",
+                fileset=name,
                workspace=workspace,


🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win

Assert the expected auth failure, not any exception.

When AIRCORE-828 is fixed and this test is re-enabled, pytest.raises(Exception) can pass on unrelated upload_content failures. Narrow this to the expected auth/API error and status.

Suggested test assertion

- with pytest.raises(Exception): + with pytest.raises(APIStatusError) as exc_info: sdk_user.files.upload_content( content=b"from user", remote_path="user.txt", fileset=name, workspace=workspace, ) + assert exc_info.value.status_code == 403

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

with pytest.raises(Exception):

sdk_user.files._upload_file(

"user.txt",

b"from user",

sdk_user.files.upload_content(

content=b"from user",

remote_path="user.txt",

fileset=name,

workspace=workspace,

with pytest.raises(APIStatusError) as exc_info:

sdk_user.files.upload_content(

content=b"from user",

remote_path="user.txt",

fileset=name,

workspace=workspace,

)

assert exc_info.value.status_code == 403

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/core/files/tests/integration/test_files_sdk.py` around lines 1107 - 1112, The re-enabled upload_content test is too broad because pytest.raises(Exception) can hide unrelated failures; update the assertion in test_files_sdk.py around sdk_user.files.upload_content to expect the specific auth/API exception and verify the failure status returned by the SDK. Use the upload_content call and the surrounding auth-failure setup to pinpoint the exact exception type/value rather than accepting any Exception.

github-actions · 2026-06-24T18:30:26Z

🌿 Preview your docs: https://nvidia-preview-mgrossman-aircore-827-migrate-first-plugin-to-n.docs.buildwithfern.com/nemo-platform

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com> Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

…o-nemoclient-typed-http-client-files Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

SecretRef is a Pydantic RootModel which doesn't compare equal to plain strings. Use .root to access the underlying string value. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

…o-nemoclient-typed-http-client-files Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

matthewgrossman requested review from a team as code owners June 24, 2026 01:28

github-actions Bot added the feat label Jun 24, 2026

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

Comment thread packages/filesets/src/filesets/resources.py

matthewgrossman changed the title ~~feat(sdk): Migrate Files to use the new NemoClient SDK~~ feat(files): migrate filesets package to NemoClient typed HTTP client Jun 24, 2026

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

crookedstorm force-pushed the main branch from bfb0e6b to ed04d87 Compare June 24, 2026 18:07

crookedstorm requested a review from a team as a code owner June 24, 2026 18:07

matthewgrossman force-pushed the mgrossman/aircore-827-migrate-first-plugin-to-nemoclient-typed-http-client-files branch 2 times, most recently from f4c7aab to bcd3b8b Compare June 24, 2026 18:29

matthewgrossman force-pushed the mgrossman/aircore-827-migrate-first-plugin-to-nemoclient-typed-http-client-files branch from bcd3b8b to a7a009a Compare June 24, 2026 18:33

squash

82fb001

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com> Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

matthewgrossman force-pushed the mgrossman/aircore-827-migrate-first-plugin-to-nemoclient-typed-http-client-files branch from a7a009a to 82fb001 Compare June 24, 2026 18:58

matthewgrossman and others added 8 commits June 24, 2026 12:09

reduce diff

4c69439

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

make more concise

113160c

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

query params

05a8140

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

lint

3eb4961

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

Merge branch 'main' into mgrossman/aircore-827-migrate-first-plugin-t…

f10ea57

…o-nemoclient-typed-http-client-files Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

Merge branch 'main' into mgrossman/aircore-827-migrate-first-plugin-t…

c4b9e5e

…o-nemoclient-typed-http-client-files Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

vendor

567ecac

Signed-off-by: Matthew Grossman <mgrossman@nvidia.com>

Uh oh!

Conversation

matthewgrossman commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

What stays the same

Backward compatibility

Known limitations

Test changes

CI status

Test plan

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

matthewgrossman commented Jun 24, 2026 •

edited

Loading

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

github-actions Bot commented Jun 24, 2026 •

edited

Loading