feat(ai) - store: list entries by stepmikhaylov · Pull Request #1044 · rocketride-org/rocketride-server

stepmikhaylov · 2026-06-01T18:31:34Z

Summary

Adds IStore.list_entries() abstract method — lists files and folders under a prefix with recursive, include_files, include_dirs, and glob_pattern filtering; directory entries carry a trailing /
Implements list_entries across all four backends: FilesystemStore, MemoryStore, S3Store, AzureBlobStore, each with integration tests
Moves s3.py and azure.py into their own sub-packages (store_providers/s3/, store_providers/azure/) with per-provider requirements.txt; deps are auto-installed by depends() on first import
Wires MinIO and Azurite into the Linux CI build so S3/Azure list_entries integration tests are mandatory on Ubuntu (docker not available on win/macOS, so tests skip cleanly there)

Test plan

test_store.py — FilesystemStore, MemoryStore, S3Store, AzureBlobStore list_entries tests pass locally
CI Ubuntu: MinIO and Azurite steps start cleanly, env vars exported, S3/Azure tests run (not skipped)
CI win/macOS: S3/Azure tests skip (env vars absent), no failure
S3Store / AzureBlobStore tests fail the job (not skip) if their respective service step fails to start

Summary by CodeRabbit

New Features
- Unified listing API across storage backends: recursive vs non-recursive, file/dir toggles, and name-pattern filtering.
- S3 accepts configurable custom endpoints; S3 and Azure providers exported for direct use.
- Path-traversal protections added across stores.
Tests
- Expanded and reorganized backend tests and integration scenarios covering listing behavior, patterns, and edge cases.
Chores
- Added commented .env template entries and CI steps to run local S3/Azure emulators; declared S3/Azure client dependencies.

github-actions · 2026-06-01T18:31:49Z

No description provided.

coderabbitai · 2026-06-01T18:31:51Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds IStore.list_entries(...) and implements it for S3 (optional endpoint), Azure, filesystem, and memory stores; registers provider requirements and entrypoints; refactors and expands store tests; and adds CI emulator steps plus .env.template entries for S3/Azure integration testing.

Changes

Storage List Entries API and Cloud Backend Support

Layer / File(s)	Summary
Core `list_entries()` interface and package notes `packages/ai/src/ai/account/store.py`, `packages/ai/src/ai/account/requirements.txt`	`IStore` replaces `list_files()` with `list_entries()` adding `recursive`, `include_files`, `include_dirs`, and `name_pattern`. Account package notes reference provider requirements and auto-install behavior.
S3 provider: entrypoint, requirements, endpoint, and listing `packages/ai/src/ai/account/store_providers/s3/__init__.py`, `packages/ai/src/ai/account/store_providers/s3/requirements.txt`, `packages/ai/src/ai/account/store_providers/s3/s3.py`	Adds S3 package entrypoint, declares `boto3`, supports JSON `endpoint` in `secret_key` stored to `_endpoint`, includes `endpoint_url` when constructing boto3 clients, and implements `list_entries()` with recursive/non-recursive modes, directory synthesis, pagination, and `fnmatch` filtering.
Azure provider: entrypoint, requirements, and listing `packages/ai/src/ai/account/store_providers/azure/__init__.py`, `packages/ai/src/ai/account/store_providers/azure/requirements.txt`, `packages/ai/src/ai/account/store_providers/azure/azure.py`	Adds Azure package entrypoint and requirements, updates imports, and implements `list_entries()` using `list_blobs`/`walk_blobs` with retry, directory deduplication, and `fnmatch` filtering; adds traversal check in `_get_blob_name`.
Filesystem and Memory stores `packages/ai/src/ai/account/store_providers/filesystem.py`, `packages/ai/src/ai/account/store_providers/memory.py`	FilesystemStore adds `list_entries()` using `glob`/`rglob` with `fnmatch` and pattern validation (rejects `/`, `\\`, `..`). MemoryStore adds `_check_path()` guards on mutate ops and implements regex-based `list_entries()` that synthesizes directories and supports wildcard patterns.
Tests: BaseStoreTest and comprehensive `list_entries()` coverage `packages/ai/tests/ai/account/test_store.py`	Introduce `BaseStoreTest`, centralize imports/fixtures, add a pre-populated filesystem fixture and an extensive `list_entries()` test matrix (recursive, include/exclude, prefix, name-pattern, traversal/invalid pattern cases), and rename mocked S3/Azure test classes.
CI emulators and `.env.template` `.github/workflows/_build.yaml`, `.env.template`	Adds Ubuntu-only GitHub Actions steps to launch MinIO and Azurite with readiness and smoke checks and export `ROCKETRIDE_TEST_S3_` / `ROCKETRIDE_TEST_AZURE_` to `$GITHUB_ENV`. `.env.template` documents commented S3/Azure test env vars.

Sequence Diagram

sequenceDiagram
  participant CI as GitHub Actions
  participant MinIO as MinIO (S3 emulator)
  participant Azurite as Azurite (Blob emulator)
  participant Tests as Integration Tests
  CI->>MinIO: start MinIO container
  MinIO-->>CI: /minio/health/live ready
  CI->>MinIO: run aws s3 create/put/list smoke
  CI->>CI: export ROCKETRIDE_TEST_S3_* to $GITHUB_ENV
  CI->>Azurite: start Azurite container
  Azurite-->>CI: TCP port 10000 accepting
  CI->>Azurite: create container upload/list blob smoke
  CI->>CI: export ROCKETRIDE_TEST_AZURE_* to $GITHUB_ENV
  CI->>Tests: run integration tests with exported env vars

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

jmaionchi
Rod-Christensen

"I hopped through code to count files and dirs,
matching patterns and guarding against sly traversals,
MinIO and Azurite hummed in CI's light,
tests list, filter, and sort each tiny sight,
a rabbit's whisker twitched at every storage trail."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 44.74% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(ai) - store: list entries' clearly and concisely summarizes the main change: adding a list_entries feature to the store module. It is specific, directly related to the primary objective, and follows conventional commit naming.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/store-list-entries

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/_build.yaml:
- Around line 161-163: The workflow uses floating image tags for the storage
emulators in the docker run commands—specifically the MinIO image
`minio/minio:latest` and the Azurite image
`mcr.microsoft.com/azure-storage/azurite`—which makes CI non-deterministic;
update those image references to fixed versions or, preferably, immutable
digests (e.g., `minio/minio:<tag>` or `minio/minio@sha256:<digest>` and
`mcr.microsoft.com/azure-storage/azurite:<tag>` or `@sha256:<digest>`) in the
docker run lines so the CI uses pinned emulator releases for reproducible tests.

In `@packages/ai/src/ai/account/store_providers/azure/azure.py`:
- Around line 340-348: In list_entries, fix prefix handling: compute blob_prefix
as before, then build name_starts_with = blob_prefix if not
blob_prefix.endswith('/') else blob_prefix.rstrip('/'); for non-recursive
listings pass name_starts_with + '/' only when blob_prefix is non-empty (so root
'' doesn't become '/'), and adjust prefix_part_len to account for a non-empty
blob_prefix that doesn't end with '/' by adding 1 (e.g. prefix_part_len =
blob_prefix.count('/') - self._prefix.count('/') + (1 if blob_prefix and not
blob_prefix.endswith('/') else 0)); use these values when calling
container_client.list_blobs and container_client.walk_blobs and when computing
depth to decide whether to emit entries.

In `@packages/ai/src/ai/account/store_providers/s3/s3.py`:
- Around line 364-378: The branch must treat trailing-slash S3 marker keys
before file handling: in the block handling rel.endswith('/'), check the match
and add the directory to dirs when include_dirs is true (using the same
_match(rel.rstrip('/').split('/')[-1]) logic), then unconditionally continue to
skip further file logic so include_files cannot return keys like "sub/"; keep
using the existing locals (rel, include_dirs, include_files, _match, dirs,
files, parts, prefix_part_len, prefix) and ensure the trailing-slash case
returns early (continue) after optionally adding the dir.

In `@packages/ai/tests/ai/account/test_store.py`:
- Around line 436-438: The section header preceding the TestAzureBlobStore class
is incorrect ("S3 Tests"); update the banner text to accurately reflect the
Azure integration (e.g., "Azure Blob Tests" or "Azure Blob Storage Tests") so it
matches the TestAzureBlobStore class and surrounding Azure tests; locate the
header comment above the TestAzureBlobStore declaration and replace the "S3
Tests" text with the corrected Azure label.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4bd30557-6d58-4094-bc70-f9e0e5ba00ed

📥 Commits

Reviewing files that changed from the base of the PR and between 9124b25 and ad141e2.

📒 Files selected for processing (13)

.env.template
.github/workflows/_build.yaml
packages/ai/src/ai/account/requirements.txt
packages/ai/src/ai/account/store.py
packages/ai/src/ai/account/store_providers/azure/__init__.py
packages/ai/src/ai/account/store_providers/azure/azure.py
packages/ai/src/ai/account/store_providers/azure/requirements.txt
packages/ai/src/ai/account/store_providers/filesystem.py
packages/ai/src/ai/account/store_providers/memory.py
packages/ai/src/ai/account/store_providers/s3/__init__.py
packages/ai/src/ai/account/store_providers/s3/requirements.txt
packages/ai/src/ai/account/store_providers/s3/s3.py
packages/ai/tests/ai/account/test_store.py

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (1)

packages/ai/src/ai/account/store_providers/azure/azure.py (1)

341-341: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

prefix_part_len still under-counts for a no-base-prefix store (previously flagged).

The name_starts_with root-listing concern was addressed, but blob_prefix.count('/') - self._prefix.count('/') remains slash-based rather than segment-based. With an empty self._prefix, list_entries('sub') computes prefix_part_len=0 and the loop at Line 364 emits 'sub/' (the directory being listed) as a spurious entry, unlike the non-empty-prefix case.

🐛 Proposed fix

-            blob_prefix = self._get_blob_name(prefix) if prefix else self._prefix
-            prefix_part_len = blob_prefix.count('/') - self._prefix.count('/')
+            blob_prefix = self._get_blob_name(prefix) if prefix else self._prefix
+            base_depth = len([p for p in self._prefix.split('/') if p])
+            prefix_depth = len([p for p in blob_prefix.split('/') if p])
+            prefix_part_len = max(0, prefix_depth - base_depth)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai/src/ai/account/store_providers/azure/azure.py` at line 341, The
calculation of prefix_part_len in azure.py uses slash counts which miscounts
when self._prefix is empty; replace the slash-based logic in the method that
computes prefix_part_len (the variable currently set with blob_prefix.count('/')
- self._prefix.count('/')) with a segment-based calculation: split blob_prefix
and self._prefix on '/' and count non-empty segments (e.g. parts = lambda s: [p
for p in s.split('/') if p]) then set prefix_part_len = max(0,
len(parts(blob_prefix)) - len(parts(self._prefix))); update the subsequent loop
that emits entries (the code that produces 'sub/' entries) to use this new
prefix_part_len so it no longer yields the directory itself for no-base-prefix
stores.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.env.template:
- Around line 51-60: Update the .env.template to include the full Azurite
development account key for ROCKETRIDE_TEST_AZURE_ACCOUNT_KEY so developers can
run AzureBlobStore tests locally without looking it up; replace the placeholder
"..." with the known Azurite dev key value used in CI and add a gitleaks:allow
comment on that line (and a short inline note) so the secret scanner permits
this well-known dev credential; refer to the ROCKETRIDE_TEST_AZURE_ACCOUNT_KEY
variable in the AZURE TEST VARIABLES block when making the change.

In `@packages/ai/src/ai/account/store_providers/s3/s3.py`:
- Line 372: Remove the dead/debug expression len(str(prefix_part_len) + prefix)
— it computes a value and discards it; locate where prefix_part_len and prefix
are used in packages/ai/src/ai/account/store_providers/s3/s3.py (inside the
function/method containing that expression) and delete that standalone statement
so no unused computation remains.
- Line 333: The computed prefix_part_len using "key_prefix.count('/') -
self._prefix.count('/')" undercounts when self._prefix is empty because it
counts slashes rather than non-empty path segments; update the calculation in
the S3 provider (within list_entries and the variables key_prefix and
self._prefix) to compute depth from non-empty segments, e.g. split both
key_prefix and self._prefix on '/' and use len([s for s in ... if s]) to get
segment counts, then set prefix_part_len = len(key_prefix_segments) -
len(self_prefix_segments) so the synthesized-dir loop that builds directory
entries (the synthesized-dir logic near prefix_part_len usage) won't emit the
queried directory as a spurious entry.

In `@packages/ai/tests/ai/account/test_store.py`:
- Around line 415-418: The try/except currently catches all Exceptions around
client.head_bucket(...) and may hide real errors; change it to catch
botocore.exceptions.ClientError (import ClientError from botocore.exceptions),
inspect the error code via e.response['Error']['Code'] and only call
client.create_bucket(Bucket=test_config['bucket']) when the code indicates the
bucket is missing (e.g., 'NoSuchBucket' or '404'); for any other ClientError
re-raise the exception so auth/permission errors still fail the test.

---

Duplicate comments:
In `@packages/ai/src/ai/account/store_providers/azure/azure.py`:
- Line 341: The calculation of prefix_part_len in azure.py uses slash counts
which miscounts when self._prefix is empty; replace the slash-based logic in the
method that computes prefix_part_len (the variable currently set with
blob_prefix.count('/') - self._prefix.count('/')) with a segment-based
calculation: split blob_prefix and self._prefix on '/' and count non-empty
segments (e.g. parts = lambda s: [p for p in s.split('/') if p]) then set
prefix_part_len = max(0, len(parts(blob_prefix)) - len(parts(self._prefix)));
update the subsequent loop that emits entries (the code that produces 'sub/'
entries) to use this new prefix_part_len so it no longer yields the directory
itself for no-base-prefix stores.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e77ec1e4-91b1-45e9-9c76-1760692500b0

📥 Commits

Reviewing files that changed from the base of the PR and between ad141e2 and 986d2c8.

📒 Files selected for processing (10)

.env.template
.github/workflows/_build.yaml
packages/ai/src/ai/account/requirements.txt
packages/ai/src/ai/account/store_providers/azure/__init__.py
packages/ai/src/ai/account/store_providers/azure/azure.py
packages/ai/src/ai/account/store_providers/azure/requirements.txt
packages/ai/src/ai/account/store_providers/s3/__init__.py
packages/ai/src/ai/account/store_providers/s3/requirements.txt
packages/ai/src/ai/account/store_providers/s3/s3.py
packages/ai/tests/ai/account/test_store.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/_build.yaml:
- Around line 215-216: Remove the unconditional failing CI step named "Temp
fail" that runs "exit 1" (the workflow step with name "Temp fail" and run: exit
1); either delete this step entirely or replace it with a conditional/debug-only
guard so it does not execute in normal runs (e.g., remove the run: exit 1 line
and the step block) to allow the emulator/test flow and entire workflow to run
and validate normally.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7ecb3828-9d9f-4b15-81ed-49bd99795811

📥 Commits

Reviewing files that changed from the base of the PR and between 986d2c8 and a68da4b.

📒 Files selected for processing (1)

.github/workflows/_build.yaml

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

packages/ai/tests/ai/account/test_store.py (1)

416-421: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle NoSuchBucket in the S3 preflight check.

The missing-bucket branch only accepts error code '404'. Many S3-compatible responses use 'NoSuchBucket', which will incorrectly fail setup instead of creating the bucket.

Proposed fix

         try:
             client.head_bucket(Bucket=test_config['bucket'])
         except ClientError as e:
-            if e.response.get('Error', {}).get('Code') != '404':
-                raise e
+            code = e.response.get('Error', {}).get('Code')
+            if code not in ('404', 'NoSuchBucket'):
+                raise
             client.create_bucket(Bucket=test_config['bucket'])

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai/tests/ai/account/test_store.py` around lines 416 - 421, The S3
preflight except block should treat both HTTP 404 and S3 error code
"NoSuchBucket" as a missing-bucket condition; update the ClientError handling
around client.head_bucket(...) so it checks the error code via
e.response.get('Error', {}).get('Code') and only re-raises if the code is
neither '404' nor 'NoSuchBucket', otherwise call
client.create_bucket(Bucket=test_config['bucket']); keep the use of ClientError
and the same bucket variable (test_config['bucket']) and ensure the logic lives
in the same try/except that wraps client.head_bucket.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/account/store_providers/filesystem.py`:
- Around line 351-364: list_entries/list_files are returning internal lock
artifacts generated by read_file/read_file_with_metadata/write_file_atomic;
update the iteration in list_entries (and similarly in list_files) to skip files
whose names match the lock sibling pattern (e.g., names matching regex like
/^\..+\.lock$/ — dot-prefixed, ending with .lock). Concretely, inside the loop
over items in list_entries (the block using search_path.rglob / search_path.glob
and checking item.is_file()), add a guard to continue/skip if item.name matches
the dot-prefixed lock pattern so those entries are not appended to result; do
the same filtering in list_files if it has analogous iteration logic (refer to
functions list_entries and list_files and the
read_file/read_file_with_metadata/write_file_atomic/delete_file behaviors).

In `@packages/ai/tests/ai/account/test_store.py`:
- Around line 425-429: The cleanup currently lists and deletes all objects with
Prefix='tmp_' which risks removing other tests' data; change the paginate Prefix
to use the test's unique prefix (e.g., combine the shared 'tmp_' base with the
current test's unique identifier like test_prefix or fixture-provided prefix) so
the call in the client.get_paginator('list_objects_v2') loop only targets
objects under that test-specific prefix, and apply the same change to the second
occurrence around lines 481-482; keep using
paginator.paginate(Bucket=test_config['bucket'], Prefix=...) and only modify the
Prefix value to the scoped test prefix.

---

Duplicate comments:
In `@packages/ai/tests/ai/account/test_store.py`:
- Around line 416-421: The S3 preflight except block should treat both HTTP 404
and S3 error code "NoSuchBucket" as a missing-bucket condition; update the
ClientError handling around client.head_bucket(...) so it checks the error code
via e.response.get('Error', {}).get('Code') and only re-raises if the code is
neither '404' nor 'NoSuchBucket', otherwise call
client.create_bucket(Bucket=test_config['bucket']); keep the use of ClientError
and the same bucket variable (test_config['bucket']) and ensure the logic lives
in the same try/except that wraps client.head_bucket.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2fb013d6-7ffd-48b9-b5dc-98eb60673c35

📥 Commits

Reviewing files that changed from the base of the PR and between a68da4b and 65033f4.

📒 Files selected for processing (13)

.env.template
.github/workflows/_build.yaml
packages/ai/src/ai/account/requirements.txt
packages/ai/src/ai/account/store.py
packages/ai/src/ai/account/store_providers/azure/__init__.py
packages/ai/src/ai/account/store_providers/azure/azure.py
packages/ai/src/ai/account/store_providers/azure/requirements.txt
packages/ai/src/ai/account/store_providers/filesystem.py
packages/ai/src/ai/account/store_providers/memory.py
packages/ai/src/ai/account/store_providers/s3/__init__.py
packages/ai/src/ai/account/store_providers/s3/requirements.txt
packages/ai/src/ai/account/store_providers/s3/s3.py
packages/ai/tests/ai/account/test_store.py

kwit75

Solid shape on the list-entries work — pagination is correct across all four providers (S3 uses a list_objects_v2 paginator, Azure's list_blobs/walk_blobs iterators follow continuation tokens — no 1000-key truncation), and filesystem/memory parity is exact. Requesting changes on two things the current tests hide, because they bite in our prod store config:

1 — (major) off-by-one in S3 + Azure list_entries when the store has no URL prefix. prefix_part_len = key_prefix.count("/") - self._prefix.count("/") is wrong when self._prefix == "". With an empty store prefix, a recursive list_entries("sub") emits the queried directory itself (sub/) as a spurious entry. Every test passes only because the fixtures use s3://bucket/tmp_<uuid> (a 1-segment prefix), which happens to make the subtraction come out to 1. Under prod STORE_URL=s3://rocketride-saas-dev-storage (empty path), a FileStore-style key users/<id>/files/sub would leak users/<id>/files/sub/. (s3/s3.py ~L333/372, azure/azure.py ~L341/364.) Fix: derive base depth from the segment count of the relative query prefix rather than subtracting absolute-key slash counts — and please add a no-store-prefix test variant so it can't regress silently.

2 — (major) traversal guard is unreachable with no store prefix. The raise StorageError("Path traversal detected") in _get_key / _get_blob_name lives inside if self._prefix:. With an empty prefix, list_entries("sub/../..") normalizes to .. and does NOT raise — test_list_entries_prefix_traversal_protection only passes thanks to the tmp_ fixture prefix. It fails safe on object stores (Prefix ../ matches nothing) so it's not an actual escape, but the guarantee the test asserts doesn't hold in prod. Move the check so it also runs when self._prefix is empty.

Minors (non-blocking): no MaxKeys/limit on recursive listing (fine now; cap before exposing to a user-facing endpoint); list_entries doesn't filter FileStore's .dirmarker sentinel (decide at the wiring PR); ruff B007/A001 shadowing in _iter_keys (prefix/dir); memory _check_path uses ".." in filename substring (false-positives on my..file.txt).

Both majors are currently dormant (no caller — file_store.py is untouched), so functionality isn't at risk today — but the fix + a no-prefix test are tiny and this is exactly our prod config, so better to nail it here than land a passing-but-wrong baseline. 🙌

FYI unrelated to your code: the earlier red CI was just the leftover Temp fail / exit 1 debug step at the old head (gone now, re-run is green-trending). Merge-order: this overlaps #1048's _build.yaml Test-step edit — land #1048 first, then rebase, so this also inherits -s.

Wires a local MinIO into the Linux build's Test step so the S3Store integration tests (gated on boto3 + ROCKETRIDE_TEST_S3_*) run in CI. Ubuntu-only (service containers aren't available on win/macOS), and continue-on-error + late var-export keep a MinIO hiccup from ever blocking a CI/release build (tests skip cleanly). Preinstalls boto3 (an optional extra) so the suite actually activates. Tests themselves land via Stepan's feat/deploy-v2 (commit 027bb64).

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/ai/src/ai/account/store_providers/s3/s3.py (1)
45-55: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Normalize blank endpoint values to None before creating the boto3 client.

self._endpoint is cached from creds.get('endpoint') and later passed as endpoint_url=self._endpoint. When the config provides endpoint: "", boto3 raises ValueError: Invalid endpoint: instead of using the default endpoint. Normalize falsey values to None when parsing.
Proposed fix
-                self._endpoint = creds.get('endpoint')
+                self._endpoint = creds.get('endpoint') or None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai/src/ai/account/store_providers/s3/s3.py` around lines 45 - 55,
When parsing credentials in the S3 provider (inside the block that sets
self._access_key_id / self._secret_access_key / self._region), normalize the
endpoint value to None for any falsey or blank string so boto3 receives None
instead of an empty string; replace the direct assignment self._endpoint =
creds.get('endpoint') with logic that sets self._endpoint to None when the
retrieved endpoint is missing or endpoint.strip() is empty, leaving valid
non-empty endpoint strings unchanged (so the later boto3 client creation using
endpoint_url=self._endpoint won't raise ValueError).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/account/store_providers/azure/azure.py`:
- Around line 329-332: The list_entries implementation is catching all
exceptions and wrapping them, which hides explicit StorageError validations;
update the exception handling in list_entries so that you re-raise existing
StorageError instances (add an except StorageError: raise) before the generic
except Exception that wraps errors into "Failed to list files ..." so
invalid-pattern/path-traversal validations keep their original StorageError; do
the same change for the other analogous try/except block (the second catch
around lines 381-384) to preserve StorageError there as well.
- Around line 343-353: The recursive branch currently calls
container_client.list_blobs with blob_prefix which can match siblings (e.g.,
"base" matches "baseball"); change the recursive path to use the same bounded
prefix as the non-recursive branch (use f'{blob_prefix}/' when blob_prefix is
truthy, or '' otherwise) so only descendants under the directory are returned;
update the code around blob_prefix, the list_blobs call, and any logic that
computes prefix_part_len (functions/members: _get_blob_name, _prefix, _part_len,
blob_prefix, container_client.list_blobs, container_client.walk_blobs,
recursive) and alternatively add a special-case exact-blob lookup if you need to
support listing the exact blob name.

In `@packages/ai/src/ai/account/store_providers/memory.py`:
- Around line 149-152: The _check_path function currently rejects any filename
containing the substring '..' which incorrectly blocks names like
'report..json'; change the check to only detect path traversal segments by
splitting the incoming filename on path separators (e.g., '/' and
os.path.sep/os.path.altsep or use pathlib.PurePosixPath.parts) and raising
StorageError only if any path segment equals '..'. Update the _check_path
implementation (referenced by function name _check_path and exception
StorageError) so it preserves valid filenames containing '..' in their names
while still preventing parent-directory traversal.

In `@packages/ai/src/ai/account/store_providers/s3/s3.py`:
- Around line 316-390: The list_entries function is not using the module's retry
policy like other S3 methods (e.g., list_files); wrap its S3 I/O in the same
retry logic so transient S3/MinIO failures get exponential backoff.
Specifically, apply the same `@retry`(...) decorator (the identical retry used by
other public S3 methods) to list_entries so the calls in _iter_keys
(client.get_paginator / paginator.paginate) and the
asyncio.to_thread(_list_entries) execution are retried; ensure you import/reuse
the same retry symbol and keep existing exception handling semantics.

In `@packages/ai/src/ai/account/store.py`:
- Around line 340-347: Update the docstring for the listing function in
packages/ai/src/ai/account/store.py to narrowly define name_pattern as a
basename-only glob: state that the pattern is applied only to the final path
segment (basename) and must not contain path separators ('/' or '\\'), and that
backends will reject patterns containing separators; replace the current
name_pattern description in the Args block accordingly so callers know to pass
only filename/glob for the basename.

---

Outside diff comments:
In `@packages/ai/src/ai/account/store_providers/s3/s3.py`:
- Around line 45-55: When parsing credentials in the S3 provider (inside the
block that sets self._access_key_id / self._secret_access_key / self._region),
normalize the endpoint value to None for any falsey or blank string so boto3
receives None instead of an empty string; replace the direct assignment
self._endpoint = creds.get('endpoint') with logic that sets self._endpoint to
None when the retrieved endpoint is missing or endpoint.strip() is empty,
leaving valid non-empty endpoint strings unchanged (so the later boto3 client
creation using endpoint_url=self._endpoint won't raise ValueError).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7b0b82fa-4905-4c49-a322-dd449d724ebf

📥 Commits

Reviewing files that changed from the base of the PR and between 65033f4 and e21f0ce.

📒 Files selected for processing (13)

.env.template
.github/workflows/_build.yaml
packages/ai/src/ai/account/requirements.txt
packages/ai/src/ai/account/store.py
packages/ai/src/ai/account/store_providers/azure/__init__.py
packages/ai/src/ai/account/store_providers/azure/azure.py
packages/ai/src/ai/account/store_providers/azure/requirements.txt
packages/ai/src/ai/account/store_providers/filesystem.py
packages/ai/src/ai/account/store_providers/memory.py
packages/ai/src/ai/account/store_providers/s3/__init__.py
packages/ai/src/ai/account/store_providers/s3/requirements.txt
packages/ai/src/ai/account/store_providers/s3/s3.py
packages/ai/tests/ai/account/test_store.py

stepmikhaylov · 2026-06-02T11:14:14Z

Addressed r3340603140: updated the name_pattern docstring in store.py to clarify it is a basename-only glob, must not contain path separators (/ or \), and that backends will raise ValueError for patterns that include them.

stepmikhaylov · 2026-06-02T11:23:48Z

@kwit75 CR addressed, please check.

#1048 merged
#1044 rebased onto develop with merged #1048, no conflicts encountered
Found bugs are fixed:
- #1995bcef all test for IO and list_entries applied to Azure/S3 stores w/o prefixes
- #e21f0cec Azure/S3 stores are fixed by test failures

kwit75

Re-reviewed — both bugs confirmed fixed, clearing my request-changes. ✅

Empty-prefix off-by-one: the new _part_len(p) = p.count('/') + 1 if p else 0 normalizes prefix depth, so prefix_part_len is consistent whether self._prefix is set or empty. Traced both backends: list_entries('sub') on a no-prefix store no longer re-emits sub/ itself (dir loop range(2, 2) is empty), and list_entries('') still emits top-level dirs. S3 (store_root empty-prefix handling) and Azure match.
Traversal guard: the name_pattern / \ .. checks are now unconditional (out of if self._prefix), and _get_key/_get_blob_name gained the explicit ..//.. check — reachable on no-prefix stores now.
Tests: the no-prefix store fixtures (S3Store(url) / Azure container-root, "no prefix") now run the full test_list_entries_* matrix incl. traversal + pattern protection — exactly the path the old tmp_-prefixed fixtures masked.

CI green on all platforms. Nice rework — merging.

stepmikhaylov requested review from Rod-Christensen, jmaionchi and kwit75 as code owners June 1, 2026 18:31

github-actions Bot added module:ai AI/ML modules ci/cd CI/CD and build system labels Jun 1, 2026

stepmikhaylov mentioned this pull request Jun 1, 2026

ci: MinIO + ROCKETRIDE_TEST_S3_* for S3Store integration tests (Linux) #1035

Closed

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread .github/workflows/_build.yaml Outdated

Comment thread packages/ai/src/ai/account/store_providers/s3/s3.py Outdated

Comment thread packages/ai/tests/ai/account/test_store.py

stepmikhaylov force-pushed the feat/store-list-entries branch from ad141e2 to 986d2c8 Compare June 1, 2026 20:09

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread .env.template

Comment thread packages/ai/src/ai/account/store_providers/s3/s3.py Outdated

Comment thread packages/ai/src/ai/account/store_providers/s3/s3.py Outdated

Comment thread packages/ai/tests/ai/account/test_store.py

stepmikhaylov force-pushed the feat/store-list-entries branch from 986d2c8 to a68da4b Compare June 1, 2026 20:25

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread .github/workflows/_build.yaml Outdated

stepmikhaylov force-pushed the feat/store-list-entries branch from a68da4b to 65033f4 Compare June 1, 2026 20:55

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread packages/ai/src/ai/account/store_providers/filesystem.py

Comment thread packages/ai/tests/ai/account/test_store.py Outdated

kwit75 requested changes Jun 1, 2026

View reviewed changes

stepmikhaylov and others added 10 commits June 2, 2026 12:40

chore(ai): IStore.list_entries, FilesystemStore.list_entries

bf72d07

chore(ai): MemoryStore.list_entries, MemoryStore tests

10b3e4b

chore(ai): S3Store.list_entries, S3Store tests

9d5ac1c

chore(ai): AzureBlobStore.list_entries, AzureBlobStore tests

7fabef6

chore(ai): s3 and azure module folders

5cc65a0

chore(ci): azurite for azure store test

c3d3c5a

chore: address coderabbit issues (1)

2381d62

chore(ai): azure/s3 store prefix/no-prefix tests

1995bce

chore(ai): azure/s3 store prefix/no-prefix fixes

e21f0ce

stepmikhaylov force-pushed the feat/store-list-entries branch from 65033f4 to e21f0ce Compare June 2, 2026 10:41

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

chore(ai): address rabbit's issues

9423abb

asclearuc approved these changes Jun 2, 2026

View reviewed changes

kwit75 approved these changes Jun 2, 2026

View reviewed changes

kwit75 merged commit eae941c into develop Jun 2, 2026
20 checks passed

kwit75 deleted the feat/store-list-entries branch June 2, 2026 15:27

Conversation

stepmikhaylov commented Jun 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

kwit75 left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

stepmikhaylov commented Jun 2, 2026

Uh oh!

stepmikhaylov commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kwit75 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

stepmikhaylov commented Jun 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading

stepmikhaylov commented Jun 2, 2026 •

edited

Loading