Skip to content

feat(ai) - store: list entries#1044

Merged
kwit75 merged 11 commits into
developfrom
feat/store-list-entries
Jun 2, 2026
Merged

feat(ai) - store: list entries#1044
kwit75 merged 11 commits into
developfrom
feat/store-list-entries

Conversation

@stepmikhaylov
Copy link
Copy Markdown
Collaborator

@stepmikhaylov stepmikhaylov commented Jun 1, 2026

Summary

  • Adds IStore.list_entries() abstract method — lists files and folders under a prefix with recursive, include_files, include_dirs, and glob_pattern filtering; directory entries carry a trailing /
  • Implements list_entries across all four backends: FilesystemStore, MemoryStore, S3Store, AzureBlobStore, each with integration tests
  • Moves s3.py and azure.py into their own sub-packages (store_providers/s3/, store_providers/azure/) with per-provider requirements.txt; deps are auto-installed by depends() on first import
  • Wires MinIO and Azurite into the Linux CI build so S3/Azure list_entries integration tests are mandatory on Ubuntu (docker not available on win/macOS, so tests skip cleanly there)

Test plan

  • test_store.pyFilesystemStore, MemoryStore, S3Store, AzureBlobStore list_entries tests pass locally
  • CI Ubuntu: MinIO and Azurite steps start cleanly, env vars exported, S3/Azure tests run (not skipped)
  • CI win/macOS: S3/Azure tests skip (env vars absent), no failure
  • S3Store / AzureBlobStore tests fail the job (not skip) if their respective service step fails to start

Summary by CodeRabbit

  • New Features

    • Unified listing API across storage backends: recursive vs non-recursive, file/dir toggles, and name-pattern filtering.
    • S3 accepts configurable custom endpoints; S3 and Azure providers exported for direct use.
    • Path-traversal protections added across stores.
  • Tests

    • Expanded and reorganized backend tests and integration scenarios covering listing behavior, patterns, and edge cases.
  • Chores

    • Added commented .env template entries and CI steps to run local S3/Azure emulators; declared S3/Azure client dependencies.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

No description provided.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds IStore.list_entries(...) and implements it for S3 (optional endpoint), Azure, filesystem, and memory stores; registers provider requirements and entrypoints; refactors and expands store tests; and adds CI emulator steps plus .env.template entries for S3/Azure integration testing.

Changes

Storage List Entries API and Cloud Backend Support

Layer / File(s) Summary
Core list_entries() interface and package notes
packages/ai/src/ai/account/store.py, packages/ai/src/ai/account/requirements.txt
IStore replaces list_files() with list_entries() adding recursive, include_files, include_dirs, and name_pattern. Account package notes reference provider requirements and auto-install behavior.
S3 provider: entrypoint, requirements, endpoint, and listing
packages/ai/src/ai/account/store_providers/s3/__init__.py, packages/ai/src/ai/account/store_providers/s3/requirements.txt, packages/ai/src/ai/account/store_providers/s3/s3.py
Adds S3 package entrypoint, declares boto3, supports JSON endpoint in secret_key stored to _endpoint, includes endpoint_url when constructing boto3 clients, and implements list_entries() with recursive/non-recursive modes, directory synthesis, pagination, and fnmatch filtering.
Azure provider: entrypoint, requirements, and listing
packages/ai/src/ai/account/store_providers/azure/__init__.py, packages/ai/src/ai/account/store_providers/azure/requirements.txt, packages/ai/src/ai/account/store_providers/azure/azure.py
Adds Azure package entrypoint and requirements, updates imports, and implements list_entries() using list_blobs/walk_blobs with retry, directory deduplication, and fnmatch filtering; adds traversal check in _get_blob_name.
Filesystem and Memory stores
packages/ai/src/ai/account/store_providers/filesystem.py, packages/ai/src/ai/account/store_providers/memory.py
FilesystemStore adds list_entries() using glob/rglob with fnmatch and pattern validation (rejects /, \\, ..). MemoryStore adds _check_path() guards on mutate ops and implements regex-based list_entries() that synthesizes directories and supports wildcard patterns.
Tests: BaseStoreTest and comprehensive list_entries() coverage
packages/ai/tests/ai/account/test_store.py
Introduce BaseStoreTest, centralize imports/fixtures, add a pre-populated filesystem fixture and an extensive list_entries() test matrix (recursive, include/exclude, prefix, name-pattern, traversal/invalid pattern cases), and rename mocked S3/Azure test classes.
CI emulators and .env.template
.github/workflows/_build.yaml, .env.template
Adds Ubuntu-only GitHub Actions steps to launch MinIO and Azurite with readiness and smoke checks and export ROCKETRIDE_TEST_S3_* / ROCKETRIDE_TEST_AZURE_* to $GITHUB_ENV. .env.template documents commented S3/Azure test env vars.

Sequence Diagram

sequenceDiagram
  participant CI as GitHub Actions
  participant MinIO as MinIO (S3 emulator)
  participant Azurite as Azurite (Blob emulator)
  participant Tests as Integration Tests
  CI->>MinIO: start MinIO container
  MinIO-->>CI: /minio/health/live ready
  CI->>MinIO: run aws s3 create/put/list smoke
  CI->>CI: export ROCKETRIDE_TEST_S3_* to $GITHUB_ENV
  CI->>Azurite: start Azurite container
  Azurite-->>CI: TCP port 10000 accepting
  CI->>Azurite: create container upload/list blob smoke
  CI->>CI: export ROCKETRIDE_TEST_AZURE_* to $GITHUB_ENV
  CI->>Tests: run integration tests with exported env vars
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • jmaionchi
  • Rod-Christensen

"I hopped through code to count files and dirs,
matching patterns and guarding against sly traversals,
MinIO and Azurite hummed in CI's light,
tests list, filter, and sort each tiny sight,
a rabbit's whisker twitched at every storage trail."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 44.74% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(ai) - store: list entries' clearly and concisely summarizes the main change: adding a list_entries feature to the store module. It is specific, directly related to the primary objective, and follows conventional commit naming.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/store-list-entries

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/_build.yaml:
- Around line 161-163: The workflow uses floating image tags for the storage
emulators in the docker run commands—specifically the MinIO image
`minio/minio:latest` and the Azurite image
`mcr.microsoft.com/azure-storage/azurite`—which makes CI non-deterministic;
update those image references to fixed versions or, preferably, immutable
digests (e.g., `minio/minio:<tag>` or `minio/minio@sha256:<digest>` and
`mcr.microsoft.com/azure-storage/azurite:<tag>` or `@sha256:<digest>`) in the
docker run lines so the CI uses pinned emulator releases for reproducible tests.

In `@packages/ai/src/ai/account/store_providers/azure/azure.py`:
- Around line 340-348: In list_entries, fix prefix handling: compute blob_prefix
as before, then build name_starts_with = blob_prefix if not
blob_prefix.endswith('/') else blob_prefix.rstrip('/'); for non-recursive
listings pass name_starts_with + '/' only when blob_prefix is non-empty (so root
'' doesn't become '/'), and adjust prefix_part_len to account for a non-empty
blob_prefix that doesn't end with '/' by adding 1 (e.g. prefix_part_len =
blob_prefix.count('/') - self._prefix.count('/') + (1 if blob_prefix and not
blob_prefix.endswith('/') else 0)); use these values when calling
container_client.list_blobs and container_client.walk_blobs and when computing
depth to decide whether to emit entries.

In `@packages/ai/src/ai/account/store_providers/s3/s3.py`:
- Around line 364-378: The branch must treat trailing-slash S3 marker keys
before file handling: in the block handling rel.endswith('/'), check the match
and add the directory to dirs when include_dirs is true (using the same
_match(rel.rstrip('/').split('/')[-1]) logic), then unconditionally continue to
skip further file logic so include_files cannot return keys like "sub/"; keep
using the existing locals (rel, include_dirs, include_files, _match, dirs,
files, parts, prefix_part_len, prefix) and ensure the trailing-slash case
returns early (continue) after optionally adding the dir.

In `@packages/ai/tests/ai/account/test_store.py`:
- Around line 436-438: The section header preceding the TestAzureBlobStore class
is incorrect ("S3 Tests"); update the banner text to accurately reflect the
Azure integration (e.g., "Azure Blob Tests" or "Azure Blob Storage Tests") so it
matches the TestAzureBlobStore class and surrounding Azure tests; locate the
header comment above the TestAzureBlobStore declaration and replace the "S3
Tests" text with the corrected Azure label.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4bd30557-6d58-4094-bc70-f9e0e5ba00ed

📥 Commits

Reviewing files that changed from the base of the PR and between 9124b25 and ad141e2.

📒 Files selected for processing (13)
  • .env.template
  • .github/workflows/_build.yaml
  • packages/ai/src/ai/account/requirements.txt
  • packages/ai/src/ai/account/store.py
  • packages/ai/src/ai/account/store_providers/azure/__init__.py
  • packages/ai/src/ai/account/store_providers/azure/azure.py
  • packages/ai/src/ai/account/store_providers/azure/requirements.txt
  • packages/ai/src/ai/account/store_providers/filesystem.py
  • packages/ai/src/ai/account/store_providers/memory.py
  • packages/ai/src/ai/account/store_providers/s3/__init__.py
  • packages/ai/src/ai/account/store_providers/s3/requirements.txt
  • packages/ai/src/ai/account/store_providers/s3/s3.py
  • packages/ai/tests/ai/account/test_store.py

Comment thread .github/workflows/_build.yaml Outdated
Comment thread packages/ai/src/ai/account/store_providers/s3/s3.py Outdated
Comment thread packages/ai/tests/ai/account/test_store.py
@stepmikhaylov stepmikhaylov force-pushed the feat/store-list-entries branch from ad141e2 to 986d2c8 Compare June 1, 2026 20:09
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
packages/ai/src/ai/account/store_providers/azure/azure.py (1)

341-341: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

prefix_part_len still under-counts for a no-base-prefix store (previously flagged).

The name_starts_with root-listing concern was addressed, but blob_prefix.count('/') - self._prefix.count('/') remains slash-based rather than segment-based. With an empty self._prefix, list_entries('sub') computes prefix_part_len=0 and the loop at Line 364 emits 'sub/' (the directory being listed) as a spurious entry, unlike the non-empty-prefix case.

🐛 Proposed fix
-            blob_prefix = self._get_blob_name(prefix) if prefix else self._prefix
-            prefix_part_len = blob_prefix.count('/') - self._prefix.count('/')
+            blob_prefix = self._get_blob_name(prefix) if prefix else self._prefix
+            base_depth = len([p for p in self._prefix.split('/') if p])
+            prefix_depth = len([p for p in blob_prefix.split('/') if p])
+            prefix_part_len = max(0, prefix_depth - base_depth)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai/src/ai/account/store_providers/azure/azure.py` at line 341, The
calculation of prefix_part_len in azure.py uses slash counts which miscounts
when self._prefix is empty; replace the slash-based logic in the method that
computes prefix_part_len (the variable currently set with blob_prefix.count('/')
- self._prefix.count('/')) with a segment-based calculation: split blob_prefix
and self._prefix on '/' and count non-empty segments (e.g. parts = lambda s: [p
for p in s.split('/') if p]) then set prefix_part_len = max(0,
len(parts(blob_prefix)) - len(parts(self._prefix))); update the subsequent loop
that emits entries (the code that produces 'sub/' entries) to use this new
prefix_part_len so it no longer yields the directory itself for no-base-prefix
stores.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.env.template:
- Around line 51-60: Update the .env.template to include the full Azurite
development account key for ROCKETRIDE_TEST_AZURE_ACCOUNT_KEY so developers can
run AzureBlobStore tests locally without looking it up; replace the placeholder
"..." with the known Azurite dev key value used in CI and add a gitleaks:allow
comment on that line (and a short inline note) so the secret scanner permits
this well-known dev credential; refer to the ROCKETRIDE_TEST_AZURE_ACCOUNT_KEY
variable in the AZURE TEST VARIABLES block when making the change.

In `@packages/ai/src/ai/account/store_providers/s3/s3.py`:
- Line 372: Remove the dead/debug expression len(str(prefix_part_len) + prefix)
— it computes a value and discards it; locate where prefix_part_len and prefix
are used in packages/ai/src/ai/account/store_providers/s3/s3.py (inside the
function/method containing that expression) and delete that standalone statement
so no unused computation remains.
- Line 333: The computed prefix_part_len using "key_prefix.count('/') -
self._prefix.count('/')" undercounts when self._prefix is empty because it
counts slashes rather than non-empty path segments; update the calculation in
the S3 provider (within list_entries and the variables key_prefix and
self._prefix) to compute depth from non-empty segments, e.g. split both
key_prefix and self._prefix on '/' and use len([s for s in ... if s]) to get
segment counts, then set prefix_part_len = len(key_prefix_segments) -
len(self_prefix_segments) so the synthesized-dir loop that builds directory
entries (the synthesized-dir logic near prefix_part_len usage) won't emit the
queried directory as a spurious entry.

In `@packages/ai/tests/ai/account/test_store.py`:
- Around line 415-418: The try/except currently catches all Exceptions around
client.head_bucket(...) and may hide real errors; change it to catch
botocore.exceptions.ClientError (import ClientError from botocore.exceptions),
inspect the error code via e.response['Error']['Code'] and only call
client.create_bucket(Bucket=test_config['bucket']) when the code indicates the
bucket is missing (e.g., 'NoSuchBucket' or '404'); for any other ClientError
re-raise the exception so auth/permission errors still fail the test.

---

Duplicate comments:
In `@packages/ai/src/ai/account/store_providers/azure/azure.py`:
- Line 341: The calculation of prefix_part_len in azure.py uses slash counts
which miscounts when self._prefix is empty; replace the slash-based logic in the
method that computes prefix_part_len (the variable currently set with
blob_prefix.count('/') - self._prefix.count('/')) with a segment-based
calculation: split blob_prefix and self._prefix on '/' and count non-empty
segments (e.g. parts = lambda s: [p for p in s.split('/') if p]) then set
prefix_part_len = max(0, len(parts(blob_prefix)) - len(parts(self._prefix)));
update the subsequent loop that emits entries (the code that produces 'sub/'
entries) to use this new prefix_part_len so it no longer yields the directory
itself for no-base-prefix stores.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e77ec1e4-91b1-45e9-9c76-1760692500b0

📥 Commits

Reviewing files that changed from the base of the PR and between ad141e2 and 986d2c8.

📒 Files selected for processing (10)
  • .env.template
  • .github/workflows/_build.yaml
  • packages/ai/src/ai/account/requirements.txt
  • packages/ai/src/ai/account/store_providers/azure/__init__.py
  • packages/ai/src/ai/account/store_providers/azure/azure.py
  • packages/ai/src/ai/account/store_providers/azure/requirements.txt
  • packages/ai/src/ai/account/store_providers/s3/__init__.py
  • packages/ai/src/ai/account/store_providers/s3/requirements.txt
  • packages/ai/src/ai/account/store_providers/s3/s3.py
  • packages/ai/tests/ai/account/test_store.py

Comment thread .env.template
Comment thread packages/ai/src/ai/account/store_providers/s3/s3.py Outdated
Comment thread packages/ai/src/ai/account/store_providers/s3/s3.py Outdated
Comment thread packages/ai/tests/ai/account/test_store.py
@stepmikhaylov stepmikhaylov force-pushed the feat/store-list-entries branch from 986d2c8 to a68da4b Compare June 1, 2026 20:25
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/_build.yaml:
- Around line 215-216: Remove the unconditional failing CI step named "Temp
fail" that runs "exit 1" (the workflow step with name "Temp fail" and run: exit
1); either delete this step entirely or replace it with a conditional/debug-only
guard so it does not execute in normal runs (e.g., remove the run: exit 1 line
and the step block) to allow the emulator/test flow and entire workflow to run
and validate normally.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7ecb3828-9d9f-4b15-81ed-49bd99795811

📥 Commits

Reviewing files that changed from the base of the PR and between 986d2c8 and a68da4b.

📒 Files selected for processing (1)
  • .github/workflows/_build.yaml

Comment thread .github/workflows/_build.yaml Outdated
@stepmikhaylov stepmikhaylov force-pushed the feat/store-list-entries branch from a68da4b to 65033f4 Compare June 1, 2026 20:55
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
packages/ai/tests/ai/account/test_store.py (1)

416-421: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle NoSuchBucket in the S3 preflight check.

The missing-bucket branch only accepts error code '404'. Many S3-compatible responses use 'NoSuchBucket', which will incorrectly fail setup instead of creating the bucket.

Proposed fix
         try:
             client.head_bucket(Bucket=test_config['bucket'])
         except ClientError as e:
-            if e.response.get('Error', {}).get('Code') != '404':
-                raise e
+            code = e.response.get('Error', {}).get('Code')
+            if code not in ('404', 'NoSuchBucket'):
+                raise
             client.create_bucket(Bucket=test_config['bucket'])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai/tests/ai/account/test_store.py` around lines 416 - 421, The S3
preflight except block should treat both HTTP 404 and S3 error code
"NoSuchBucket" as a missing-bucket condition; update the ClientError handling
around client.head_bucket(...) so it checks the error code via
e.response.get('Error', {}).get('Code') and only re-raises if the code is
neither '404' nor 'NoSuchBucket', otherwise call
client.create_bucket(Bucket=test_config['bucket']); keep the use of ClientError
and the same bucket variable (test_config['bucket']) and ensure the logic lives
in the same try/except that wraps client.head_bucket.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/account/store_providers/filesystem.py`:
- Around line 351-364: list_entries/list_files are returning internal lock
artifacts generated by read_file/read_file_with_metadata/write_file_atomic;
update the iteration in list_entries (and similarly in list_files) to skip files
whose names match the lock sibling pattern (e.g., names matching regex like
/^\..+\.lock$/ — dot-prefixed, ending with .lock). Concretely, inside the loop
over items in list_entries (the block using search_path.rglob / search_path.glob
and checking item.is_file()), add a guard to continue/skip if item.name matches
the dot-prefixed lock pattern so those entries are not appended to result; do
the same filtering in list_files if it has analogous iteration logic (refer to
functions list_entries and list_files and the
read_file/read_file_with_metadata/write_file_atomic/delete_file behaviors).

In `@packages/ai/tests/ai/account/test_store.py`:
- Around line 425-429: The cleanup currently lists and deletes all objects with
Prefix='tmp_' which risks removing other tests' data; change the paginate Prefix
to use the test's unique prefix (e.g., combine the shared 'tmp_' base with the
current test's unique identifier like test_prefix or fixture-provided prefix) so
the call in the client.get_paginator('list_objects_v2') loop only targets
objects under that test-specific prefix, and apply the same change to the second
occurrence around lines 481-482; keep using
paginator.paginate(Bucket=test_config['bucket'], Prefix=...) and only modify the
Prefix value to the scoped test prefix.

---

Duplicate comments:
In `@packages/ai/tests/ai/account/test_store.py`:
- Around line 416-421: The S3 preflight except block should treat both HTTP 404
and S3 error code "NoSuchBucket" as a missing-bucket condition; update the
ClientError handling around client.head_bucket(...) so it checks the error code
via e.response.get('Error', {}).get('Code') and only re-raises if the code is
neither '404' nor 'NoSuchBucket', otherwise call
client.create_bucket(Bucket=test_config['bucket']); keep the use of ClientError
and the same bucket variable (test_config['bucket']) and ensure the logic lives
in the same try/except that wraps client.head_bucket.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2fb013d6-7ffd-48b9-b5dc-98eb60673c35

📥 Commits

Reviewing files that changed from the base of the PR and between a68da4b and 65033f4.

📒 Files selected for processing (13)
  • .env.template
  • .github/workflows/_build.yaml
  • packages/ai/src/ai/account/requirements.txt
  • packages/ai/src/ai/account/store.py
  • packages/ai/src/ai/account/store_providers/azure/__init__.py
  • packages/ai/src/ai/account/store_providers/azure/azure.py
  • packages/ai/src/ai/account/store_providers/azure/requirements.txt
  • packages/ai/src/ai/account/store_providers/filesystem.py
  • packages/ai/src/ai/account/store_providers/memory.py
  • packages/ai/src/ai/account/store_providers/s3/__init__.py
  • packages/ai/src/ai/account/store_providers/s3/requirements.txt
  • packages/ai/src/ai/account/store_providers/s3/s3.py
  • packages/ai/tests/ai/account/test_store.py

Comment thread packages/ai/src/ai/account/store_providers/filesystem.py
Comment thread packages/ai/tests/ai/account/test_store.py Outdated
Copy link
Copy Markdown
Collaborator

@kwit75 kwit75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid shape on the list-entries work — pagination is correct across all four providers (S3 uses a list_objects_v2 paginator, Azure's list_blobs/walk_blobs iterators follow continuation tokens — no 1000-key truncation), and filesystem/memory parity is exact. Requesting changes on two things the current tests hide, because they bite in our prod store config:

1 — (major) off-by-one in S3 + Azure list_entries when the store has no URL prefix. prefix_part_len = key_prefix.count("/") - self._prefix.count("/") is wrong when self._prefix == "". With an empty store prefix, a recursive list_entries("sub") emits the queried directory itself (sub/) as a spurious entry. Every test passes only because the fixtures use s3://bucket/tmp_<uuid> (a 1-segment prefix), which happens to make the subtraction come out to 1. Under prod STORE_URL=s3://rocketride-saas-dev-storage (empty path), a FileStore-style key users/<id>/files/sub would leak users/<id>/files/sub/. (s3/s3.py ~L333/372, azure/azure.py ~L341/364.) Fix: derive base depth from the segment count of the relative query prefix rather than subtracting absolute-key slash counts — and please add a no-store-prefix test variant so it can't regress silently.

2 — (major) traversal guard is unreachable with no store prefix. The raise StorageError("Path traversal detected") in _get_key / _get_blob_name lives inside if self._prefix:. With an empty prefix, list_entries("sub/../..") normalizes to .. and does NOT raise — test_list_entries_prefix_traversal_protection only passes thanks to the tmp_ fixture prefix. It fails safe on object stores (Prefix ../ matches nothing) so it's not an actual escape, but the guarantee the test asserts doesn't hold in prod. Move the check so it also runs when self._prefix is empty.

Minors (non-blocking): no MaxKeys/limit on recursive listing (fine now; cap before exposing to a user-facing endpoint); list_entries doesn't filter FileStore's .dirmarker sentinel (decide at the wiring PR); ruff B007/A001 shadowing in _iter_keys (prefix/dir); memory _check_path uses ".." in filename substring (false-positives on my..file.txt).

Both majors are currently dormant (no caller — file_store.py is untouched), so functionality isn't at risk today — but the fix + a no-prefix test are tiny and this is exactly our prod config, so better to nail it here than land a passing-but-wrong baseline. 🙌

FYI unrelated to your code: the earlier red CI was just the leftover Temp fail / exit 1 debug step at the old head (gone now, re-run is green-trending). Merge-order: this overlaps #1048's _build.yaml Test-step edit — land #1048 first, then rebase, so this also inherits -s.

stepmikhaylov and others added 10 commits June 2, 2026 12:40
Wires a local MinIO into the Linux build's Test step so the S3Store
integration tests (gated on boto3 + ROCKETRIDE_TEST_S3_*) run in CI.
Ubuntu-only (service containers aren't available on win/macOS), and
continue-on-error + late var-export keep a MinIO hiccup from ever
blocking a CI/release build (tests skip cleanly). Preinstalls boto3
(an optional extra) so the suite actually activates.

Tests themselves land via Stepan's feat/deploy-v2 (commit 027bb64).
@stepmikhaylov stepmikhaylov force-pushed the feat/store-list-entries branch from 65033f4 to e21f0ce Compare June 2, 2026 10:41
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/ai/src/ai/account/store_providers/s3/s3.py (1)

45-55: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Normalize blank endpoint values to None before creating the boto3 client.

self._endpoint is cached from creds.get('endpoint') and later passed as endpoint_url=self._endpoint. When the config provides endpoint: "", boto3 raises ValueError: Invalid endpoint: instead of using the default endpoint. Normalize falsey values to None when parsing.

Proposed fix
-                self._endpoint = creds.get('endpoint')
+                self._endpoint = creds.get('endpoint') or None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai/src/ai/account/store_providers/s3/s3.py` around lines 45 - 55,
When parsing credentials in the S3 provider (inside the block that sets
self._access_key_id / self._secret_access_key / self._region), normalize the
endpoint value to None for any falsey or blank string so boto3 receives None
instead of an empty string; replace the direct assignment self._endpoint =
creds.get('endpoint') with logic that sets self._endpoint to None when the
retrieved endpoint is missing or endpoint.strip() is empty, leaving valid
non-empty endpoint strings unchanged (so the later boto3 client creation using
endpoint_url=self._endpoint won't raise ValueError).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/account/store_providers/azure/azure.py`:
- Around line 329-332: The list_entries implementation is catching all
exceptions and wrapping them, which hides explicit StorageError validations;
update the exception handling in list_entries so that you re-raise existing
StorageError instances (add an except StorageError: raise) before the generic
except Exception that wraps errors into "Failed to list files ..." so
invalid-pattern/path-traversal validations keep their original StorageError; do
the same change for the other analogous try/except block (the second catch
around lines 381-384) to preserve StorageError there as well.
- Around line 343-353: The recursive branch currently calls
container_client.list_blobs with blob_prefix which can match siblings (e.g.,
"base" matches "baseball"); change the recursive path to use the same bounded
prefix as the non-recursive branch (use f'{blob_prefix}/' when blob_prefix is
truthy, or '' otherwise) so only descendants under the directory are returned;
update the code around blob_prefix, the list_blobs call, and any logic that
computes prefix_part_len (functions/members: _get_blob_name, _prefix, _part_len,
blob_prefix, container_client.list_blobs, container_client.walk_blobs,
recursive) and alternatively add a special-case exact-blob lookup if you need to
support listing the exact blob name.

In `@packages/ai/src/ai/account/store_providers/memory.py`:
- Around line 149-152: The _check_path function currently rejects any filename
containing the substring '..' which incorrectly blocks names like
'report..json'; change the check to only detect path traversal segments by
splitting the incoming filename on path separators (e.g., '/' and
os.path.sep/os.path.altsep or use pathlib.PurePosixPath.parts) and raising
StorageError only if any path segment equals '..'. Update the _check_path
implementation (referenced by function name _check_path and exception
StorageError) so it preserves valid filenames containing '..' in their names
while still preventing parent-directory traversal.

In `@packages/ai/src/ai/account/store_providers/s3/s3.py`:
- Around line 316-390: The list_entries function is not using the module's retry
policy like other S3 methods (e.g., list_files); wrap its S3 I/O in the same
retry logic so transient S3/MinIO failures get exponential backoff.
Specifically, apply the same `@retry`(...) decorator (the identical retry used by
other public S3 methods) to list_entries so the calls in _iter_keys
(client.get_paginator / paginator.paginate) and the
asyncio.to_thread(_list_entries) execution are retried; ensure you import/reuse
the same retry symbol and keep existing exception handling semantics.

In `@packages/ai/src/ai/account/store.py`:
- Around line 340-347: Update the docstring for the listing function in
packages/ai/src/ai/account/store.py to narrowly define name_pattern as a
basename-only glob: state that the pattern is applied only to the final path
segment (basename) and must not contain path separators ('/' or '\\'), and that
backends will reject patterns containing separators; replace the current
name_pattern description in the Args block accordingly so callers know to pass
only filename/glob for the basename.

---

Outside diff comments:
In `@packages/ai/src/ai/account/store_providers/s3/s3.py`:
- Around line 45-55: When parsing credentials in the S3 provider (inside the
block that sets self._access_key_id / self._secret_access_key / self._region),
normalize the endpoint value to None for any falsey or blank string so boto3
receives None instead of an empty string; replace the direct assignment
self._endpoint = creds.get('endpoint') with logic that sets self._endpoint to
None when the retrieved endpoint is missing or endpoint.strip() is empty,
leaving valid non-empty endpoint strings unchanged (so the later boto3 client
creation using endpoint_url=self._endpoint won't raise ValueError).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7b0b82fa-4905-4c49-a322-dd449d724ebf

📥 Commits

Reviewing files that changed from the base of the PR and between 65033f4 and e21f0ce.

📒 Files selected for processing (13)
  • .env.template
  • .github/workflows/_build.yaml
  • packages/ai/src/ai/account/requirements.txt
  • packages/ai/src/ai/account/store.py
  • packages/ai/src/ai/account/store_providers/azure/__init__.py
  • packages/ai/src/ai/account/store_providers/azure/azure.py
  • packages/ai/src/ai/account/store_providers/azure/requirements.txt
  • packages/ai/src/ai/account/store_providers/filesystem.py
  • packages/ai/src/ai/account/store_providers/memory.py
  • packages/ai/src/ai/account/store_providers/s3/__init__.py
  • packages/ai/src/ai/account/store_providers/s3/requirements.txt
  • packages/ai/src/ai/account/store_providers/s3/s3.py
  • packages/ai/tests/ai/account/test_store.py

Comment thread packages/ai/src/ai/account/store_providers/azure/azure.py Outdated
Comment thread packages/ai/src/ai/account/store_providers/azure/azure.py
Comment thread packages/ai/src/ai/account/store_providers/memory.py Outdated
Comment thread packages/ai/src/ai/account/store_providers/s3/s3.py
Comment thread packages/ai/src/ai/account/store.py Outdated
@stepmikhaylov
Copy link
Copy Markdown
Collaborator Author

Addressed r3340603140: updated the name_pattern docstring in store.py to clarify it is a basename-only glob, must not contain path separators (/ or \), and that backends will raise ValueError for patterns that include them.

@stepmikhaylov
Copy link
Copy Markdown
Collaborator Author

stepmikhaylov commented Jun 2, 2026

@kwit75 CR addressed, please check.

  • #1048 merged
  • #1044 rebased onto develop with merged #1048, no conflicts encountered
  • Found bugs are fixed:
    • #1995bcef all test for IO and list_entries applied to Azure/S3 stores w/o prefixes
    • #e21f0cec Azure/S3 stores are fixed by test failures

Copy link
Copy Markdown
Collaborator

@kwit75 kwit75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed — both bugs confirmed fixed, clearing my request-changes. ✅

  • Empty-prefix off-by-one: the new _part_len(p) = p.count('/') + 1 if p else 0 normalizes prefix depth, so prefix_part_len is consistent whether self._prefix is set or empty. Traced both backends: list_entries('sub') on a no-prefix store no longer re-emits sub/ itself (dir loop range(2, 2) is empty), and list_entries('') still emits top-level dirs. S3 (store_root empty-prefix handling) and Azure match.
  • Traversal guard: the name_pattern / \ .. checks are now unconditional (out of if self._prefix), and _get_key/_get_blob_name gained the explicit ..//.. check — reachable on no-prefix stores now.
  • Tests: the no-prefix store fixtures (S3Store(url) / Azure container-root, "no prefix") now run the full test_list_entries_* matrix incl. traversal + pattern protection — exactly the path the old tmp_-prefixed fixtures masked.

CI green on all platforms. Nice rework — merging.

@kwit75 kwit75 merged commit eae941c into develop Jun 2, 2026
20 checks passed
@kwit75 kwit75 deleted the feat/store-list-entries branch June 2, 2026 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/cd CI/CD and build system module:ai AI/ML modules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants