PLU-347: feat(box): add ACL permissions metadata to Box connector #704
danielle-unstructured-io wants to merge 2 commits into main
Fetches collaborations from Box API (direct + inherited via path_collection ancestor walk) and normalizes into permissions_data on FileDataSourceMetadata, consistent with Confluence and Google Drive connectors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Reviewed by Cursor Bugbot for commit 99299bf. Configure here.
    BOX_ROLE_MAPPING: dict[str, list[str]] = {
        "owner": ["read", "update", "delete"],
        "co-owner": ["read", "update", "delete"],
        "editor": ["read", "update"],
Editor delete permission omitted
Medium Severity
editor collaborations are normalized without delete, but Box Editors can delete files and folders. permissions_data therefore understates delete access for valid Box users and groups.
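If Bugbot's reading of the Box Editor role is correct, the fix is a one-row change to the mapping. A minimal sketch, assuming only the three roles visible in the diff hunk above; `operations_for_role` is a hypothetical helper added here for illustration and is not part of the PR:

```python
# Hypothetical revision of the mapping quoted in the diff hunk.
# Only the "editor" row differs: it now includes "delete".
BOX_ROLE_MAPPING: dict[str, list[str]] = {
    "owner": ["read", "update", "delete"],
    "co-owner": ["read", "update", "delete"],
    "editor": ["read", "update", "delete"],
}


def operations_for_role(role: str) -> list[str]:
    """Return the normalized operations for a Box role, or [] if the role is unmapped."""
    return BOX_ROLE_MAPPING.get(role, [])
```

Unmapped roles (for example `uploader`, which the PR description says is excluded) fall through to an empty list, so they contribute nothing to `permissions_data`.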
    type_key = entity_type + "s"
    for op in operations:
        normalized[op][type_key].add(entity_id)
        total[0] += 1
Access-only collaborations overgrant permissions
High Severity
is_access_only Box collaborations are normalized like regular permissions. During ancestor-folder walks, users or groups with access only to another nested item can be added to unrelated files’ permissions_data, overgranting downstream ACLs.
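One way to address this is to drop access-only collaborations before they reach the normalization loop. A minimal sketch, assuming collaborations arrive as plain dicts shaped like Box API collaboration objects (`role`, `is_access_only`, `accessible_by`); `ROLE_OPS` and `normalize_collaborations` are stand-ins for the PR's own mapping and helper, whose real signatures are not shown in this diff:

```python
from collections import defaultdict

# Stand-in for the PR's BOX_ROLE_MAPPING (rows copied from the diff hunk).
ROLE_OPS: dict[str, list[str]] = {
    "owner": ["read", "update", "delete"],
    "co-owner": ["read", "update", "delete"],
    "editor": ["read", "update"],
}


def normalize_collaborations(collabs: list[dict]) -> dict[str, dict[str, set[str]]]:
    """Group user/group IDs by operation, skipping access-only collaborations."""
    normalized: dict[str, dict[str, set[str]]] = defaultdict(
        lambda: {"users": set(), "groups": set()}
    )
    for collab in collabs:
        # An access-only collaboration exists only so the collaborator can
        # reach some other nested item; it should not grant rights here.
        if collab.get("is_access_only"):
            continue
        entity = collab.get("accessible_by") or {}
        entity_type, entity_id = entity.get("type"), entity.get("id")
        if entity_type not in ("user", "group") or not entity_id:
            continue
        for op in ROLE_OPS.get(collab.get("role", ""), []):
            normalized[op][entity_type + "s"].add(entity_id)
    return dict(normalized)
```

Filtering at this point covers both direct file collaborations and those collected during the ancestor-folder walk, since both pass through the same normalization step.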
I'd have Claude take a look at this one (the Cursor comment about overgranted permissions).
    @@ -0,0 +1,5 @@
    {
      "directory_structure": [
        "unstructured_aqpewcxk/Billing issue - Example 1.pdf"
Randomized download paths in fixtures
Medium Severity
The Box fixtures assert captured unstructured_* temp-directory suffixes. FsspecDownloader creates a fresh random suffix each run, so directory-structure validation will fail even when downloads succeed.
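A possible workaround is to mask the random suffix on both sides before comparing directory structures. A minimal sketch, assuming the suffix consists of lowercase letters, digits, and underscores as in the fixture above; `normalize_download_path` is a hypothetical helper, not something the test harness is known to provide:

```python
import re

# Matches a leading "unstructured_<random suffix>" path component.
_TMP_PREFIX = re.compile(r"^unstructured_[a-z0-9_]+(?=/)")


def normalize_download_path(path: str) -> str:
    """Replace the random temp-directory suffix with a stable placeholder."""
    return _TMP_PREFIX.sub("unstructured_<tmp>", path)
```

With this in place, the expected fixture and the actual download listing could both be normalized and then compared, so two runs with different suffixes would still match.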
    try:
        file_data.metadata.permissions_data = _get_permissions_for_file(
            client, file_id, cache
        )
Permission cap is ignored
Medium Severity
BoxIndexer.run() always uses _get_permissions_for_file()’s default limit, so BoxDownloaderConfig.max_num_metadata_permissions is ignored in normal pipelines where the downloader fallback does not run.
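The general shape of the fix is to pass the configured cap explicitly instead of relying on the helper's default. The sketch below is illustrative only: `DEFAULT_MAX_PERMISSIONS`, `IndexerConfig`, `get_permissions_for_file`, and `index_file` are hypothetical names, and the real helper's signature and default value are not visible in this diff.

```python
from dataclasses import dataclass

DEFAULT_MAX_PERMISSIONS = 250  # hypothetical default, not taken from the PR


def get_permissions_for_file(
    entity_ids: list[str], max_permissions: int = DEFAULT_MAX_PERMISSIONS
) -> list[str]:
    """Stand-in for the PR's helper: keep at most max_permissions entries."""
    return entity_ids[:max_permissions]


@dataclass
class IndexerConfig:
    # Hypothetical: the cap exposed where the indexer can read it.
    max_num_metadata_permissions: int = DEFAULT_MAX_PERMISSIONS


def index_file(entity_ids: list[str], config: IndexerConfig) -> list[str]:
    """Forward the configured cap rather than falling back to the default."""
    return get_permissions_for_file(
        entity_ids, max_permissions=config.max_num_metadata_permissions
    )
```

Since Bugbot notes the setting currently lives on `BoxDownloaderConfig`, an actual fix would also need to decide how the indexer gets access to it.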
awalker4 left a comment
Will just need to add a new block to the changelog for whatever the next version is. This value also goes into __version__.py.
Summary
- Adds normalized ACL permissions as permissions_data on FileDataSourceMetadata, consistent with the Confluence and Google Drive implementations
- Permissions are fetched in BoxIndexer.run() so they're available to all downstream pipeline stages
- BoxDownloader.run() retains a fallback for standalone usage (CLI, integration tests without the SND plugin layer)

What changed
unstructured_ingest/processes/connectors/fsspec/box.py
- BOX_ROLE_MAPPING: maps Box collaboration roles to [read], [read, update], or [read, update, delete]; uploader excluded (write-only)
- _normalize_collaborations(), _get_collaborations_for_folder(), _get_permissions_for_file() helpers
- BoxIndexer.run() override: initializes a Box SDK client once, then for each indexed file walks path_collection ancestor folders (LRU-cached, max 5 entries) plus direct file collaborations to build normalized permissions
- BoxDownloader.run() fallback: only fetches permissions if permissions_data is None (i.e., indexer wasn't run)
- BoxConnectionConfig.get_box_client(): returns an authenticated boxsdk.Client via JWT

Tests
- Unit tests for BOX_ROLE_MAPPING, _normalize_collaborations, and _get_permissions_for_file (all mock-based)

Test plan
- permissions_data populated with correct user IDs

Closes PLU-347
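The ancestor walk described under "What changed" can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `FolderCollabCache` and `collaborations_for_file` are hypothetical names, the `fetch` callable stands in for whatever Box SDK call the connector makes per folder, and the cache size of 5 mirrors the "max 5 entries" mentioned above.

```python
from collections import OrderedDict
from typing import Callable


class FolderCollabCache:
    """Small LRU cache of folder ID -> collaborations (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], list[dict]], max_entries: int = 5):
        self._fetch = fetch
        self._max = max_entries
        self._entries: OrderedDict[str, list[dict]] = OrderedDict()

    def get(self, folder_id: str) -> list[dict]:
        if folder_id in self._entries:
            # Cache hit: mark as most recently used.
            self._entries.move_to_end(folder_id)
            return self._entries[folder_id]
        collabs = self._fetch(folder_id)
        self._entries[folder_id] = collabs
        if len(self._entries) > self._max:
            # Evict the least recently used folder.
            self._entries.popitem(last=False)
        return collabs


def collaborations_for_file(
    file_collabs: list[dict],
    ancestor_folder_ids: list[str],
    cache: FolderCollabCache,
) -> list[dict]:
    """Combine direct file collaborations with those inherited from ancestors."""
    collected = list(file_collabs)
    for folder_id in ancestor_folder_ids:
        collected.extend(cache.get(folder_id))
    return collected
```

Files in the same folder share ancestors, so consecutive files in one directory would hit the cache instead of repeating the same folder lookups. The combined list would then go through normalization to produce `permissions_data`.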
🤖 Generated with Claude Code
Note
Medium Risk
Adds Box SDK calls during indexing/downloading to fetch and attach ACL-derived permissions_data, which can impact correctness, performance, and error handling when talking to Box APIs.

Overview
Box source metadata is extended to include ACL permissions. The Box connector now derives permissions_data from Box collaborations (including inherited folder permissions) and attaches it to FileDataSourceMetadata.

Permissions are fetched at index time via a new authenticated boxsdk client and normalized through new helpers/role mappings with a small LRU cache and a configurable cap (max_num_metadata_permissions); the downloader retains a fallback to populate permissions when the indexer wasn't used.

Adds Box integration coverage and fixtures for top-level vs nested folder ACL behavior, plus unit tests for role-to-operation mapping and permission normalization/caching. Documentation image links were also updated to use local pipeline.png/sequence.png paths.

Reviewed by Cursor Bugbot for commit 99299bf. Bugbot is set up for automated code reviews on this repo.