PLU-347: feat(box): add ACL permissions metadata to Box connector #704
Merged
17 commits
0241e50 feat(box): add ACL permissions metadata to Box connector (danielle-unstructured-io)
99299bf refactor: move permissions fetching to indexer, extract module-level … (danielle-unstructured-io)
7fc3a81 fix(box): move max_num_metadata_permissions to indexer config (danielle-unstructured-io)
6b9683e fix(box): skip is_access_only collaborations to prevent ACL overgrant (danielle-unstructured-io)
c99b150 fix(box): grant editor role delete permission (danielle-unstructured-io)
b6059ab fix(test): strip randomized tempdir prefix from fixture paths (danielle-unstructured-io)
bcd8622 chore: bump version to 1.6.0 and add changelog entry (danielle-unstructured-io)
580bb7a Merge remote-tracking branch 'origin/main' into feature/box-acl-permi… (awalker4)
6b45a60 chore: bump version to 1.6.0 instead of 1.5.3 (awalker4)
b29c6a5 fix(box): don't cache folder collabs on API failure (awalker4)
af78a0e fix(test): walk on-disk tree in check_raw_file_contents (awalker4)
45bdbab fix(box): drop previewer roles from BOX_ROLE_MAPPING (awalker4)
8416d47 make tidy (awalker4)
ff0861a perf(box): skip file-collab fetch when has_collaborations is False; r… (awalker4)
312f7c5 make tidy (awalker4)
e4e521c fix(PLU-347): address box ACL review feedback (extras, configurable c… (awalker4)
4a2f93c fix(PLU-347): align permissions_cache_max_size help text across box c… (awalker4)
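Several commit titles above describe how Box collaborations are turned into ACL entries: access-only collaborations are skipped, the editor role is granted delete, and previewer roles are dropped from the mapping. The following is a minimal sketch of that idea, not the connector's code. The mapping table, the `build_permissions` helper, and the simplified collaboration dicts (`role`, `user_id`, `is_access_only`) are all assumptions made for illustration, and the `viewer` entry in particular is a guess.

```python
from collections import defaultdict

# Hypothetical mapping, NOT the connector's actual BOX_ROLE_MAPPING.
# Grounded in the commit titles: "editor" includes delete, and previewer
# roles are absent. The "viewer" entry is an assumption.
ROLE_TO_OPERATIONS = {
    "editor": ("read", "update", "delete"),
    "viewer": ("read",),
}


def build_permissions(collaborations):
    """Fold simplified collaboration dicts into read/update/delete user lists."""
    users = defaultdict(set)
    for collab in collaborations:
        # Skip access-only collaborations so they never widen the ACL.
        if collab.get("is_access_only"):
            continue
        # Roles missing from the mapping (e.g. previewer) grant nothing.
        for operation in ROLE_TO_OPERATIONS.get(collab["role"], ()):
            users[operation].add(collab["user_id"])
    return [
        {op: {"users": sorted(users[op]), "groups": []}}
        for op in ("read", "update", "delete")
        if users[op]
    ]


example = build_permissions(
    [
        {"role": "editor", "user_id": "50881967280"},
        {"role": "viewer", "user_id": "50882409531"},
        {"role": "previewer", "user_id": "111"},
        {"role": "editor", "user_id": "222", "is_access_only": True},
    ]
)
```

Under these assumptions, `example` has the same shape as the `permissions_data` lists in the expected-results fixtures in this PR: one entry each for `read`, `update`, and `delete`.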
5 changes: 5 additions & 0 deletions
test/integration/connectors/expected_results/box_second_tier/directory_structure.json
@@ -0,0 +1,5 @@
+{
+  "directory_structure": [
+    "catalog.pdf"
+  ]
+}
61 changes: 61 additions & 0 deletions
...tors/expected_results/box_second_tier/file_data/8b0303ba-7c77-5e47-b0ff-790b8fc9881f.json
@@ -0,0 +1,61 @@
+{
+  "identifier": "8b0303ba-7c77-5e47-b0ff-790b8fc9881f",
+  "connector_type": "box",
+  "source_identifiers": {
+    "filename": "catalog.pdf",
+    "fullpath": "/TestACLs-topfolder/TestACLs-secondtier/catalog.pdf",
+    "rel_path": "catalog.pdf"
+  },
+  "metadata": {
+    "url": "box:///TestACLs-topfolder/TestACLs-secondtier/catalog.pdf",
+    "version": "2216144540657",
+    "record_locator": {
+      "protocol": "box",
+      "remote_file_path": "box://TestACLs-topfolder/TestACLs-secondtier",
+      "file_id": "2216144540657"
+    },
+    "date_created": "1777662782.0",
+    "date_modified": "1777662782.0",
+    "date_processed": "1777665707.7073228",
+    "permissions_data": [
+      {
+        "read": {
+          "users": [
+            "50881967280",
+            "50882409531"
+          ],
+          "groups": []
+        }
+      },
+      {
+        "update": {
+          "users": [
+            "50881967280"
+          ],
+          "groups": []
+        }
+      },
+      {
+        "delete": {
+          "users": [
+            "50881967280"
+          ],
+          "groups": []
+        }
+      }
+    ],
+    "filesize_bytes": 296006
+  },
+  "additional_metadata": {
+    "name": "/TestACLs-topfolder/TestACLs-secondtier/catalog.pdf",
+    "size": 296006,
+    "type": "file",
+    "id": "2216144540657",
+    "modified_at": "2026-05-01T12:13:02-07:00",
+    "created_at": "2026-05-01T12:13:02-07:00",
+    "original_file_path": "/TestACLs-topfolder/TestACLs-secondtier/catalog.pdf"
+  },
+  "reprocess": false,
+  "local_download_path": "/private/var/folders/gf/qwh2bdg93kb9gzxd_xhb49wc0000gn/T/tmpekwnxs4a/unstructured_uvopv4ry/catalog.pdf",
+  "display_name": "/TestACLs-topfolder/TestACLs-secondtier/catalog.pdf"
+}
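To illustrate how a downstream consumer might read the `permissions_data` list in this fixture, here is a small sketch that answers "may this user perform this operation?". The helper names `principals_for` and `can` are hypothetical and are not part of `unstructured_ingest`; only the data shape (a list of single-key dicts, each holding `users` and `groups`) is taken from the fixture.

```python
def principals_for(permissions_data, operation):
    """Collect the user and group IDs granted `operation` ("read", "update", "delete")."""
    users, groups = set(), set()
    for entry in permissions_data:
        grant = entry.get(operation)
        if grant:
            users.update(grant.get("users", []))
            groups.update(grant.get("groups", []))
    return users, groups


def can(permissions_data, operation, user_id, group_ids=()):
    """True if the user, or any of the user's groups, is granted `operation`."""
    users, groups = principals_for(permissions_data, operation)
    return user_id in users or bool(groups.intersection(group_ids))


# The permissions_data value copied from the fixture.
permissions_data = [
    {"read": {"users": ["50881967280", "50882409531"], "groups": []}},
    {"update": {"users": ["50881967280"], "groups": []}},
    {"delete": {"users": ["50881967280"], "groups": []}},
]

reader_can_read = can(permissions_data, "read", "50882409531")
reader_can_delete = can(permissions_data, "delete", "50882409531")
```

With the fixture data, the second user can read the file but cannot delete it, while the first user appears under all three operations.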
5 changes: 5 additions & 0 deletions
test/integration/connectors/expected_results/box_top_folder/directory_structure.json
@@ -0,0 +1,5 @@
+{
+  "directory_structure": [
+    "Billing issue - Example 1.pdf"
+  ]
+}
61 changes: 61 additions & 0 deletions
...ctors/expected_results/box_top_folder/file_data/11333818-b47e-5991-b32f-701975b2caca.json
@@ -0,0 +1,61 @@
+{
+  "identifier": "11333818-b47e-5991-b32f-701975b2caca",
+  "connector_type": "box",
+  "source_identifiers": {
+    "filename": "Billing issue - Example 1.pdf",
+    "fullpath": "/TestACLs-topfolder/Billing issue - Example 1.pdf",
+    "rel_path": "Billing issue - Example 1.pdf"
+  },
+  "metadata": {
+    "url": "box:///TestACLs-topfolder/Billing issue - Example 1.pdf",
+    "version": "2216145342898",
+    "record_locator": {
+      "protocol": "box",
+      "remote_file_path": "box://TestACLs-topfolder",
+      "file_id": "2216145342898"
+    },
+    "date_created": "1777662769.0",
+    "date_modified": "1777662769.0",
+    "date_processed": "1777665696.530676",
+    "permissions_data": [
+      {
+        "read": {
+          "users": [
+            "50881967280",
+            "50882409531"
+          ],
+          "groups": []
+        }
+      },
+      {
+        "update": {
+          "users": [
+            "50881967280"
+          ],
+          "groups": []
+        }
+      },
+      {
+        "delete": {
+          "users": [
+            "50881967280"
+          ],
+          "groups": []
+        }
+      }
+    ],
+    "filesize_bytes": 142776
+  },
+  "additional_metadata": {
+    "name": "/TestACLs-topfolder/Billing issue - Example 1.pdf",
+    "size": 142776,
+    "type": "file",
+    "id": "2216145342898",
+    "modified_at": "2026-05-01T12:12:49-07:00",
+    "created_at": "2026-05-01T12:12:49-07:00",
+    "original_file_path": "/TestACLs-topfolder/Billing issue - Example 1.pdf"
+  },
+  "reprocess": false,
+  "local_download_path": "/private/var/folders/gf/qwh2bdg93kb9gzxd_xhb49wc0000gn/T/tmpqw6nq7zk/unstructured_aqpewcxk/Billing issue - Example 1.pdf",
+  "display_name": "/TestACLs-topfolder/Billing issue - Example 1.pdf"
+}
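The fixture stores the same creation time twice: `metadata.date_created` looks like a string of epoch seconds, while `additional_metadata.created_at` is an ISO 8601 timestamp with a UTC offset. The sketch below, using only the standard library, checks that the two values from this fixture denote the same instant. The helper name `epoch_string_to_utc` is hypothetical, and the interpretation of `date_created` as epoch seconds is an inference from the values, not something the PR states.

```python
from datetime import datetime, timezone


def epoch_string_to_utc(value: str) -> datetime:
    """Parse a string of epoch seconds (e.g. "1777662769.0") as an aware UTC datetime."""
    return datetime.fromtimestamp(float(value), tz=timezone.utc)


# Values copied from the fixture.
date_created = epoch_string_to_utc("1777662769.0")
created_at = datetime.fromisoformat("2026-05-01T12:12:49-07:00")

# Aware datetimes compare by instant, regardless of their UTC offsets.
same_instant = date_created == created_at
```

For this fixture the two representations agree, which is a cheap consistency check when reviewing regenerated expected results.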
@@ -0,0 +1,83 @@
+import os
+
+import pytest
+
+from test.integration.connectors.utils.constants import BLOB_STORAGE_TAG, SOURCE_TAG
+from test.integration.connectors.utils.validation.source import (
+    SourceValidationConfigs,
+    source_connector_validation,
+)
+from test.integration.utils import requires_env
+from unstructured_ingest.processes.connectors.fsspec.box import (
+    CONNECTOR_TYPE,
+    BoxAccessConfig,
+    BoxConnectionConfig,
+    BoxDownloader,
+    BoxDownloaderConfig,
+    BoxIndexer,
+    BoxIndexerConfig,
+)
+
+
+def make_box_components(remote_url: str, download_dir):
+    app_config = os.environ["BOX_APP_CONFIG"]
+    connection_config = BoxConnectionConfig(
+        access_config=BoxAccessConfig(box_app_config=app_config)
+    )
+    index_config = BoxIndexerConfig(remote_url=remote_url)
+    download_config = BoxDownloaderConfig(download_dir=download_dir)
+    indexer = BoxIndexer(connection_config=connection_config, index_config=index_config)
+    downloader = BoxDownloader(connection_config=connection_config, download_config=download_config)
+    return indexer, downloader
+
+
+@pytest.mark.asyncio
+@pytest.mark.tags(CONNECTOR_TYPE, SOURCE_TAG, BLOB_STORAGE_TAG)
+@requires_env("BOX_APP_CONFIG")
+async def test_box_top_folder(temp_dir):
+    """
+    Integration test for Box source connector against the top-level ACL test folder.
+    Validates that permissions_data is populated from direct folder collaborations.
+    """
+    indexer, downloader = make_box_components(
+        remote_url="box://TestACLs-topfolder",
+        download_dir=temp_dir,
+    )
+    await source_connector_validation(
+        indexer=indexer,
+        downloader=downloader,
+        configs=SourceValidationConfigs(
+            test_id="box_top_folder",
+            validate_downloaded_files=False,
+            validate_file_data=True,
+            exclude_fields_extend=[
+                "metadata.date_processed",
+            ],
+        ),
+    )
+
+
+@pytest.mark.asyncio
+@pytest.mark.tags(CONNECTOR_TYPE, SOURCE_TAG, BLOB_STORAGE_TAG)
+@requires_env("BOX_APP_CONFIG")
+async def test_box_second_tier(temp_dir):
+    """
+    Integration test for Box source connector against the nested ACL test folder.
+    Validates that permissions_data reflects inherited permissions from the parent folder.
+    """
+    indexer, downloader = make_box_components(
+        remote_url="box://TestACLs-topfolder/TestACLs-secondtier",
+        download_dir=temp_dir,
+    )
+    await source_connector_validation(
+        indexer=indexer,
+        downloader=downloader,
+        configs=SourceValidationConfigs(
+            test_id="box_second_tier",
+            validate_downloaded_files=False,
+            validate_file_data=True,
+            exclude_fields_extend=[
+                "metadata.date_processed",
+            ],
+        ),
+    )
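Both tests pass `exclude_fields_extend=["metadata.date_processed"]`, presumably so that a field which changes on every run does not break the comparison against the stored fixtures. The sketch below shows one way such a dotted-path exclusion could work. It is an illustration only: `drop_field` and `records_match` are hypothetical helpers, not the implementation behind `source_connector_validation`.

```python
import copy


def drop_field(record: dict, dotted_path: str) -> dict:
    """Return a deep copy of `record` without the field at `dotted_path`."""
    result = copy.deepcopy(record)
    *parents, leaf = dotted_path.split(".")
    node = result
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return result  # Path is absent, so there is nothing to drop.
    node.pop(leaf, None)
    return result


def records_match(expected: dict, actual: dict, exclude=()) -> bool:
    """Compare two records after removing every excluded dotted path from both."""
    for path in exclude:
        expected = drop_field(expected, path)
        actual = drop_field(actual, path)
    return expected == actual


# Two records that differ only in the run-dependent timestamp.
expected = {"metadata": {"version": "2216145342898", "date_processed": "1777665696.530676"}}
actual = {"metadata": {"version": "2216145342898", "date_processed": "1777669999.0"}}

matches_with_exclusion = records_match(expected, actual, exclude=["metadata.date_processed"])
matches_without_exclusion = records_match(expected, actual)
```

Because `drop_field` works on a deep copy, the inputs are left untouched and the same records can be compared again with a different exclusion list.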