Skip to content

feat(sharepoint): pass through ACL permission metadata#699

Merged
brndnblck merged 6 commits intomainfrom
brandon/plu-331-customer-request-support-for-passing-through-sharepoint
Apr 30, 2026
Merged

feat(sharepoint): pass through ACL permission metadata#699
brndnblck merged 6 commits intomainfrom
brandon/plu-331-customer-request-support-for-passing-through-sharepoint

Conversation

@brndnblck
Copy link
Copy Markdown
Contributor

@brndnblck brndnblck commented Apr 28, 2026

Summary

  • Extract SharePoint permission data from the Graph API and normalize to the standard [{read: {users, groups}}, {update: ...}, {delete: ...}] schema used by Google Drive and Confluence connectors.
  • Fetch permissions via Graph JSON batching (/$batch, up to 20 per round-trip) with automatic per-item fallback on batch failure.
  • Parse identities from raw permission.properties JSON to avoid cross-field data bleed in the office365 typed accessors.

Details

  • Added MICROSOFT_ROLE_MAPPING covering Graph standard roles and SharePoint-specific sp.* roles.
  • Added _extract_identity_ids_from_raw() to pull Azure AD user/group IDs from grantedToV2 (preferred) with grantedTo fallback. SharePoint siteGroup numeric IDs are excluded (not resolvable via Graph).
  • Added extract_permissions() to normalize raw Graph permissions into the canonical schema.
  • Added _fetch_permissions_batched() with queue-drain guard on batch failure and ClientRequestException handling per item in fallback.
  • Refactored run_async to window-and-yield in chunks of 20 instead of materializing the full file list.
  • 32 unit tests covering identity extraction, role mapping, batching, fallback, and edge cases.

Limitations

  • Requires SharePoint Online (M365) with Azure AD OAuth. SharePoint Server on-prem is not supported (no Graph API).
  • Graph API does not support $expand=permissions on DriveItem collections, so permissions are fetched separately per batch.
  • ACL metadata is a best-effort snapshot; it is not runtime authorization.

Closes PLU-331


Note

Medium Risk
Touches SharePoint indexing flow and adds external Graph batching/fallback logic; failures could impact metadata completeness or indexing performance, though behavior is largely additive and covered by extensive tests.

Overview
Adds SharePoint ACL pass-through by fetching Graph DriveItem permissions and normalizing them into the canonical [{read},{update},{delete}] permissions_data schema.

Implements Graph /$batch permission hydration (20 items per batch) with a per-item fallback on batch failure, plus raw-JSON identity extraction and a MICROSOFT_ROLE_MAPPING to translate Microsoft/SharePoint roles into operations. Updates run_async to process files in permission-hydrated chunks, and extends unit + integration fixtures to validate deterministic, merged ACL output.

Bumps package version to 1.5.0 and documents the enhancement in CHANGELOG.md.

Reviewed by Cursor Bugbot for commit 7cb9a08. Bugbot is set up for automated code reviews on this repo. Configure here.

Extract SharePoint permission data from Graph API and normalize to the
standard read/update/delete schema used by other connectors. Permissions
are fetched via Graph JSON batching (up to 20 per round-trip) with
per-item fallback on batch failure.

PLU-331
@brndnblck
Copy link
Copy Markdown
Contributor Author

brndnblck commented Apr 29, 2026

Validated end-to-end locally against a live SharePoint tenant using download_only=True to isolate the source connector (thank you @potter-potter for the guidance). permissions_data is populated on FileData.metadata with the expected schema:

"permissions_data": [
  {"read":    {"users": ["e3638d7e-708b-..."], "groups": ["e3638d7e-708b-..."]}},
  {"update":  {"users": ["e3638d7e-708b-..."], "groups": ["e3638d7e-708b-..."]}},
  {"delete":  {"users": ["e3638d7e-708b-..."], "groups": ["e3638d7e-708b-..."]}}
]

Both fixes confirmed working:

  • to_json() produces plain dicts that identity extraction parses without error
  • Pinned permissions collection survives through to file data construction (was None without the pin)

"""
perms = drive_item.permissions
drive_item.properties["permissions"] = perms
return perms
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinned permissions may not survive batch execution

High Severity

_pin_permissions stores the PermissionCollection in drive_item.properties["permissions"], but the office365 library's execute_batch() may reset object properties during response processing (e.g. via clear_state()). If the pin is cleared, drive_item.permissions in drive_item_to_file_data_sync returns a fresh empty collection, causing list(drive_item.permissions) to be empty and permissions_data to silently remain None for every file. This matches the reviewer's observation that the fetched collection is "ephemeral and discarded after execute_batch."

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 786d018. Configure here.

Copy link
Copy Markdown
Contributor Author

@brndnblck brndnblck Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

False positive. This has been addressed. Both validation and the regenerated fixtures would show null if execute_batch was still clearing the pin.

#699 (comment)

Regenerate expected results against live SharePoint to include
the permissions_data field now populated by the ACL passthrough.
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 7cb9a08. Configure here.

"""Normalize Graph API permissions to the standard read/update/delete format."""
if not permissions:
logger.debug("no permissions found")
return [{}]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent return schema for empty permissions list

Low Severity

extract_permissions returns [{}] when the input list is empty, which doesn't conform to the standard [{"read": ...}, {"update": ...}, {"delete": ...}] schema documented in the docstring and used by Google Drive and Confluence connectors. Any downstream consumer iterating over the result and accessing e.g. list(entry.keys())[0] would get an IndexError on the empty dict. While the current call site in drive_item_to_file_data_sync guards with if perms, the method is public and the inconsistency could cause issues if reused elsewhere.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 7cb9a08. Configure here.

Copy link
Copy Markdown
Contributor Author

@brndnblck brndnblck Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't fix.

I don't like the inconsistency here either, but this is following a pre-existing convention from the other connectors and is a non-issue in practice since drive_item_to_file_data_sync is guarded by if perms:.

I chose to stay consistent with existing behavior here, and if we want to revisit that (breaking change on multiple connectors) we should make that change separately and version accordingly.

@brndnblck brndnblck requested review from a team and potter-potter April 29, 2026 01:49
Copy link
Copy Markdown
Contributor

@potter-potter potter-potter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good.

@brndnblck brndnblck merged commit 2bd3dc0 into main Apr 30, 2026
45 checks passed
@brndnblck brndnblck deleted the brandon/plu-331-customer-request-support-for-passing-through-sharepoint branch April 30, 2026 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants