Skip to content

Fix CorpusReference visibility checks#2080

Open
JSv4 wants to merge 3 commits into
mainfrom
codex/fix-corpusreference-visibility-issue
Open

Fix CorpusReference visibility checks#2080
JSv4 wants to merge 3 commits into
mainfrom
codex/fix-corpusreference-visibility-issue

Conversation

@JSv4

@JSv4 JSv4 commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Motivation

  • The prior service returned CorpusReference rows based only on parent-corpus visibility which could expose source/target documents, annotations, and related corpus metadata to callers who lack document-level access.
  • This created an IDOR/authorization bypass allowing GraphQL clients to traverse nested fields and leak private document-derived text, metadata, or file URLs.
  • The change restores the expected MIN(document_permission, corpus_permission) semantics so a readable corpus cannot disclose private document or target-corpus data.

Description

  • Update opencontractserver/enrichment/services/corpus_reference_service.py to filter visible_to_user by readable corpora and readable documents and to require that source_annotation's document and any target_document/target_annotation/target_corpus are visible to the requesting user.
  • Add Q + Document imports and expand the service docstring to document the enforced MIN-permission semantics.
  • Add regression tests in opencontractserver/tests/test_corpus_reference_model.py (CorpusReferenceVisibilityTests) that assert references are filtered out when the source document, target document, or target corpus is not visible and that references remain when all edges are visible.

Testing

  • Ran python -m py_compile opencontractserver/enrichment/services/corpus_reference_service.py opencontractserver/tests/test_corpus_reference_model.py which succeeded.
  • Ran git diff --check which reported no problems.
  • Attempted pytest opencontractserver/tests/test_corpus_reference_model.py -q but test execution was blocked by a runtime dependency error (ModuleNotFoundError: No module named 'django'), so the new tests could not be executed in this environment.

Codex Task

@claude

claude Bot commented Jun 29, 2026

Copy link
Copy Markdown

Code Review

This PR patches an IDOR in CorpusReferenceService.visible_to_user by adding document- and corpus-level visibility gates on the source annotation and all nullable target pointers. The direction is right and the docstring clearly documents the enforced MIN-permission semantics. Three issues worth addressing before merge:


1. source_annotation__document has no null guard — structural annotations are silently excluded

File: opencontractserver/enrichment/services/corpus_reference_service.py, line 37

Annotation.document is declared null=True on the model (for structural annotations that belong to a structural_set rather than a single document). The filter

source_annotation__document__in=visible_documents

applies a SQL IN (SELECT id FROM documents WHERE …) against the annotation's document_id column. When document_id IS NULL, the expression evaluates to UNKNOWN (never true in SQL), so any CorpusReference whose source_annotation is a structural annotation would be permanently invisible to every user, including owners and superusers.

The target nullable FK lookups already follow the correct pattern:

Q(target_annotation__isnull=True)
| Q(target_annotation__document__in=visible_documents)

The source annotation lookup needs the same treatment:

(Q(source_annotation__document__isnull=True)
 | Q(source_annotation__document__in=visible_documents))

The enrichment pipeline currently only writes references whose source annotation is document-scoped, so this is low-probability today, but the schema allows it and a future caller would hit this silently.


2. target_annotation visibility clause is untested

File: opencontractserver/tests/test_corpus_reference_model.py, line 115 (end of new test class)

The service filters:

Q(target_annotation__isnull=True)
| Q(target_annotation__document__in=visible_documents)

but CorpusReferenceVisibilityTests has no negative test case for a non-null target_annotation whose document is private. The three other nullable FK paths (target_document, target_corpus, source_annotation.document) each have a dedicated negative test; this one is simply missing.

If the target_annotation clause were accidentally removed from the service, the existing test suite would stay green. The fix is a fifth test method analogous to test_visible_to_user_requires_visible_target_document.


3. Happy-path test uses null targets — visible non-null targets are never exercised

File: opencontractserver/tests/test_corpus_reference_model.py, line 197

test_visible_to_user_returns_reference_when_all_edges_visible creates a reference with every target field null. It confirms that a null-target reference is accessible but says nothing about whether a reference with a visible non-null target_document, target_corpus, or target_annotation also passes. If the filter accidentally required all targets to be null, this test would still pass. A supplementary positive test that sets target_document=self.visible_doc and asserts the viewer can see the reference would lock in that branch.


Minor: bootstrap_authority_corpus defaults make_public=False

Not a bug introduced here, but worth noting for awareness: authorities.py::bootstrap_authority_corpus has make_public: bool = False (line 344). Any operator who calls that function directly with the default and then expects CorpusReferenceService to surface the newly created references will find them invisible until the corpus is published — because the new source_annotation__document__in=visible_documents filter won't match the private authority documents for users who aren't the creator. The BFS crawl in CrawlAuthoritiesService is unaffected because it passes user = User.objects.get(pk=creator_id) and visible_to_user includes Q(creator=user), so the creator always sees their own private documents. But a service call with a different user would silently return zero references. A note in the docstring of bootstrap_authority_corpus or CorpusReferenceService.visible_to_user would prevent future confusion.

…sibility

#2080's CorpusReferenceService.visible_to_user added a target-visibility
filter (correct for the corpusReferences GraphQL query, which exposes
target_document). But the governance graph consumes references and ghosts
invisible targets itself (the degrade-to-ghost contract in
test_invisible_authority_target_degrades_to_external_ghost) — the target
pre-filter dropped those references before the graph could degrade them.

Split the visibility: add visible_to_user_by_source / for_corpus_by_source
(corpus READ + visible source document, target NOT filtered) and point the
governance graph at it. visible_to_user now builds on the source-only base
plus the target filter, so its behaviour and the GraphQL surface are
unchanged. The graph's existing per-endpoint checks drop hidden-source edges
and degrade hidden targets to external ghost nodes.
@claude

claude Bot commented Jun 29, 2026

Copy link
Copy Markdown

Code Review — Fix CorpusReference visibility checks

This PR addresses a real IDOR/authorization bypass: the old visible_to_user let a corpus-READ grant expose source annotations, source documents, and target metadata for documents the requester cannot read. The two-tier split (visible_to_user for FK-exposing surfaces, visible_to_user_by_source for ghost-node consumers) is architecturally sound and the test coverage is well-structured. That said, there are two data-loss bugs and a couple of secondary issues worth fixing before merge.


Finding 1 — Data loss: structural annotations silently dropped from all surfaces

opencontractserver/enrichment/services/corpus_reference_service.py — line 45

Annotation.document is null=True ("Nullable for structural annotations in shared sets", model line 967). source_annotation__document__in=visible_documents uses a Django __in lookup, and NULL is never a member of any __in list — it evaluates to FALSE, not UNKNOWN. So any CorpusReference whose source_annotation is a structural annotation (document FK is NULL) is silently dropped by both visible_to_user_by_source and visible_to_user, regardless of who the requesting user is or what permissions they hold. References authored by the corpus owner will disappear from corpusReferences queries and wanted_authorities with no error.

Fix: add a null guard — Q(source_annotation__document__isnull=True) | Q(source_annotation__document__in=visible_documents) — mirroring the same pattern already used for the target fields in visible_to_user.


Finding 2 — Data loss: target structural annotations silently filter the reference out

opencontractserver/enrichment/services/corpus_reference_service.py — line 73

The same __in / NULL mismatch applies to target_annotation__document__in=visible_documents. When a CorpusReference points to a structural annotation (its document is NULL), the Q(target_annotation__isnull=True) branch does not fire (the FK itself is non-null), and Q(target_annotation__document__in=visible_documents) is FALSE. The whole reference is dropped by visible_to_user.

Fix: Q(target_annotation__document__isnull=True) | Q(target_annotation__document__in=visible_documents).


Finding 3 — IDOR gap: target_annotation corpus visibility is not checked

opencontractserver/enrichment/services/corpus_reference_service.py — line 73

The filter Q(target_annotation__document__in=visible_documents) checks whether the annotation's parent document is readable, but not whether the annotation's parent corpus is readable. The project permission model requires MIN(document_permission, corpus_permission) (CLAUDE.md). An annotation whose document is public but whose corpus is private passes this filter and the annotation FK (and everything you can traverse from it) is exposed to the requester.

target_corpus on the CorpusReference row itself is a separate field and may be NULL even when target_annotation.corpus is non-NULL and private.

Fix: add & (Q(target_annotation__isnull=True) | Q(target_annotation__corpus__in=visible_corpora) | Q(target_annotation__corpus__isnull=True)) (null guard needed because Annotation.corpus is nullable).


Finding 4 — Tests only cover is_public visibility; guardian-granted paths are untested

opencontractserver/tests/test_corpus_reference_model.py — lines 120–241

Every test in CorpusReferenceVisibilityTests grants visibility by setting is_public=True. The BaseVisibilityManager.visible_to_user has three paths: public, creator, and explicit guardian READ grant. The most common real-world sharing scenario — is_public=False with an explicit assign_perm("read_document", viewer, doc) — is not exercised. A regression in the guardian branch of visible_to_user would not be caught by these tests.

Suggestion: add at least one positive test where the viewer has a guardian grant (assign_perm) on the corpus and all relevant documents, and one negative test where the guardian grant is absent.


Finding 5 — Redundant queryset computation in visible_to_user (dead assignments)

opencontractserver/enrichment/services/corpus_reference_service.py — lines 65–66

visible_to_user computes visible_corpora and visible_documents at lines 65–66, then calls visible_to_user_by_source(user) at line 68, which independently re-computes both (visible_to_user_by_source lines 40–41). The outer assignments exist solely to supply the target-side Q(...) filter at lines 69–75, but the inner method builds its own copies for the source-side filter. The final SQL will contain four subquery expressions where two would suffice.

Fix: extract a private _build_visibility_querysets(user) helper that returns (visible_corpora, visible_documents) and call it from both methods, or pass them as arguments.

@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant