Skip to content

Commit 7d50c0f

Browse files
authored
Merge pull request #2047 from Open-Source-Legal/claude/gifted-darwin-4e01ww
Address post-merge review of the bulk-trash primitive (#2039)
2 parents 1290ff7 + d894fb8 commit 7d50c0f

5 files changed

Lines changed: 78 additions & 30 deletions

File tree

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
- **Post-merge review follow-ups for the bulk-trash primitive (#2039, follow-up
2+
to #2030).** Documentation + test polish, no behavior change. This work landed
3+
after the overlapping #2046 (review #2036) merged, so the
4+
`bulk_soft_delete_documents` parts are reconciled with it: in
5+
`opencontractserver/corpuses/services/lifecycle.py` the *memory* follow-up
6+
references (the docstring "Scaling caveat" and the `PERF/MEMORY` comment) are
7+
retargeted from the now-closed #1951 to the open #2045 — completing the #2039
8+
point that the `(#1951)` forward-reference dangles once #1951 is closed (#2046
9+
added the caveat but kept pointing at the closed #1951). The separate #2039
10+
nit about `trashed_doc_ids` being read outside its `with` block is now moot:
11+
#2046 moved the audit log into `transaction.on_commit` *inside* the block, so
12+
there is no outside-block reference left to guard and no pre-initialisation is
13+
needed. Both the module and class docstrings in
14+
`opencontractserver/corpuses/services/paths.py` are corrected to describe the
15+
mixed convention accurately — they previously claimed *all* methods were
16+
underscore-prefixed, which the public `disambiguate_path` /
17+
`reconcile_paths_after_folder_change` contradict — and record that the
18+
underscore helpers (e.g. `_dispatch_document_path_created_signals`) are
19+
deliberately shared across sibling services, while public `disambiguate_path`
20+
is also called from the model layer (`Corpus.add_document`) and
21+
`documents.versioning`. `mypy.ini`'s `python_version` comment is trimmed from
22+
12 lines to 6 — dropping prose while keeping the numpy detail and the
23+
dual-usage note (pre-commit hook + standalone CI `Run mypy` step) inline,
24+
rather than cross-referencing the `2030-mypy-py312` changelog fragment (which
25+
`collate_changelog.py --apply` deletes at release).
26+
`test_query_count_independent_of_document_count` gains `msg=` strings on both
27+
its query-count floor and the O(1) equality assertions so a future failure is
28+
self-documenting.

mypy.ini

Lines changed: 6 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -10,16 +10,12 @@
1010
# ---------------------------------------------------------------------------
1111

1212
[mypy]
13-
# Must track the runtime Python the project actually ships
14-
# (``compose/*/django/Dockerfile`` -> ``ARG PYTHON_VERSION=3.12.*``). It was
15-
# stale at 3.11, which made mypy reject PEP 695 ``type`` statements that newer
16-
# dependency stubs now emit -- e.g. numpy 2.5.0's ``numpy/__init__.pyi`` ships
17-
# a ``type`` alias that errors with "Type statement is only supported in
18-
# Python 3.12 and greater" under a sub-3.12 target, aborting the whole run
19-
# ("errors prevented further checking"). numpy is an unpinned transitive dep,
20-
# so a fresh CI env resolves it to latest and any PR's linter breaks. Keeping
21-
# this in sync with the Dockerfile is the durable fix (it also covers the
22-
# standalone ``Run mypy`` CI step, not just the pre-commit hook).
13+
# Must match ARG PYTHON_VERSION in compose/*/django/Dockerfile (3.12.*); it
14+
# governs both the pre-commit ``mypy`` hook and the standalone ``Run mypy`` CI
15+
# step (.github/workflows/backend.yml). A stale sub-3.12 target makes mypy
16+
# reject PEP 695 ``type`` statements emitted by newer (unpinned) dependency
17+
# stubs (e.g. numpy >= 2.5.0's ``__init__.pyi``), aborting the whole run with
18+
# "errors prevented further checking".
2319
python_version = 3.12
2420
check_untyped_defs = True
2521
ignore_missing_imports = True

opencontractserver/corpuses/services/lifecycle.py

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -467,7 +467,8 @@ def bulk_soft_delete_documents(
467467
fully into Python lists, so there is **no built-in document-count
468468
ceiling**. That is fine for realistic corpus sizes, but chunk/iterate
469469
the doc set here before exposing an unbounded ``empty_corpus`` / folder
470-
cascade-delete to arbitrarily large corpora (memory follow-up to #1951).
470+
cascade-delete to arbitrarily large corpora (memory follow-up tracked
471+
in #2045; #1951 was the now-closed query-count fix).
471472
472473
NOTE: This is an internal primitive that performs **no permission
473474
check** — callers (``empty_corpus``, folder cascade-delete) must already
@@ -515,7 +516,7 @@ def bulk_soft_delete_documents(
515516
# PERF/MEMORY: this ``list()`` (plus the caller's ``document_ids``
516517
# snapshot) is the unbounded allocation point behind the docstring
517518
# "Scaling caveat" — chunk/iterate here to add a document-count
518-
# ceiling for the empty_corpus / cascade-delete callers (#1951).
519+
# ceiling for the empty_corpus / cascade-delete callers (#2045).
519520
active_paths = list(
520521
DocumentPath.objects.select_for_update(of=("self",))
521522
.select_related("document")

opencontractserver/corpuses/services/paths.py

Lines changed: 26 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,12 @@
1-
"""DocumentPath disambiguation internals for the corpus service layer.
1+
"""DocumentPath path-manipulation + signal-replay internals for the corpus service layer.
22
3-
``CorpusPathService`` holds the low-level :class:`DocumentPath` path
4-
manipulation helpers used by the folder write operations when documents are
5-
moved between folders or displaced by a folder deletion. Every method here is
6-
an internal helper (underscore-prefixed): the path layer performs NO
7-
permission checks — callers gate corpus permissions *before* reaching these
8-
helpers.
3+
``CorpusPathService`` holds the low-level :class:`DocumentPath` helpers used
4+
across the corpus/document layers when documents are moved between folders,
5+
displaced by a folder deletion, soft-deleted, or restored. The naming
6+
convention is mixed (a couple of methods are public, the rest are
7+
underscore-prefixed package-internal helpers) — see the class docstring for
8+
the distinction. No method here performs permission checks — callers gate
9+
corpus permissions *before* reaching these helpers.
910
1011
Split out of the former ``corpus_objs_service.py`` monolith — see
1112
``docs/refactor_plans/2026-05-21-service-layer-phase2-corpus-services-plan.md``
@@ -37,15 +38,25 @@
3738

3839

3940
class CorpusPathService(BaseService):
40-
"""Low-level :class:`DocumentPath` disambiguation helpers.
41-
42-
All methods are internal helpers (underscore-prefixed) shared by the
43-
folder write operations in
44-
:class:`~opencontractserver.corpuses.services.folders.FolderCRUDService`
41+
"""Low-level :class:`DocumentPath` path-manipulation + signal-replay helpers.
42+
43+
Shared across the corpus/document layers; they perform NO permission
44+
checks — the calling code gates corpus permissions first.
45+
46+
The naming convention is mixed. ``disambiguate_path`` and
47+
``reconcile_paths_after_folder_change`` are **public**: ``disambiguate_path``
48+
is called throughout the corpus service layer and beyond it — e.g.
49+
``Corpus.add_document`` in the model layer and
50+
``documents.versioning.restore_document``. The remaining methods are
51+
underscore-prefixed **package-internal** helpers — NOT class-private.
52+
``_dispatch_document_path_created_signals`` in particular is deliberately
53+
invoked from the bulk write paths in sibling services
54+
(:class:`~opencontractserver.corpuses.services.folders.FolderCRUDService`,
55+
:class:`~opencontractserver.corpuses.services.folder_documents.FolderDocumentService`,
4556
and
46-
:class:`~opencontractserver.corpuses.services.folder_documents.FolderDocumentService`.
47-
They perform NO permission checks — the calling service is responsible
48-
for gating corpus permissions first.
57+
:class:`~opencontractserver.corpuses.services.lifecycle.DocumentLifecycleService`)
58+
after each ``bulk_create`` of path rows, so a cross-service caller reaching
59+
it is expected, not a leak.
4960
"""
5061

5162
@staticmethod

opencontractserver/tests/test_corpus_objs_service.py

Lines changed: 15 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1190,8 +1190,16 @@ def test_query_count_independent_of_document_count(self):
11901190
# Floor so the equality below isn't vacuously satisfied by 0 == 0 (a
11911191
# regression that silently skips all work): real trashing always runs at
11921192
# least the SELECT-FOR-UPDATE + is_current UPDATE + bulk_create.
1193-
self.assertGreaterEqual(len(small_ctx), 3)
1194-
self.assertEqual(len(small_ctx), len(large_ctx))
1193+
self.assertGreaterEqual(
1194+
len(small_ctx),
1195+
3,
1196+
"Expected at least SELECT FOR UPDATE + is_current UPDATE + bulk_create",
1197+
)
1198+
self.assertEqual(
1199+
len(small_ctx),
1200+
len(large_ctx),
1201+
"Query count must not vary with document count (the O(1) guarantee)",
1202+
)
11951203

11961204
# The ``is_public`` revocation branch is skipped for private corpora
11971205
# above, so prove it is ALSO O(1) in the document count: the
@@ -1215,7 +1223,11 @@ def test_query_count_independent_of_document_count(self):
12151223
large_pub, large_pub_ids, self.owner
12161224
)
12171225

1218-
self.assertEqual(len(small_pub_ctx), len(large_pub_ctx))
1226+
self.assertEqual(
1227+
len(small_pub_ctx),
1228+
len(large_pub_ctx),
1229+
"is_public revocation query count must not vary with document count",
1230+
)
12191231
# The public path adds the (constant) is_public revocation cost on top
12201232
# of the private path. Assert only that it is strictly greater — the
12211233
# cross-size equality above is the real O(1) guarantee; a hardcoded delta

0 commit comments

Comments
 (0)