Cap public crawl_authorities analyzer bounds to prevent unbounded/DoS crawls #2083
JSv4 wants to merge 4 commits into
…orities The decorator in shared/decorators.py attaches _oc_corpus_analyzer_input_schema on the wrapper at runtime, but its declared return type is Callable[..., Any], which mypy rightly rejects. Added a scoped # type: ignore[attr-defined] on the single access site; the test remains a direct assertion (not getattr) so intent is preserved. Applied black reformatting to the same file.
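The typing issue described in this commit can be reproduced with a minimal sketch. The decorator below is a hypothetical stand-in, not the code in shared/decorators.py; only the attribute name _oc_corpus_analyzer_input_schema and the Callable[..., Any] return type come from the commit message.

```python
from typing import Any, Callable, Dict


def corpus_analyzer_sketch(
    input_schema: Dict[str, Any],
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator factory that attaches a schema at runtime."""

    def decorate(func: Callable[..., Any]) -> Callable[..., Any]:
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            return func(*args, **kwargs)

        # The attribute exists only at runtime; the declared return type
        # Callable[..., Any] says nothing about it.
        wrapper._oc_corpus_analyzer_input_schema = input_schema  # type: ignore[attr-defined]
        return wrapper

    return decorate


@corpus_analyzer_sketch(input_schema={"type": "object"})
def crawl_authorities_sketch(**kwargs: Any) -> Dict[str, Any]:
    # Placeholder body: echo the arguments back.
    return kwargs


# Direct attribute access keeps the assertion's intent explicit. The scoped
# ignore suppresses only the attr-defined error, and only on this line.
schema = crawl_authorities_sketch._oc_corpus_analyzer_input_schema  # type: ignore[attr-defined]
assert schema == {"type": "object"}
```

A typed alternative would be a Protocol that declares the attribute, at the cost of changing the decorator's signature; the scoped ignore keeps the change limited to the one access site.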
Code Review — PR #2083: Cap public
Codecov Report
❌ Patch coverage is
The existing test exercised only the clamp path; add assertions for the value=None -> default fallback and the ValueError raised for non-integers (including bool), covering the two lines codecov flagged in corpus_analysis_tasks.py.
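The three paths named in this commit (clamp, None -> default, and ValueError for non-integers including bool) can be illustrated with a small stand-in. The function name, keyword arguments, and numeric values below are assumptions for illustration; the real _resolve_crawl_bound in corpus_analysis_tasks.py may differ, for example in whether the default itself is clamped.

```python
def resolve_bound_sketch(value: object, *, default: int, minimum: int, maximum: int) -> int:
    """Hypothetical sketch: validate a user-supplied bound and clamp it."""
    if value is None:
        # Fallback path: the caller did not supply a bound.
        return default
    # bool is a subclass of int in Python, so isinstance(True, int) is True
    # and booleans must be rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"expected an integer bound, got {value!r}")
    # Clamp path: pull the value into [minimum, maximum].
    return max(minimum, min(value, maximum))


# Clamp path: oversized and undersized inputs are pulled into range.
assert resolve_bound_sketch(10**9, default=50, minimum=1, maximum=500) == 500
assert resolve_bound_sketch(0, default=50, minimum=1, maximum=500) == 1

# Fallback path: None resolves to the default.
assert resolve_bound_sketch(None, default=50, minimum=1, maximum=500) == 50

# Rejection path: non-integers, including bool, raise ValueError.
for bad in (True, "10", 1.5):
    try:
        resolve_bound_sketch(bad, default=50, minimum=1, maximum=500)
    except ValueError:
        pass
    else:
        raise AssertionError(f"{bad!r} should have been rejected")
```

With a minimum of 1, an input of 0 resolves to 1 rather than being passed through, which is how a helper of this shape would close the token_budget=0 bypass.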
Code Review
This PR introduces server-side caps on the
1. File:
2. Files: The service documents and implements
3. Files: The
4. File: All five
5. File: This is a pure stateless helper with no Celery or task-specific dependency. Per CLAUDE.md: "Before writing new utilities: Check existing utility files first (
6. Type hint File:
Code Review
This PR adds server-side safety caps to
1. LLM tool path bypasses every safety cap (Security)
Fix: apply the same
2.
Motivation
- The crawl_authorities corpus analyzer accepted attacker-controlled numeric bounds from GraphQL and propagated them into the crawl loop with no server-side validation or safe upper bounds, allowing authenticated users to request arbitrarily large crawls.
- token_budget=0 was treated as unbounded by the service, enabling a trivial bypass of budget enforcement.
Description
- Added server-side maximums for public analyzer parameters in opencontractserver/enrichment/constants.py (CRAWL_MAX_* constants).
- Added a _resolve_crawl_bound helper in opencontractserver/tasks/corpus_analysis_tasks.py that validates and clamps user-supplied bounds to safe minimum/maximum ranges and rejects non-integer/bool inputs.
- Updated the crawl_authorities analyzer: its input_schema now advertises the enforced maximums and requires token_budget >= 1, and the analyzer calls _resolve_crawl_bound to pass clamped values into CrawlAuthoritiesService.crawl.
- Added tests in opencontractserver/tests/test_crawl_authorities.py asserting the schema maximums and that _resolve_crawl_bound clamps/normalizes extreme or disallowed inputs.
Testing
- Compiled opencontractserver/enrichment/constants.py with python -m py_compile and parsed the modified Python modules with ast.parse; both succeeded.
- Attempted to run a unit test (python -m pytest opencontractserver/tests/test_crawl_authorities.py::CeleryTaskImportTest -q), but it could not execute in this environment due to a missing runtime dependency (ModuleNotFoundError: No module named 'django').
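One way to keep the advertised schema maximums and the enforced caps from drifting apart, as the Description intends, is to build the schema from the same constants. The sketch below is illustrative only: the constant names and values, the max_depth parameter, and the defaults are hypothetical; only token_budget, the CRAWL_MAX_ prefix, and the token_budget >= 1 requirement come from the PR description.

```python
from typing import Any, Dict

# Hypothetical caps; the real CRAWL_MAX_* names and values live in
# opencontractserver/enrichment/constants.py and are not shown in this PR text.
CRAWL_MAX_DEPTH = 5
CRAWL_MAX_TOKEN_BUDGET = 2_000_000


def bounded_int(maximum: int, default: int, minimum: int = 1) -> Dict[str, Any]:
    """Build a JSON Schema fragment for an integer with enforced bounds."""
    return {
        "type": "integer",
        "minimum": minimum,
        "maximum": maximum,
        "default": default,
    }


# Sketch of an analyzer input schema derived from the caps above.
CRAWL_INPUT_SCHEMA_SKETCH: Dict[str, Any] = {
    "type": "object",
    "properties": {
        # max_depth is a hypothetical parameter name.
        "max_depth": bounded_int(CRAWL_MAX_DEPTH, default=2),
        # minimum=1 means a budget of 0 can no longer be read as "unbounded".
        "token_budget": bounded_int(CRAWL_MAX_TOKEN_BUDGET, default=200_000),
    },
    "additionalProperties": False,
}

token_budget_schema = CRAWL_INPUT_SCHEMA_SKETCH["properties"]["token_budget"]
assert token_budget_schema["minimum"] == 1
assert token_budget_schema["maximum"] == CRAWL_MAX_TOKEN_BUDGET
```

The schema only advertises the limits to clients. The server-side clamp is still needed, because a caller can submit values that bypass schema validation.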