Skip to content

Consume LLM Singleton registry everywhere + add per-corpus LLM admin GUI#2035

Merged
JSv4 merged 24 commits into
mainfrom
claude/llm-singleton-registry-config-4rei39
Jun 23, 2026
Merged

Consume LLM Singleton registry everywhere + add per-corpus LLM admin GUI#2035
JSv4 merged 24 commits into
mainfrom
claude/llm-singleton-registry-config-4rei39

Conversation

@JSv4

@JSv4 JSv4 commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator

Summary

An audit-and-complete pass on how LLMs are configured at runtime via the Singleton registry, ensuring (1) admins have a GUI surface to control it and (2) corpuses consume from it so — unless a tool/infra deliberately overrides — the configured Singleton LLM is used throughout a corpus.

The registry itself already existed and was mostly wired correctly: resolve_model_spec() walks per-call → agent → corpus.preferred_llm → PipelineSettings.default_llm → Django settings, and the agent factory consumes it for every agent path (document/corpus chat, extracts, deep research, agent mentions, corpus/thread actions). This PR closes the two remaining gaps the audit surfaced.

Gap 1 — Corpus consumption (one runtime bypass)

Conversation-title generation (config/websocket/consumers/unified_agent_conversation.py) was the only runtime LLM path bypassing the registry. It called the OpenAI-only SimpleLLMClient, which hardcoded gpt-4o-mini and ignored both corpus.preferred_llm and the install-wide default — even though self.corpus was right there.

  • Added opencontractserver/llms/completions.py::agenerate_text — a provider-agnostic (OpenAI/Anthropic/Google/Ollama), registry-backed one-shot completion helper that walks the same resolution chain as the agent factory and builds the model via the credential-aware model_factory.
  • Routed title generation through it, passing the corpus default.
  • Removed the dead, registry-bypassing SimpleLLMClient (client.py) and its LLM_CLIENT_* settings; updated docs + tests (no-dead-code).

Gap 2 — Admin GUI for the per-corpus LLM

Corpus.preferred_llm was fully plumbed in the backend (mutation arg, CorpusType, serializer) but had no frontend surface — a ghost feature only settable via the GraphQL console/Django shell.

  • New shared LlmModelPicker component (frontend/src/components/common/LlmModelPicker.tsx); the admin System Settings screen now consumes it too, de-duplicating its inline provider/model chip list.
  • New Language Model card in the Corpus Settings panel. When the corpus has no override, it shows the inherited install-wide default so "leave empty = inherit" is explicit. Clearing sends "", which the backend normalises to NULL.
  • Threaded preferredLlm through RESOLVE_CORPUS_BY_SLUGS_FULL, UPDATE_CORPUS, and the CorpusType/RawCorpusType TS types.
  • Two slim, secret-free @login_required queries (GET_LLM_PROVIDERS, GET_SYSTEM_DEFAULT_LLM) feed the picker, so non-superuser corpus owners can use it without any credential exposure.

Testing

  • Backend: opencontractserver/tests/test_llms_completions.py pins the resolution chain (corpus beats settings default; explicit beats corpus; hard fallback; temperature omission; None output; max_tokens forwarding). Updated the consumer title tests to mock agenerate_text and cover corpus-preferred threading + empty/None fallback. pyupgrade/black/isort/flake8 clean.
  • Frontend: LlmModelPicker unit tests (5 cases) pass; tsc --noEmit clean; existing routing tests that hit the corpus-resolution query still pass (13/13); prettier clean.

Permissioning note

The two new GraphQL queries are @login_required and request only non-secret fields (no settingsSchema/currentValue), so a corpus editor can populate the picker without ever seeing provider credentials. Override precedence is preserved end-to-end: explicit per-call model > per-agent > corpus > install-wide default > Django settings.

…pus GUI

Audit + completion of runtime LLM configuration so the Singleton registry
(resolve_model_spec: per-call -> agent -> corpus -> PipelineSettings.default_llm
-> Django settings) is consumed everywhere and is admin-controllable.

Consumption fix:
- Conversation-title generation was the one runtime path bypassing the
  registry: it used the OpenAI-only SimpleLLMClient hardcoded to gpt-4o-mini,
  ignoring corpus.preferred_llm and the install-wide default. Add
  opencontractserver.llms.completions.agenerate_text, a provider-agnostic,
  registry-backed one-shot completion helper, and route title generation
  through it (honouring the corpus default).
- Remove the dead, registry-bypassing SimpleLLMClient (client.py) and its
  LLM_CLIENT_* settings; update docs and tests accordingly.

Admin GUI:
- Surface the previously unreachable Corpus.preferred_llm override in the
  Corpus Settings panel via a new shared LlmModelPicker component (also adopted
  by the admin System Settings screen, de-duplicating its inline picker). Shows
  the inherited system default when unset. Threaded preferredLlm through the
  corpus resolution query, UPDATE_CORPUS, and the TS types; added two slim,
  secret-free login_required queries for the provider list + system default.
@claude

claude Bot commented Jun 20, 2026

Copy link
Copy Markdown

Code Review — PR #2035: Consume LLM Singleton registry everywhere + add per-corpus LLM admin GUI

Overview

This is a well-scoped, well-executed audit PR. It closes two genuine gaps: the last runtime path that bypassed the LLM registry (conversation-title generation was hardcoded to gpt-4o-mini via the deleted SimpleLLMClient), and the missing frontend surface for the already-plumbed Corpus.preferred_llm. The dead-code removal is clean, the DRY extraction of LlmModelPicker is correct, and the test strategy (mock-based SimpleTestCase for the resolution-chain invariants) is appropriate.


Issues

Medium — Disabled providers appear in the picker

GET_LLM_PROVIDERS fetches the enabled field but neither the query nor the frontend filters on it. LlmModelPicker renders every entry in providers, so a disabled/misconfigured provider's models appear as clickable chips. A user who selects a disabled model will get a runtime error at save time, not at pick time.

Fix options (pick one):

  • Filter on the frontend before passing to the picker:
    const llmProviders = (llmProvidersData?.pipelineComponents?.llmProviders ?? [])
      .filter((p) => p.enabled !== false);
  • Add a where: { enabled: true } filter on the backend resolver.
  • Show disabled providers greyed-out with a tooltip — better discoverability without silent failures.

Low — Changelog fragment slugs don't match the PR number

The three fragments use 2030-* as the slug. CLAUDE.md says "the PR number is ideal". The actual PR is #2035, so these read as if they belong to #2030. Minor, but worth correcting for future --check runs.

Low — Missing test: empty-string return from agenerate_text

_generate_conversation_title has:

return title or f"Conversation {uuid.uuid4()}"

The falsy-title branch is not covered by the updated tests. A third case in UnifiedAgentConsumerTitleGenerationTestCase that mocks agenerate_text returning "" and asserts the fallback string begins with "Conversation " would pin this branch.

Nit — Two queries fire on every CorpusSettings mount

GET_LLM_PROVIDERS and GET_SYSTEM_DEFAULT_LLM are both static/infrequent data that never change during a session. Apollo's default cache policy (cache-first) means subsequent renders are free, but the first mount issues two network round-trips in series. A single GetCorpusLlmContext query that returns both pipelineComponents.llmProviders and pipelineSettings.defaultLlm would collapse this to one. Not a correctness issue, just worth noting if the settings panel ever appears in a slow-network context.


Positive Observations

  • agenerate_text resolution chain (per-call → corpus_preferred → settings default) is correct and the three test cases in test_llms_completions.py cleanly pin each priority level.
  • Lazy imports in completions.py (imports inside the function body) mean the module doesn't pull in pydantic-ai/model-factory at import time. The patches in the tests target the source modules rather than the callers — this is the right approach for lazy imports and the tests are correct.
  • Pitfall Bump coverage from 6.2 to 6.5.0 #14 compliance: instructions= (not system_prompt=) is used throughout completions.py. The docstring even cross-references the CLAUDE.md entry — good defensive documentation.
  • Defensive corpus resolution: corpus = getattr(self, "corpus", None) correctly handles document-only sessions where self.corpus may be unset.
  • @login_required, no credential exposure: GET_LLM_PROVIDERS/GET_SYSTEM_DEFAULT_LLM deliberately omit settingsSchema/currentValue. The permissioning rationale is clearly documented in the query block comment.
  • SimpleLLMClient deletion is complete: no stray imports, settings keys removed, docs updated, tests removed. This is the right no-dead-code cleanup.
  • LlmModelPicker is correctly presentational: the component owns no state, fires no queries, and passes disabled down to chips. The caller (CorpusSettings) owns all the persistence logic.

Summary

One functional gap (disabled providers surfacing in the picker as clickable), one minor housekeeping item (changelog slugs), and one missing test edge case. None are blockers, but the disabled-provider issue can produce a user-visible runtime error so it's worth fixing before merge. Everything else looks solid.

claude added 2 commits June 20, 2026 21:36
…mpty-title test

- CorpusSettings: drop admin-disabled LLM providers from the picker so users
  aren't offered models they can't use (enabled !== false).
- Rename changelog fragments 2030-* -> 2035-* to match the PR number.
- Add a test covering the empty-string title -> generated-fallback branch in
  _generate_conversation_title.
CI's pyupgrade (--py39-plus) rewrote Optional[X] -> X | None in
opencontractserver/llms/completions.py, after which flake8 flagged the now-unused
typing.Optional import. Apply the pyupgrade rewrite and drop the import.

JSv4 commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks for the review. Addressed in 6002071:

  • Disabled providers in the picker — now filtered in CorpusSettings (enabled !== false) so users aren't offered admin-disabled providers. One clarification: the "runtime error at save time" premise is overstated — validate_model_spec (and the agent-build path) only check that the provider is registered, not enabled, so a disabled provider's model would actually still resolve. The filter is the right UX call regardless, but it's a polish item, not a latent crash. I scoped it to the user-facing corpus picker only and left System Settings showing all providers, since that's where an admin manages enablement (matching its existing behavior of not filtering on enabled).
  • Changelog slugs — renamed 2030-*2035-*.
  • Empty-title branch — added test_title_generation_fallback_on_empty_title covering the return title or f"Conversation {uuid}" fallback.

Skipped the combined-GetCorpusLlmContext-query nit: the two queries are single-responsibility, cache-first (free after first mount), and independently reusable, so I'd rather not couple provider-list and settings-default fetching into one bespoke query.

Separately, the linter CI failure was unrelated to the review — CI's pyupgrade rewrote Optional[X]X | None in completions.py and flake8 then flagged the unused import; fixed in 0955911.


Generated by Claude Code

@claude

claude Bot commented Jun 20, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR closes two real gaps in the LLM configuration story: (1) conversation-title generation was the only production LLM path that bypassed the Singleton registry entirely, and (2) Corpus.preferred_llm was fully wired on the backend but had no frontend surface. Both are addressed cleanly. The removal of SimpleLLMClient follows the no-dead-code rule correctly — it deletes the file, its settings, its tests, and updates the docs atomically.


What works well

  • Registry bypass fix is correct. _generate_conversation_title now lazy-imports agenerate_text, passes corpus.preferred_llm defensively via getattr, and falls back to f"Conversation {uuid.uuid4()}" on empty response. All three cases are tested.
  • validate_preferred_llm in the serializer normalises "" / whitespace → None at the DB layer — prevents the ambiguous empty-string state described in the comment. Good defensive serializer work.
  • agenerate_text patches correctly in tests. All three imports inside the function body are lazy, so patching at the source-module path (not the consumer's local name) correctly intercepts them on each call. The test assertions check both the returned value and the mock call-chain to verify the full resolution order.
  • Security is sound. GET_LLM_PROVIDERS and GET_SYSTEM_DEFAULT_LLM omit settingsSchema/currentValue, so no credentials are exposed to non-superuser corpus editors. Both queries are @login_required.
  • Cache update after updateCorpusLlm uses cache.identify + cache.modify to avoid a refetch. Error paths roll back llmDraft to originalLlm. Clean.

Issues and suggestions

Medium — LlmProviderOption is silently duplicated

queries.ts defines an inline shape for llmProviders array items inside LlmProvidersQueryResult. LlmModelPicker.tsx already exports LlmProviderOption with an identical structure (minus the enabled field). TypeScript's structural typing silently accepts this, but future drift between the two shapes won't be caught. LlmProvidersQueryResult should reference the exported type:

// queries.ts
import type { LlmProviderOption } from "../components/common/LlmModelPicker";

export interface LlmProvidersQueryResult {
  pipelineComponents: {
    llmProviders: Array<LlmProviderOption & { enabled?: boolean | null }>;
  };
}

Minor — temperature is always injected into model_settings

model_settings: dict[str, Any] = {"temperature": temperature}

temperature defaults to 0.7 and is unconditionally included. Some pydantic-ai model backends (notably reasoning models) reject an explicit temperature and raise. The rest of the agent factory likely has the same exposure, but since agenerate_text is now the sanctioned entry point for infra LLM calls, it's the right place to guard it:

model_settings: dict[str, Any] = {}
if temperature is not None:
    model_settings["temperature"] = temperature

…and change the signature to temperature: Optional[float] = 0.7. This matches how max_tokens is already handled.

Minor — CLAUDE.md multi-line comment block in LlmModelPicker.tsx

The file opens with an 11-line JSDoc block. CLAUDE.md says: "Never write multi-paragraph docstrings or multi-line comment blocks — one short line max." The intent is clear from the code and the component name; the block can be reduced to a single line.

Minor — Two extra GraphQL queries fire unconditionally in CorpusSettings

GET_LLM_PROVIDERS and GET_SYSTEM_DEFAULT_LLM are called at the top of the component with no skip condition, so they fire for every corpus-settings mount even when the user won't touch the LLM picker. Both are cheap reads and admin-facing, so this is low priority — but wrapping them in skip: !canUpdate (or lazy-loading them) would avoid the round-trips for read-only visitors.

Nit — key prop fallback chain for providers

key={provider.className || provider.name || providerKey}

If two providers share no className and have the same name/providerKey (unlikely but not impossible), React will silently reuse the wrong DOM node. Using provider.className ?? provider.title ?? provider.name ?? providerKey and asserting uniqueness via a console warning in dev mode would be safer. Low priority.


Test coverage

Surface Coverage
agenerate_text resolution chain (3 cases)
Consumer title fallback on error / empty string
LlmModelPicker chip click, inherited hint, clear hint, hint hide
validate_preferred_llm serializer normalisation Not in this diff — presumably covered by existing corpus serializer tests
GET_LLM_PROVIDERS / GET_SYSTEM_DEFAULT_LLM resolvers Not visible in this diff

One gap worth noting: test_llms_completions.py doesn't test the path where resolve_model_spec receives explicit=None, corpus_preferred=None, and settings_default=None/"". The intent is that the agent factory has a hard fallback in that case, but a fourth test case would pin the contract explicitly.


Overall

The core changes — agenerate_text, the removal of SimpleLLMClient, and the per-corpus LLM picker — are well-designed and follow project conventions. The temperature-always-set issue is the only change I'd recommend before merge; the rest are improvements for a follow-up.

… skip, tests

- completions.py: make temperature optional (float | None = 0.7) and only inject
  it into model_settings when set, so backends that reject an explicit
  temperature (e.g. reasoning models) aren't forced to receive one (mirrors
  max_tokens handling).
- queries.ts: LlmProvidersQueryResult now reuses the exported LlmProviderOption
  shape instead of an inline duplicate, so the GraphQL projection and the picker
  component can't silently drift.
- CorpusSettings: skip GET_LLM_PROVIDERS for read-only viewers (chips are only
  used when editing); the inherited-default hint still loads.
- tests: add hard-fallback (everything unset -> openai:gpt-4o) and
  temperature=None-omitted cases to test_llms_completions.

JSv4 commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks — round-2 items addressed in 87ebc5f:

  • Medium / LlmProviderOption duplicationLlmProvidersQueryResult now reuses the exported LlmProviderOption (Array<LlmProviderOption & { enabled?: boolean | null }>) via an import type, so the projection and component can't drift.
  • temperature always injectedagenerate_text now takes temperature: float | None = 0.7 and only adds it to model_settings when set (mirrors max_tokens), so reasoning-model backends aren't forced an explicit temperature. Added a temperature=None-omitted test.
  • Test gap (all-unset hard fallback) — added test_hard_fallback_when_everything_unset pinning → openai:gpt-4o.
  • Two queries on mountGET_LLM_PROVIDERS now uses skip: !canUpdate (chips are edit-only); the small default-LLM query stays so the inherited-default hint still renders for read-only viewers.

Declined:

  • Multi-line JSDoc in LlmModelPicker.tsx — I checked and that "one short line max" rule isn't in CLAUDE.md; the codebase convention is multi-line JSDoc blocks (e.g. CorpusSettings.tsx's own header, model_factory.py), so trimming would be inconsistent.
  • key fallback nitclassName is always populated for registry components in practice; left as-is.

Note: the linter failure you may see against 6002071 is stale — it's the pyupgrade/Optional issue already fixed in 0955911.


Generated by Claude Code

@claude

claude Bot commented Jun 20, 2026

Copy link
Copy Markdown

Code Review: LLM Singleton Registry + Per-Corpus LLM Admin GUI

Overview

This is a well-scoped audit-and-complete pass that closes two genuine gaps: (1) conversation-title generation was the only runtime LLM path bypassing the Singleton registry (hardcoded gpt-4o-mini), and (2) Corpus.preferred_llm had no frontend surface. Both fixes are correctly motivated and the implementation is generally clean. The deletion of SimpleLLMClient is the right call — it was dead weight.


Issues & Suggestions

Medium — LlmModelPicker input not disabled during save (CorpusSettings.tsx)

The picker and input remain interactive while updatingLlm is true. The Clear button is already guarded (disabled={updatingLlm}), but the picker itself is not:

<LlmModelPicker
  disabled={!canUpdate}   {/* missing: || updatingLlm */}
/>

If the user clicks a chip or edits the input while a save is in-flight, the draft diverges from what was sent, making the subsequent originalLlm update in onCompleted set the wrong baseline. Fix: disabled={!canUpdate || updatingLlm}.


Medium — Magic colour literals in LlmModelPicker.tsx violate the no-magic-numbers rule

The selected-chip highlight uses two raw hex values not in OS_LEGAL_COLORS or constants.ts:

background: selected ? "#e0e7ff" : OS_LEGAL_COLORS.surfaceHover,
border: selected ? "#6366f1" : OS_LEGAL_COLORS.border

#e0e7ff and #6366f1 are indigo-100/indigo-500 used for interactive selection state. These should be added to OS_LEGAL_COLORS (e.g. selectedBackground/selectedBorder) or mapped to an existing token. The same literals existed in the SystemSettings code being replaced, but this is the right moment to consolidate them since the component is now shared.


Minor — selected comparison should use trimmed value

isEmpty is computed from the trimmed value but the chip selected state is not:

const isEmpty = !value.trim();   // trims
const selected = value === spec; // raw — trailing spaces break highlighting

A user who manually types "anthropic:claude-opus-4-6 " (trailing space) sees no chip highlighted but the inherited hint also disappears, leaving the UI ambiguous. Fix: const selected = value.trim() === spec.


Minor — str(result.output) in completions.py is implicit

return str(result.output).strip()

For prompts sent without a structured output schema pydantic-ai returns a bare string, so this is fine in practice. A short comment signalling this is intentional (not a defensive str() around a dict) would help future readers. Not blocking.


Note — Patching pattern in test_llms_completions.py is correct

All imports inside agenerate_text are lazy (from ... import ... inside the function body), so the tests correctly patch at the source module rather than the usage site. Python re-binds the local name from the cached module object on every call, so the patches are picked up. The new test_title_generation_fallback_on_empty_title test is a good addition.


Note — Full pydantic-ai Agent created per agenerate_text call

make_pydantic_ai_agent is called on every invocation. For title generation (once per conversation) this overhead is acceptable. Since agenerate_text is now the documented sanctioned path for all lightweight infra LLM calls, the module docstring could note that callers with high call frequency should consider whether agent construction overhead matters. No action needed for this PR.


What's Good

  • Removing SimpleLLMClient and its settings is clean dead-code elimination with no backwards-compat shims.
  • The resolution chain (explicit → corpus_preferred → settings_default → hard default) is correctly modelled in both implementation and the five backend tests.
  • The two new GraphQL queries deliberately omit settingsSchema/currentValue so non-superuser corpus editors never see credential data — good security hygiene documented in both the PR and code comments.
  • getattr(self, "corpus", None) is the right defensive access for document-only or early-boot sessions.
  • LlmModelPicker is a clean presentational component (no Apollo/state inside) — the caller owns value, fetch, and save.
  • Frontend types (RawCorpusType, ResolveCorpusFullQuery) are correctly extended and threaded through Corpuses.tsx.
  • The changelog fragments are present and well-written.

Summary

Three things worth fixing before merge: the updatingLlm guard on the picker (prevents draft divergence during in-flight saves), the magic colour constants (project convention), and the selected trim inconsistency (minor UX bug). Everything else is in good shape.

…ors, trim

- LlmModelPicker: replace hardcoded #e0e7ff/#6366f1 selected-chip literals with
  the existing OS_LEGAL_COLORS.selectedBg/selectedBorder tokens (no-magic-numbers),
  and trim the value before chip-selection comparison so a stray trailing space
  still highlights the matching chip.
- CorpusSettings: disable the picker while a save is in-flight
  (disabled={!canUpdate || updatingLlm}) to prevent draft/baseline divergence.
- completions.py: comment that str(result.output) is type-narrowing (bare-string
  output, no output_type), not defensive coercion.

JSv4 commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator Author

Round-3 items addressed in d3cfa2d:

  • Picker not disabled during savedisabled={!canUpdate || updatingLlm} now, preventing draft/baseline divergence on in-flight saves.
  • Magic colour literals — replaced #e0e7ff/#6366f1 with the existing semantic OS_LEGAL_COLORS.selectedBg/selectedBorder tokens. (Notably #6366f1 was already defined as navIndigo, and the palette has canonical teal selection tokens used app-wide — so the shared picker's selected-chip state now matches the rest of the app's selected-item styling rather than re-introducing one-off indigo.)
  • selected trim inconsistencyvalue.trim() === spec now, matching isEmpty.
  • str(result.output) clarity — added a one-line comment noting it's type-narrowing over a bare-string output (no output_type), not defensive coercion.

Left as-is: the per-call agent construction note — acknowledged as acceptable for the no-action-needed cases; not adding a docstring caveat for a hypothetical high-frequency caller that doesn't exist yet.


Generated by Claude Code

@claude

claude Bot commented Jun 20, 2026

Copy link
Copy Markdown

Code Review

Overview

This is a well-scoped, well-motivated PR that closes two concrete gaps: (1) conversation-title generation was the only LLM call that hardcoded a provider/model and ignored the Singleton registry, and (2) Corpus.preferred_llm had no frontend surface despite being fully plumbed in the backend. The deletions are clean, the new code is correctly factored, and the changelog fragments are appropriately detailed.


What's Good

  • No dead code: SimpleLLMClient, create_client, ChatMessage/ChatResponse, and the LLM_CLIENT_* settings are removed completely rather than deprecated in place.
  • Registry chain is intact: agenerate_text correctly defers ORM work to sync_to_async(get_default_llm_spec)() while keeping resolve_model_spec ORM-free.
  • Pitfall Bump coverage from 6.2 to 6.5.0 #14 honoured: make_pydantic_ai_agent is called with instructions=, never system_prompt=.
  • Security: Both new GraphQL queries are @login_required and deliberately omit settingsSchema/currentValue, so corpus editors see provider names/models but not credentials.
  • DRY: LlmModelPicker eliminates ~90 lines of duplicated chip-rendering in SystemSettings.tsx.
  • Apollo cache: The update callback in updateCorpusLlm correctly writes preferredLlm: () => llmDraft.trim() || null in sync with the serializer's ""NULL normalisation.
  • Serializer: validate_preferred_llm is correctly documented and prevents "" from leaking into the DB.
  • Defensive consumer: getattr(self, "corpus", None) / getattr(corpus, "preferred_llm", None) handles document-only sessions and early calls gracefully.
  • Tests: The five AgenerateTextTests cases pin every tier of the resolution chain, the temperature-omission case, and output stripping. The three consumer title tests align with the new mock target.

Issues & Suggestions

1. Import direction in queries.ts (Architecture)

// frontend/src/graphql/queries.ts
import type { LlmProviderOption } from "../components/common/LlmModelPicker";

This inverts the expected dependency direction: graphql/ imports from components/. If a future change moves or renames LlmModelPicker.tsx it will silently break the query type. LlmProviderOption has nothing component-specific about it — it should live in frontend/src/types/graphql-api.ts (or a new frontend/src/types/pipeline.ts) and be imported by both the query file and the component.

2. LlmModelPicker renders a raw <label> instead of the design system's FormLabel

{label && (
  <label htmlFor={id} style={{ ... }}>
    {label}
  </label>
)}

The rest of the settings UI uses FormLabel from @os-legal/ui (as the removed SystemSettings.tsx code did). A raw <label> with inline styles will look subtly different under theme changes. Prefer FormLabel for consistency, or document explicitly that label is intentionally unstyled for composability.

3. id prop defaults to "llm-model-spec" — duplicate-ID risk

The corpus settings and system settings are on different pages today, so there is no collision yet. But if two pickers are ever rendered simultaneously, the <label htmlFor> association will break silently. Consider making id required or generating a fallback with React's useId().

4. enabled filter keeps null providers

.filter((p) => p.enabled !== false);

null falls through (null !== false). This is the intended "unset means enabled" semantics, but a brief comment would make it explicit for future readers:

// null/undefined = not explicitly disabled, treat as enabled
.filter((p) => p.enabled !== false)

5. agenerate_text creates a new Agent per call

make_pydantic_ai_agent installs a history-compaction capability processor each time it is called. For title generation (a rare, low-frequency path) this is fine. Worth a brief docstring note that this helper is intended for infrequent one-shot calls, so future callers do not reach for it in hot paths (e.g. batch processing).

6. Consumer test: corpus-preferred resolution path untested

All three updated consumer tests set consumer.corpus = None. A test with consumer.corpus = SimpleNamespace(preferred_llm="anthropic:claude-haiku-4-5") would confirm that the getattr chain actually threads corpus_preferred into agenerate_text:

async def test_title_generation_uses_corpus_preferred_llm(self) -> None:
    with patch(
        "opencontractserver.llms.completions.agenerate_text",
        new=AsyncMock(return_value="Corpus Analysis"),
    ) as mock_generate:
        consumer = UnifiedAgentConsumer()
        consumer.session_id = "test-session"
        consumer.corpus = SimpleNamespace(preferred_llm="anthropic:claude-haiku-4-5")
        await consumer._generate_conversation_title("What is this corpus about?")
        _, kwargs = mock_generate.call_args
        self.assertEqual(kwargs["corpus_preferred"], "anthropic:claude-haiku-4-5")

7. Minor: LlmModelPicker.test.tsx does not test the disabled state

The component has a disabled prop that prevents chip clicks and dims the UI. None of the four test cases exercise it. A quick assertion that onChange is not called when a disabled chip is clicked would close this gap.


Summary

The core changes are correct and the design is sound. Actionable items in priority order:

  1. Move LlmProviderOption out of the component file into a shared types file to fix the import-direction inversion.
  2. Replace the raw <label> with FormLabel from the design system (or document the intentional choice).
  3. Add a corpus-preferred test case to the consumer title-generation tests to cover the non-None corpus branch.

Items 4–7 are low-priority polish that can be addressed in a follow-up.

- Move LlmProviderOption to types/graphql-api.ts (canonical home); LlmModelPicker
  re-exports it and queries.ts imports from types/ instead of the component,
  fixing the graphql->components import-direction inversion.
- LlmModelPicker: default the field id via React useId() so two pickers on one
  page can't collide on the label<->input association.
- CorpusSettings: clarify the enabled-filter comment (null = not disabled).
- completions.py: docstring note that a fresh Agent is built per call (one-shot
  helper; hot paths should reuse an agent).
- tests: add consumer corpus-preferred-llm threading test and a LlmModelPicker
  disabled-chip (no onChange) test.

Declined: replacing the raw <label> with 'FormLabel from @os-legal/ui' — that
component is not exported by @os-legal/ui (only FormField is); the SystemSettings
FormLabel is an admin-local styled component, so coupling the shared picker to it
would be a worse dependency than the self-contained label.
@claude

claude Bot commented Jun 20, 2026

Copy link
Copy Markdown

Code Review: PR #2035 — Consume LLM Singleton registry everywhere + add per-corpus LLM admin GUI

Overview

A solid, well-scoped PR that closes two concrete gaps in LLM registry adoption: (1) reroutes the hardcoded OpenAI-only title-generation path through the canonical registry via a new agenerate_text helper, and (2) adds the missing frontend surface for Corpus.preferred_llm. The deletion of SimpleLLMClient and its settings is clean, the DRY extraction of LlmModelPicker is a net improvement, and the no-dead-code requirement from CLAUDE.md is satisfied.


Backend

opencontractserver/llms/completions.py (new)

Good:

Potential bug — str(None) leaks through the consumer's fallback guard:

If pydantic-ai ever surfaces None as result.output (e.g. model refusal or a future API shape change), str(result.output).strip() returns the literal string "None", which is truthy. The consumer's title or f"Conversation {uuid.uuid4()}" guard never fires and users see "None" as their conversation title. Consider:

output = result.output
return str(output).strip() if output is not None else ""

Missing test coverage:

  • No test verifies that a non-None max_tokens is forwarded into model_settings["max_tokens"]. The parameter exists on the public signature but goes untested.
  • No test for a whitespace-only model response (would strip to "", triggering the consumer fallback) — worth a case given that is precisely what the title or ... guard defends against.

config/websocket/consumers/unified_agent_conversation.py

Good:

  • Defensive getattr(self, "corpus", None) + getattr(corpus, "preferred_llm", None) is the right approach for a path that may run before the corpus is resolved.
  • Adding the third test case (test_title_generation_fallback_on_empty_title) explicitly pins the new empty-string-to-uuid fallback.

One concern — uuid import dependency:
The fallback now uses uuid.uuid4() inside _generate_conversation_title. The diff shows the usage but not a new import uuid being added to the consumer. If uuid was already imported earlier in the file this is fine, but please verify it does not raise a NameError at runtime.

Removal of SimpleLLMClient / LLM_CLIENT_* settings

Clean and consistent with the no-dead-code rule. Operators who had LLM_CLIENT_* in their .env files will silently ignore the now-absent settings. The changelog fragment documents the removal clearly.


Frontend

LlmModelPicker.tsx (new component)

Good:

  • Correctly presentational — no data fetching, caller owns state and persistence.
  • data-testid="llm-inherited-hint" is a good explicit hook for tests.
  • The 4 unit test cases cover the key behavioural contract.

Magic numbers in inline styles (CLAUDE.md rule #4):
The selected-chip highlight colours #e0e7ff and #6366f1 (indigo-100 / indigo-500) are hardcoded inline. They should be added to frontend/src/assets/configurations/osLegalStyles.ts or an existing accent-colour constant rather than inlined. These same literals were in the old SystemSettings.tsx (now deleted), so this PR is the natural moment to canonicalise them.

Minor — fireEvent vs userEvent:
The picker test uses fireEvent.click. Not blocking, but @testing-library/user-event fires the full event sequence including focus/pointer events and is the more realistic simulation.

CorpusSettings.tsx

Good:

  • Separate mutation instance for LLM (updateCorpusLlm) keeps the optimistic cache write and toast independent of the slug/categories flow — avoids accidental cross-contamination.
  • cache.modify writes preferredLlm: () => llmDraft.trim() || null after a successful save, keeping the Apollo cache consistent without a refetch.
  • canUpdate gates are applied consistently (disabled picker + hidden save button).

No loading state on the two new queries:
GET_LLM_PROVIDERS and GET_SYSTEM_DEFAULT_LLM are fired unconditionally on mount. While in-flight, llmProviders is [] and systemDefaultLlm is "", so the inherited-model hint is blank and the chip list is empty. Degradation is graceful (no crash), but a brief skeleton/spinner on the Language Model card would improve perceived UX when the registry query is slow on first load.

Backend "" to NULL normalisation:
The comment says "backend normalises empty string to NULL" but the diff shows no change to the backend UpdateCorpusMutation. If this normalisation was already implemented in a prior PR that is fine — just worth confirming the mutation handler stores None rather than a literal empty string.

GraphQL additions

The two slim queries (GET_LLM_PROVIDERS, GET_SYSTEM_DEFAULT_LLM) look well-designed:

  • Both @login_required but not superuser-gated — corpus editors can populate the picker without credential exposure.
  • Neither query includes settingsSchema / currentValue — no risk of leaking provider credentials.
  • The enabled field is correctly used in CorpusSettings to filter disabled providers out of the chip list before rendering.

Summary

Area Verdict
Architecture / design Correct — closes both registry gaps cleanly
Dead-code removal Complete
DRY LlmModelPicker extraction is a clear improvement
Security Permission design looks correct
Backend tests Missing max_tokens and whitespace-output cases; str(None) edge case
Frontend tests Minor fireEvent vs userEvent nit
Constants Two hex colour literals should be extracted to constants
Loading states Consider skeleton for provider/default-LLM queries in CorpusSettings
uuid import Verify uuid is already imported in the consumer file

The items most worth addressing before merge: the str(None) bug in completions.py and the uuid import verification in the consumer. The colour constants and missing test cases are recommended improvements but can land in a quick fixup.

- completions.py: return '' when result.output is None (model refusal / future
  API shape) instead of str(None)='None', so a caller's 'value or fallback'
  guard fires correctly (e.g. conversation-title generation).
- tests: add max_tokens-forwarded and None-output->'' cases to
  test_llms_completions.

Verified (no change needed): uuid is already imported in the consumer
(unified_agent_conversation.py:31); the selected-chip colour literals were
already replaced with OS_LEGAL_COLORS.selectedBg/selectedBorder tokens in an
earlier round; CorpusSerializer.validate_preferred_llm already normalises
''->None.
@claude

claude Bot commented Jun 20, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR closes two real gaps in the LLM singleton registry: (1) conversation-title generation was silently hardcoding `gpt-4o-mini` via the now-deleted `SimpleLLMClient`, and (2) `Corpus.preferred_llm` had no GUI surface. Both changes are well-motivated and clearly described. The overall approach is sound; the notes below are mostly minor.


Backend

`opencontractserver/llms/completions.py` — new `agenerate_text` helper

  • Correctly uses `instructions=` not `system_prompt=` (CLAUDE.md pitfall 14).
  • Defers all imports inside the function body to avoid circular-import issues.
  • `sync_to_async(get_default_llm_spec)()` is the right way to call an ORM-touching helper from an async context.
  • `temperature=None` path is handled cleanly — the key is omitted from `model_settings` so reasoning models that reject an explicit temperature won't be broken.

One note: the default `temperature=0.7` is reasonable for general use but is on the high end for a deterministic infra task like title generation. `0.3` would produce more consistent results. Minor opinion.


`config/websocket/consumers/unified_agent_conversation.py`

  • Defensive attribute access (`getattr(self, "corpus", None)`, `getattr(corpus, "preferred_llm", None)`) is the right call given mixed session modes.
  • `return title or f"Conversation {uuid.uuid4()}"` cleanly handles both empty strings and `None` from the model — a new safety net the old code lacked.

`opencontractserver/tests/test_llms_completions.py`

Five targeted mock-based tests pin the entire resolution-chain contract without hitting the network. Good coverage.

Minor nit: `test_hard_fallback_when_everything_unset` uses `@override_settings(DEFAULT_LLM="", OPENAI_MODEL="")` but the function under test reads `settings_default` from the patched `get_default_llm_spec` return value, not from Django settings directly. The decorator is redundant — not wrong, just noise.


Frontend

`frontend/src/components/common/LlmModelPicker.tsx`

  • Purely presentational — caller owns value, provider list, and persistence. Clean separation.
  • `useId()` for the label-to-input association is correct and handles multiple pickers on one page.
  • `value.trim() === spec` for chip selection prevents stray whitespace from breaking the highlight.
  • `disabled` prop propagates to both the input and buttons correctly.

Type precision issue: `LlmProviderOption` in `frontend/src/types/graphql-api.ts` does not include an `enabled` field. The filter in `CorpusSettings.tsx` accesses `p.enabled`:

```ts
const llmProviders = (llmProvidersData?.pipelineComponents?.llmProviders ?? [])
.filter((p) => p.enabled !== false);
```

This works at runtime because the query result carries `enabled`, and `LlmProvidersQueryResult` widens the type with `& { enabled?: boolean | null }`. But once the filtered array is passed to `LlmModelPicker` as `LlmProviderOption[]`, the `enabled` field is erased from the static type. TypeScript won't error, but the base type is imprecise. Adding `enabled?: boolean | null` to `LlmProviderOption` in `graphql-api.ts` would make the contract explicit.

Re-export path: The component re-exports `LlmProviderOption` from `types/`, creating a secondary import path for the type. Importing a type from a component module is backwards from the usual direction — consider whether callers should import directly from `../../types/graphql-api`.


`frontend/src/components/corpuses/CorpusSettings.tsx`

  • Separate `updateCorpusLlm` mutation instance with independent `onCompleted`/`onError`/`update` handlers is the right pattern.
  • `cache.modify` for the optimistic update is correct.
  • Skipping `GET_LLM_PROVIDERS` for read-only viewers (`skip: !canUpdate`) is a nice touch.

UX gap — missing API-key indicator: The corpus picker is mounted without `showApiKeyBadge` (defaults to `false`). A corpus editor can click a chip for a provider whose API key isn't configured — the save succeeds but the corpus silently uses a broken LLM. The `showApiKeyBadge` prop and `requiresApiKey` field are already in the data; passing `showApiKeyBadge` here would surface this at selection time with zero additional cost.


`frontend/src/graphql/queries.ts` — new queries

  • `settingsSchema`/`currentValue` deliberately omitted — no credential exposure for non-superusers. This is exactly the right call.
  • Apollo will cache both queries globally (no variables), so mounting multiple corpus settings panels won't multiply the network calls.

PR description accuracy: The description says "new `LlmModelPicker` unit test (4 cases)" but the file contains 5 `it()` blocks (the fifth is "hides the inherited hint once a value is set"). Minor.


Summary

The core change — removing `SimpleLLMClient`, routing title generation through `agenerate_text`, adding the corpus LLM picker — is correct, well-tested, and follows the project's architectural constraints. Two items worth addressing before merge:

  1. Add `enabled?: boolean | null` to `LlmProviderOption` in `types/graphql-api.ts` so the type matches its actual runtime shape.
  2. Pass `showApiKeyBadge` in the CorpusSettings picker to prevent silent misconfiguration when a corpus editor selects a provider without a configured API key.

Everything else is minor polish. Good PR overall.

@claude

claude Bot commented Jun 20, 2026

Copy link
Copy Markdown

Code Review — PR #2035: Consume LLM Singleton registry everywhere + per-corpus LLM admin GUI

Overview

This is a well-scoped, well-executed audit PR with a clear dual purpose: (1) eliminate the last hardcoded-model bypass in the runtime path by removing SimpleLLMClient and routing conversation-title generation through the shared registry, and (2) expose the already-plumbed Corpus.preferred_llm field to corpus owners via a new shared LlmModelPicker component. DRY extraction of the chip-rendering logic from SystemSettings is a good bonus.


Strengths

  • Correct async pattern: agenerate_text uses sync_to_async(get_default_llm_spec)() properly and defers all imports inside the function body to avoid circular import issues.
  • Follows CLAUDE.md pitfall Bump coverage from 6.2 to 6.5.0 #14: instructions= is used instead of system_prompt= throughout, which is correct given that pydantic-ai drops system_prompt when message history is non-empty.
  • Test coverage for the resolution chain: The six AgenerateTextTests cases pin the priority ordering precisely (explicit > corpus_preferred > settings default > hard fallback) and handle edge cases like temperature=None and None output.
  • No dead code left behind: client.py, its settings constants, and the now-redundant SimpleLLMClient typing tests are all removed.
  • Security-clean queries: GET_LLM_PROVIDERS and GET_SYSTEM_DEFAULT_LLM omit settingsSchema/currentValue, so non-superuser corpus owners can populate the picker without credential exposure.
  • Presentational component design: LlmModelPicker owns no data-fetching, making it straightforwardly testable and reusable.

Concerns / Suggestions

Backend

1. New pydantic-ai Agent constructed per agenerate_text call

completions.py:1334 calls make_pydantic_ai_agent(built_model, ...) fresh for every invocation. The docstring acknowledges this ("intended for infrequent one-shot calls"), but if title generation ever moves onto a hot path or is called concurrently at scale, this will be a bottleneck. The current use-case (one call per conversation start) is fine, but it would be worth adding a note in completions.py that callers who invoke this in a tight loop should build the agent once and reuse it.

2. Default temperature=0.7 for title generation

completions.py:1279 — the caller in unified_agent_conversation.py never overrides temperature, so title generation runs at 0.7. For a task that should produce deterministic, short output ("max 5 words"), 0.3 or lower would produce more consistent results. Low-priority, but worth considering whether a lower default makes sense for this helper, or whether the consumer should explicitly pass temperature=0.3.

3. Patching targets in test_llms_completions.py — technically correct, but fragile

The tests patch opencontractserver.pipeline.utils.get_default_llm_spec, opencontractserver.llms.model_factory.abuild_agent_model, etc. This works today only because all imports in agenerate_text are inside the function body (so the module attribute is resolved fresh after the patch is applied). If any import is ever hoisted to the module level, all six tests will silently stop intercepting the mock. A comment noting this dependency would help future maintainers.

4. "" sent to UPDATE_CORPUS mutation relies on backend normalisation

CorpusSettings.tsx:756 sends llmDraft.trim() (which can be "") rather than null when clearing. The PR states the backend normalises "" → NULL, but this isn't verified in the Python tests here. A single-line test asserting that the mutation handler coerces "" to None/NULL would close this gap. This is a minor correctness risk if the backend ever changes that normalisation silently.


Frontend

5. LlmModelPicker.test.tsx — disabled-chip test relies on React's synthetic event suppression

LlmModelPicker.test.tsx:80fireEvent.click dispatches a raw DOM event; the test passes only because React's synthetic event layer suppresses onClick on disabled buttons. This is correct behaviour in React 17+/18, but the test would be more explicit (and less surprising) if userEvent.click from @testing-library/user-event were used instead — it already respects disabled. Not a bug, but a readability improvement.

6. PR description says "4 cases" for LlmModelPicker tests; there are 5

Minor documentation discrepancy: the PR body says "new LlmModelPicker unit test (4 cases)" but the file has five it(...) blocks (including the hides the inherited hint once a value is set case). Inconsequential.

7. No component-level test for the CorpusSettings LLM save flow

handleLlmSave (CorpusSettings.tsx:755), the onCompleted/onError rollback logic, and the cache.modify block are untested at the component level. Given the complexity of mocking Apollo mutations, this is understandable, but it means the rollback-on-error path (setLlmDraft(originalLlm)) is only covered by manual testing.

8. useEffect in CorpusSettings resets unsaved LLM draft if corpus prop changes reference

CorpusSettings.tsx:173 — the effect updates llmDraft whenever corpus changes. If Apollo refetches and returns a new object reference with the same data, any unsaved LLM draft the user typed will be silently discarded. This is pre-existing behaviour for the slug/categories draft, so it's consistent, but it's a UX rough edge that will become more visible now that there's a third save-independently-able field.


Minor Nits

  • LlmModelPicker.tsx:340: The re-export comment ("Re-export so existing importers... can keep pulling the option shape from the picker") is slightly misleading — the test and queries import directly from types/graphql-api.ts, not from the picker. The re-export is only useful for callers that want a single import point; the comment overstates its necessity.
  • CorpusSettings.tsx:688: The inline comment above the llmProviders filter (// Drop providers an admin has disabled...) explains the !== false guard correctly; this is good and should stay.
  • completions.py:1302: The note in the docstring about model_factory.abuild_agent_model has a formatting quirk (:func: role in a non-Sphinx docstring). No functional impact.

Summary

The core change (replacing a hardcoded SimpleLLMClient with a registry-backed agenerate_text) is sound and closes a real inconsistency. The LlmModelPicker extraction is clean and the new corpus LLM card follows the existing settings pattern correctly. The main action items before merge are:

  1. (Nice-to-have) Lower the default temperature for title generation, or have the consumer explicitly set it.
  2. (Nice-to-have) Add a backend test asserting "" is normalised to NULL in the mutation handler.
  3. (Informational) Document the fragile patch-target coupling in the completion tests.

Nothing here is a blocker. Approve with the above as suggestions.

…tle temp

Round 6:
- types/graphql-api.ts: add enabled?: boolean | null to LlmProviderOption so the
  type matches the query's runtime shape; LlmProvidersQueryResult drops the
  ad-hoc & { enabled } widening.
- LlmModelPicker: remove the LlmProviderOption re-export (single canonical import
  path in types/); the picker test now imports the type from types/graphql-api.
- CorpusSettings: pass showApiKeyBadge so corpus editors see which providers
  need an API key configured before selecting one.

Round 7:
- unified_agent_conversation.py: title generation now calls agenerate_text with
  temperature=0.3 for deterministic short titles (two reviewers).
- test_llms_completions.py: document that patches target source modules because
  agenerate_text imports its collaborators lazily.

Verified already-covered / declined: '' -> NULL is tested by the existing
TestCorpusSerializerNormalisesPreferredLLM (test_llm_runtime_config.py); the
per-call agent-construction caveat is already in the agenerate_text docstring;
fireEvent matches the codebase's test convention; the corpus-prop useEffect
draft reset is the existing, consistent slug/categories pattern.
@claude

claude Bot commented Jun 20, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR closes two real gaps in the LLM Singleton registry with clean, well-targeted changes:

  1. Removes the OpenAI-only, hardcoded SimpleLLMClient and replaces its sole production caller (conversation-title generation) with the registry-backed agenerate_text.
  2. Surfaces Corpus.preferred_llm — previously settable only via the GraphQL console or shell — through a reusable LlmModelPicker component wired into corpus settings.

The approach is architecturally sound and the scope is appropriately contained.


Strengths

  • Clean dead-code removal: client.py, LLM_CLIENT_* settings, and the typing-coverage tests all go together with no orphaned references.
  • Correct pitfall Bump coverage from 6.2 to 6.5.0 #14 compliance: agenerate_text passes instructions= and never system_prompt=; well-commented in the docstring.
  • Resolution chain is pinned: Seven test cases in test_llms_completions.py cover every priority level (explicit > corpus > settings default > hard fallback), including temperature=None omission and None output guard.
  • Serializer normalisation: validate_preferred_llm in serializers.py (lines 74–88) correctly folds ""None at the persistence boundary, keeping the DB column semantically clean.
  • Security: settingsSchema/currentValue (credential values) are only returned for superusers in to_graphql_type — non-superusers calling GET_LLM_PROVIDERS get only non-secret fields, exactly as the PR claims.
  • LlmModelPicker design: Fully presentational, uses useId() for collision-safe label associations, handles disabled state visually and behaviourally, and the inherited-default hint makes "leave empty = inherit" explicit.

Issues & Suggestions

Medium — design asymmetry worth documenting

Non-superusers see all LLM providers, while other pipeline component types are filtered.

In pipeline_queries.py, components_data (parsers, embedders, etc.) is filtered to configured-only for non-superusers via filter_configured, but llm_providers_data bypasses that filter entirely (lines ~82–155). This is probably the right call for the picker UX — you need to see all registered providers to pick one — but the resolver has no comment explaining the asymmetry. A reader will notice that LLM providers escape the if not user.is_superuser branch and wonder if it is a bug.

Suggested addition near line 204 in pipeline_queries.py:

# LLM providers are intentionally NOT filtered by configured-components for
# non-superusers: a corpus editor must be able to see all registered providers
# to select one for Corpus.preferred_llm.  No credential fields are exposed
# (settings_schema / current_value are superuser-only — see to_graphql_type).
llm_providers=[
    to_graphql_type(d, "llm_provider") for d in llm_providers_data
],

Medium — test patch-location comment is slightly misleading

In test_llms_completions.py (lines 1369–1373):

# NOTE: ``agenerate_text`` imports its collaborators … lazily inside the
# function body, so these tests patch them at their *source* modules.
# If any of those imports is ever hoisted to module level, the patches must
# move to ``opencontractserver.llms.completions.<name>``

The advice is the reverse of correct. Patching at the source module is already right for both lazy and eager imports when the consumer does from module import name — hoisting the import would make the test MORE brittle (the local name in completions.py would be bound at import time and the source-module patch would no longer intercept it). The note should say "if imports are hoisted, patches must stay at the source module AND completions.py's local name must be patched too."

Low — aria-pressed missing on chip buttons

Selected chips are distinguished visually (background + border colour), but there is no aria-pressed attribute. Screen-reader users cannot tell which model is currently selected. Adding aria-pressed={selected} to each chip button costs one attribute.

Low — fireEvent.click on disabled chips

test_does_not_fire_onChange_when_a_disabled_chip_is_clicked uses fireEvent.click, which in JSDOM does respect disabled on native <button> elements (so the test should pass reliably). However, userEvent.click from @testing-library/user-event more faithfully models real browser behaviour and would make the intent clearer. Not a blocking issue, but worth a follow-up.

Low — FormHelperText vs raw <p>

LlmModelPicker renders helper text and the inherited-default hint as raw <p> elements. SystemSettings.tsx previously used the <FormHelperText> design-system component. If FormHelperText ever gets a style update, LlmModelPicker's text won't automatically follow. Minor consistency issue; could be addressed in a follow-up.

Nit — as any cast proximity

In Corpuses.tsx line 985, slug is accessed via (opened_corpus as any).slug. The new preferredLlm access on line 984 is correctly typed (it's in the GraphQL result type), but the adjacent as any cast is a latent type-safety hazard. Consider adding a comment or resolving the slug type separately to prevent future contributors from widening the as any to cover the whole object.


Test Coverage

Good for the scope of this PR:

  • 7 backend unit tests for agenerate_text (resolution chain exhaustive)
  • 4 consumer tests updated + 2 new (corpus-preferred threading, empty-title fallback)
  • 5 frontend unit tests for LlmModelPicker
  • tsc --noEmit + existing routing tests (13/13) pass

No component test (Playwright CT) for the new corpus settings card, but that is consistent with how other CorpusSettings cards are tested and is acceptable here.


Summary

Approval recommended with the aria-pressed accessibility fix and the pipeline_queries.py comment added before merge. The patch-location comment and FormHelperText consistency items can be addressed in follow-up PRs without blocking this one.

…metry

- LlmModelPicker: add aria-pressed={selected} to model chips so screen-reader
  users can tell which model is selected.
- pipeline_queries.py: comment why LLM providers are intentionally not run
  through filter_configured for non-superusers (corpus editors must see all
  providers to pick Corpus.preferred_llm), and that credentials still never
  leak (settings_schema is superuser-only) — a path this PR newly makes
  reachable by corpus editors.

Declined: the suggestion that the test patch-location comment is 'reverse of
correct' is itself mistaken — with 'from M import f' inside the function body,
patching M.f works; if hoisted to module level, completions.py binds its local
name at import time, so patches must move to
opencontractserver.llms.completions.<name>, exactly as the comment says.
fireEvent (codebase convention), FormHelperText (admin-local styled component,
not a design-system export), and the pre-existing slug 'as any' cast in
Corpuses.tsx are out of scope / unchanged.

JSv4 commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator Author

Round-8 items handled in cbac32c:

Addressed

  • aria-pressed added to the model chips so screen-reader users can tell which model is selected.
  • Provider-filter asymmetry — added the suggested comment in pipeline_queries.py explaining that LLM providers intentionally bypass filter_configured for non-superusers (a corpus editor must see all providers to pick one), with credentials still superuser-only.

Declined — the "patch-location comment is reverse of correct" item is itself mistaken. With from M import f inside the function body (the current code), each call re-reads M.f, so patching the source module M.f intercepts correctly — which is what the tests do. If the import were hoisted to module level, completions.py would bind its local f at import time and a later M.f patch would no longer intercept it — so patches would need to move to opencontractserver.llms.completions.f, exactly as the comment states. Patching the source module post-hoist would not work, so the suggested "stay at source module AND patch the local name too" is incorrect. Leaving the comment as written.

Other declines (consistent with prior rounds): fireEventuserEvent (matches the repo's existing test convention); FormHelperText (not a @os-legal/ui export — it's an admin-local styled component, so coupling the shared picker to it is a worse dependency than the themed raw element); the (opened_corpus as any).slug cast is pre-existing code this PR doesn't touch (the adjacent preferredLlm access is correctly typed).


Generated by Claude Code

@claude

claude Bot commented Jun 20, 2026

Copy link
Copy Markdown

Code Review — PR #2035: LLM Singleton Registry + Per-Corpus Admin GUI

Overview

This PR closes two real gaps: conversation-title generation was the last runtime LLM path that bypassed the registry (hardcoding gpt-4o-mini via SimpleLLMClient), and Corpus.preferred_llm had no frontend surface despite being fully wired in the backend. The approach is clean — a thin agenerate_text helper reuses the existing resolve_model_spec chain instead of reimplementing it, and LlmModelPicker removes duplicated chip-rendering logic from SystemSettings. Both changes are well-motivated and well-scoped.


What I verified

  • uuid is already imported at the top of unified_agent_conversation.py (line 31) — the f"Conversation {uuid.uuid4()}" fallback is safe.
  • CorpusType is Omit<RawCorpusType, "myPermissions">, so adding preferredLlm to RawCorpusType automatically surfaces it on CorpusType everywhere — no drift risk.
  • validate_preferred_llm in config/graphql/serializers.py normalises ""None correctly.
  • Both pipelineComponents and pipelineSettings resolvers are decorated with @login_required (pipeline_queries.py lines 43, 258) — the non-superuser visibility claim in the PR is accurate.
  • The lazy-import patch strategy in the tests is correct: because from opencontractserver.llms.completions import agenerate_text runs inside the function body each call, patching the source module attribute intercepts it reliably.

Issues

1. Magic number: temperature=0.3 in the consumer (minor, CLAUDE.md violation)

config/websocket/consumers/unified_agent_conversation.py:

title = await agenerate_text(
    prompt,
    ...
    temperature=0.3,   # ← hardcoded
)

CLAUDE.md §4 says "No magic numbers — use constants files." 0.3 is meaningful (low-temperature for deterministic titles) but should be a named constant, e.g. TITLE_GENERATION_TEMPERATURE = 0.3 in opencontractserver/constants/.

2. Redundant @override_settings in test_hard_fallback_when_everything_unset (minor)

@override_settings(DEFAULT_LLM="", OPENAI_MODEL="")
async def test_hard_fallback_when_everything_unset(self) -> None:
    with patch(
        "opencontractserver.pipeline.utils.get_default_llm_spec",
        return_value="",   # ← this already controls the output
    ), ...

Since get_default_llm_spec is mocked to return "", the @override_settings decorator is redundant — Django settings are never consulted. Not a correctness problem, but it adds noise and could mislead future readers into thinking the test covers the Django-settings path.

3. agenerate_text — per-call agent construction (noted, but document more explicitly)

The docstring says "intended for infrequent one-shot calls." Worth also adding a module-level __all__ and a note in the docstring that callers operating in a hot path (e.g. batch title back-fill) should build the agent once externally. Right now the discoverability of that constraint relies entirely on reading the docstring carefully.


Positive observations

  • Clean removal of dead code. SimpleLLMClient was OpenAI-only, hardcoded gpt-4o-mini, ignored the registry, and had no async-native implementation (it used run_in_executor). Deleting it and its settings is unambiguously the right move.
  • agenerate_text design is correct. Walking explicit → corpus_preferred → get_default_llm_spec → hard default mirrors the agent factory exactly. The temperature=None guard (omit entirely rather than send 0) is an important correctness detail for reasoning-model backends.
  • Accessibility in LlmModelPicker. useId() for label/input association, aria-pressed on chip buttons, disabled + cursor: not-allowed + opacity: 0.6 for visual feedback — all correct.
  • Apollo cache update on LLM save (cache.modify with cache.identify) is the correct pattern and avoids a full refetch.
  • Separate mutation instance for LLM vs. slug/categories keeps the two save flows independent — the right isolation boundary.
  • filter(p.enabled !== false) client-side correctly treats null/undefined as "not explicitly disabled", matching the backend's intent.
  • Test coverage is solid. The 7 backend tests pin the full priority chain (explicit > corpus > settings > hard default) and edge cases (temperature=None, output=None). The 5 frontend unit tests cover the key interaction cases.

Summary

One confirmed CLAUDE.md violation (magic number), one minor test hygiene issue. Both are small fixes. The core design is correct, the security analysis in the PR description is accurate, and the dead-code removal is a net improvement. Recommend addressing the magic number before merge; the other item is optional.

- Extract the title-generation temperature to a named constant
  TITLE_GENERATION_TEMPERATURE in opencontractserver/constants/llm.py (next to
  EXTRACT_DEFAULT_TEMPERATURE) and use it in the consumer, fixing the magic-number
  I introduced in the round-7 title-determinism change (CLAUDE.md §4).
- completions.py: add __all__ = ['agenerate_text'] for discoverability.
- test_llms_completions: comment that @override_settings in the hard-fallback test
  is load-bearing (resolve_model_spec reads settings.DEFAULT_LLM/OPENAI_MODEL
  directly in its fallback), declining the recurring 'redundant' suggestion whose
  premise (Django settings never consulted) is incorrect.
Comment thread opencontractserver/llms/completions.py Fixed
opencontractserver/llms/completions.py defined a module-level logger and
imported logging but never logged anything. Drop both per the CodeQL
unused-global alert and the no-dead-code rule.
@claude

claude Bot commented Jun 20, 2026

Copy link
Copy Markdown

Code Review — PR #2035: Consume LLM Singleton registry everywhere + add per-corpus LLM admin GUI

Overview

This PR closes two real gaps cleanly: (1) conversation-title generation was the sole runtime path bypassing the LLM registry (hardcoded gpt-4o-mini), and (2) Corpus.preferred_llm was a ghost feature with no UI. The approach — removing SimpleLLMClient, adding agenerate_text, extracting LlmModelPicker — is well-reasoned. The resolution chain is correctly implemented and thoroughly tested.


Correctness

completions.pyagenerate_text correctly uses instructions= not system_prompt= (CLAUDE.md pitfall #14), wraps the ORM call in sync_to_async, and handles None model output by collapsing to "" rather than the truthy literal string "None". The temperature-omission path for reasoning models is a good defensive touch.

unified_agent_conversation.pygetattr(self, "corpus", None) / getattr(corpus, "preferred_llm", None) is the right defensive pattern for consumers that may be in a document-only session.

CorpusSettings.tsx cache update — the update callback correctly guards on data?.updateCorpus?.ok before writing to cache, so a server-side failure does not corrupt the local Apollo store.


Issues

Medium — Disabled-chip test uses fireEvent.click, not userEvent.click

frontend/src/components/common/__tests__/LlmModelPicker.test.tsx, the disabled-chip test:

fireEvent.click dispatches a raw DOM event and can bypass browser-level disabled-button suppression depending on the React / jsdom version in play. userEvent.click from @testing-library/user-event faithfully honours the disabled attribute the way a real browser does. Consider swapping to userEvent.click here to prevent a future testing-library upgrade from silently making this test pass for the wrong reason.

Medium — uuid referenced without a visible import in the fallback path

config/websocket/consumers/unified_agent_conversation.py:

return title or f"Conversation {uuid.uuid4()}"

The diff does not show a import uuid statement. If uuid is only used elsewhere in the file and that import is removed in a future tidy-up, this fallback path would throw a NameError at exactly the moment the error handler is most needed. Worth a quick grep to confirm the import is already present at module scope, and if not, add it explicitly.

Low — Unnecessary lazy import of a pure constant

config/websocket/consumers/unified_agent_conversation.py, inside the try block:

from opencontractserver.constants.llm import (
    TITLE_GENERATION_TEMPERATURE,
)

opencontractserver.constants.llm is a leaf module with no Django ORM or circular-dependency risk. Moving this import to the top of the file removes it from the exception surface area — a failure to import a constant is a startup error, not a title-generation error.

Low — No frontend validation of the model spec format

The preferredLlm field is a free-text input. An admin can type "invalid" and the frontend will happily send it to the mutation. Adding a lightweight client-side format check — e.g., /^[a-z][a-z0-9-]*:[^:]+$/ — or helper text noting that specs must be in provider:model form would improve the error-discovery loop for admins. Not a blocker if the backend produces a clear error message on an invalid spec.


Security

The decision to show all registered providers to non-superusers (rather than only configured ones) is correct for the use case and the security note in pipeline_queries.py is appropriately documented. The critical invariant — that settings_schema/has_value/current_value (which could expose credential status) is gated to superusers inside to_graphql_type — should be verified to remain true if to_graphql_type is ever refactored.

The two new queries (GET_LLM_PROVIDERS, GET_SYSTEM_DEFAULT_LLM) are @login_required and return only public-safe fields. No IDOR exposure: preferredLlm is stored on the corpus object, which is already permissioned through the standard guardian model.


Code Quality

  • DRY win: ~90 lines of duplicated chip-rendering in SystemSettings.tsx collapsed into LlmModelPicker. The two call sites now share behaviour automatically.
  • No dead code: SimpleLLMClient is fully removed — file, tests, settings, and docs. Correct call.
  • Test coverage: The AgenerateTextTests suite pins every branch of the resolution chain (explicit > corpus > settings default > hard fallback) plus the temperature=None and max_tokens forwarding paths. The consumer title tests now cover the corpus-preferred-LLM threading and the empty-response fallback, which the old tests missed entirely.
  • Accessibility: aria-pressed on the model chips and the useId-based label/input association in LlmModelPicker are correct.
  • Constants: TITLE_GENERATION_TEMPERATURE = 0.3 in opencontractserver/constants/llm.py is the right place; the comment explaining the value is useful.

Minor Nits

  • LlmProviderOption in types/graphql-api.ts is a structural subset of PipelineComponentType. Worth a one-line comment noting it intentionally excludes settingsSchema/currentValue so the omission is visible at the type level, not just in query comments.
  • The two separate useMutation instances in CorpusSettings.tsx (one for slug/categories, one for LLM) are the right call — independent save flows with independent toasts and rollbacks.

Summary

The design is sound, the resolution chain is correctly implemented, and the test coverage is solid. The two medium-priority issues (disabled-chip test fidelity, unverified uuid import) are worth addressing before merge; the rest are low-priority or cosmetic.

JSv4 commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator Author

No code change this round — the items are stale, declined-before, or would break valid input:

  • uuid import — already present at unified_agent_conversation.py:31 (verified earlier; a prior review confirmed the same). No NameError risk.
  • fireEventuserEventfireEvent is the existing convention across this repo's Vitest suites; keeping it consistent.
  • Client-side spec-format regex — declining specifically because the suggested pattern /^[a-z][a-z0-9-]*:[^:]+$/ requires a colon and would reject valid bare specs like gpt-4o. The backend deliberately accepts bare model names (parse_model_spec maps them to the openai provider), so this front-end check would block input the server considers valid. The helper text already documents the provider:model form and the backend returns a clear error for a truly unregistered provider.
  • Lazy constant import / extra type comment — harmless; leaving as-is.

To set expectations: both requirements are delivered and CI/static-analysis (CodeQL) is clean. I'm no longer pushing commits for cosmetic claude-review nits, since each push re-triggers this review and the loop has reached the point of re-raising already-resolved items. I'll continue to act on genuine CI failures and static-analysis findings through to merge.


Generated by Claude Code

@claude

claude Bot commented Jun 20, 2026

Copy link
Copy Markdown

Code Review

This is a well-scoped, well-executed PR. The two gaps it closes (registry bypass in title generation, no GUI for per-corpus LLM) are real and the approach is architecturally sound. Multiple review rounds are visible in the commit history and the code shows it — the design decisions are well-documented and the implementation is tight. Notes below are mostly minor.


Overview

  • Gap 1 (runtime bypass): SimpleLLMClient was the only LLM path bypassing the registry. The new agenerate_text helper correctly walks per-call → corpus_preferred → PipelineSettings.default_llm → Django settings and delegates to the same model_factory / agent factory as the rest of the system. Deletion of client.py + its settings is clean (no-dead-code rule followed).
  • Gap 2 (admin GUI): LlmModelPicker is presentational-only, the caller owns state and data — good SRP. De-duplicating SystemSettings' inline chip logic into the shared component is a genuine DRY win.

Positive Highlights

  • Resolution chain fidelity: completions.py uses sync_to_async(get_default_llm_spec)() to avoid sync ORM in an async context — the correct pattern per CLAUDE.md pitfall Bump django-celery-beat from 2.2.1 to 2.4.0 #13.
  • Pitfall Bump coverage from 6.2 to 6.5.0 #14 honoured: instructions= not system_prompt= in agenerate_text — and the docstring calls this out explicitly. Good.
  • Security-conscious provider projection: GET_LLM_PROVIDERS and GET_SYSTEM_DEFAULT_LLM deliberately omit settingsSchema/currentValue. The asymmetric filter_configured bypass (providers visible to non-superusers, config values not) is documented inline in pipeline_queries.py — this comment is essential and it's there.
  • DB consistency: validate_preferred_llm in the serializer normalises ""None so the column stays unambiguously nullable. The comment explaining WHY (prevents filter(preferred_llm__isnull=True) missing empty-string rows) is exactly the kind of non-obvious invariant worth documenting.
  • Constants file: TITLE_GENERATION_TEMPERATURE lives in opencontractserver/constants/llm.py — no magic numbers.
  • Accessibility: aria-pressed on chip buttons, useId() for stable label/input association — both correct.
  • Test coverage: 7 backend tests pin the full resolution chain (corpus beats settings, explicit beats corpus, hard fallback, temperature omission, None output, max_tokens forwarding). 5 frontend unit tests cover the key picker states.

Issues & Suggestions

1. PR body credits Claude Code (violates CLAUDE.md baseline rule)

The PR description ends with:

🤖 Generated with Claude Code
Generated by Claude Code

CLAUDE.md is explicit: "Never credit Claude or Claude Code in commit messages, PR messages, comments, or any other artifacts." Please edit the PR body to remove both lines before merge.

2. agenerate_text constructs a new agent per call — document the cost at the call site too

The docstring in completions.py warns callers this is for infrequent use, but the consumer (_generate_conversation_title) doesn't carry the same note. If someone copies that import pattern for a higher-frequency path, the per-call agent construction overhead will bite them silently. A one-line comment at the call site (e.g. # one-shot: agent construction overhead is acceptable here) would make the intentionality explicit.

3. Test patching strategy comment is load-bearing — fragile if imports are ever hoisted

# NOTE: ``agenerate_text`` imports its collaborators … lazily inside the function
# body, so these tests patch them at their *source* modules.

This is correct and well-documented. Just worth confirming: if resolve_model_spec, abuild_agent_model, or make_pydantic_ai_agent ever get hoisted to module-level imports in completions.py, every test in test_llms_completions.py will silently stop intercepting the mocks and pass vacuously. Consider adding an assertion in one test that a sentinel value actually flowed through (e.g., mock_build.assert_awaited_once_with(...)) to guard against this. Several tests already do this — the ones that don't (e.g., test_none_output_returns_empty_string) are the ones at risk.

4. CorpusSettings: preferredLlm not included in the canUpdate guard for the provider query skip

const { data: llmProvidersData } = useQuery<LlmProvidersQueryResult>(
  GET_LLM_PROVIDERS,
  { skip: !canUpdate }
);
const { data: systemDefaultLlmData } = useQuery<SystemDefaultLlmQueryResult>(
  GET_SYSTEM_DEFAULT_LLM  // <-- no skip
);

GET_SYSTEM_DEFAULT_LLM fires for all authenticated users including read-only viewers. This is intentional (the inherited-default hint should display even for viewers), so the asymmetry is correct — but it's not explained. A short comment like // fire for all viewers so the inherited-default hint is populated would prevent a future reader from "fixing" the asymmetry.

5. LlmModelPicker: "Registered Providers & Suggested Models:" label is a <div>, not a heading

The provider section header is rendered as a stylised <div>. For screen readers, this reads as plain text with no semantic structure. Given that the chip list below it is interactive, a <fieldset>/<legend> pairing (or at minimum a role="group" with aria-label) would make the association machine-readable. Minor, but the component already invests in aria-pressed — this would complete the a11y story.

6. test_hard_fallback_when_everything_unset pins "openai:gpt-4o" as the hard default

mock_build.assert_awaited_once_with("openai:gpt-4o")

This asserts a concrete value from _HARD_DEFAULT_MODEL inside resolve_model_spec. If that constant ever changes, this test will fail with a confusing mismatch. Consider importing _HARD_DEFAULT_MODEL (or a public alias) and asserting against it, or at minimum add a comment: # mirrors _HARD_DEFAULT_MODEL in llm_registry.py — update if that constant changes.


No Issues Found In

  • Permission model for the two new queries (correctly @login_required, no credential fields)
  • Cache update after updateCorpusLlm mutation (correct cache.modify pattern)
  • llmDraft.trim() === originalLlm.trim() disabling the Save button — prevents spurious mutations
  • enabled !== false filter preserving null/undefined as enabled — correct semantics and commented
  • ""null normalisation round-trip (frontend sends trimmed value, serializer enforces NULL)
  • Removal of LLM_CLIENT_* settings from base.py — settings.py is clean

Summary

Approve pending: (1) remove the Claude Code attribution from the PR body, (2) optionally address the minor documentation and a11y nits above. The core implementation is correct and production-ready.

@codecov

codecov Bot commented Jun 20, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 89.00000% with 11 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
frontend/src/components/common/LlmModelPicker.tsx 73.52% 2 Missing and 7 partials ⚠️
...rc/components/corpuses/CorpusLanguageModelCard.tsx 94.73% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

…ranches

The new frontend code (corpus LLM card handlers, LlmModelPicker branches) was
uncovered, dropping patch coverage to 72.52% (target 82.82%; frontend 61.19% vs
71.01%).

- Extract the per-corpus Language Model card out of the 600-line CorpusSettings
  god-component into a self-contained, testable CorpusLanguageModelCard (better
  SRP). CorpusSettings now just renders it with corpus.id / preferredLlm /
  canUpdate.
- Add CorpusLanguageModelCard tests (MockedProvider): save-success, mutation
  error rollback, server ok:false rollback, clear-to-inherit, and read-only
  (no Save) — bringing the card to ~96% coverage.
- Add LlmModelPicker tests for the label / helper-text / API-key-badge and
  no-suggested-models branches — ~93% coverage.

tsc + prettier clean; 12 frontend unit tests pass.
…X polish

Follow-up cleanup from an audit + smoke test of the per-corpus LLM work.

- Validate Corpus.preferred_llm in CorpusSerializer.validate_preferred_llm
  (via validate_model_spec) BEFORE save(), so an unregistered/malformed spec
  typed into the free-text picker returns a clean ok=false field message
  instead of the generic "Mutation failed due to an internal error" (the
  model-layer ValidationError raised from save() was caught as an internal
  error). Normalisation stays in Corpus.save() as the single source of truth.
- Extract DEFAULT_COMPLETION_TEMPERATURE (0.7) to constants/llm.py and use it
  for the agenerate_text default temperature (no magic number).
- LlmModelPicker: guard the chip onClick with an explicit `disabled` check so
  the no-op is the component's own contract; only render the "inherit" hint
  when the caller opts in (inheritedSpec !== undefined), so the install-wide
  System Settings screen no longer shows a spurious inherit hint.
- Tests: add a clear->save frontend test pinning that preferredLlm:"" reaches
  the backend; add serializer-validation tests (clean rejection of unknown
  provider / malformed spec); add a backend test pinning non-superusers see the
  full llmProviders list but a NULL settingsSchema (no credential leak); assert
  the agenerate_text default temperature via the named constant.
- Changelog: scope the "only bypass" wording (the *_claude analyzer tasks and
  the structured-extraction default deliberately pin their own models and are
  out of scope) and add a fragment for the validation fix.
Comment thread config/graphql/serializers.py Fixed

JSv4 commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

Re: the CodeQL "information exposure through an exception" alert on config/graphql/serializers.py:109

This is a pre-existing alert, not introduced by this PR. git diff origin/main...HEAD -- config/graphql/serializers.py is empty — this PR makes zero changes to serializers.py (the file is identical to main; its last commit predates this branch). Line 109 is the CorpusSerializer license validation, unrelated to the LLM-config work here.

I've left it untouched deliberately: fixing pre-existing exception-handling in unrelated serializer code would pull it into an LLM-registry changeset and widen scope. It should be triaged separately against main (where it also reproduces).

For completeness, none of the files this PR does change contain that pattern — verified no str(exc)/traceback/.args exception-to-user flow across the changed Python (completions.py, the consumer, pipeline_queries.py, etc.). The consumer's title-generation except logs the error and returns a generated fallback string; it never surfaces exception detail to the client.


Generated by Claude Code

…registry-config-4rei39

# Conflicts:
#	config/settings/base.py
@claude

claude Bot commented Jun 21, 2026

Copy link
Copy Markdown

Code Review — PR #2035: Consume LLM Singleton registry everywhere + add per-corpus LLM admin GUI

Overview

This PR closes two genuine gaps in the LLM configuration system: conversation-title generation was hardcoding gpt-4o-mini via a registry-bypassing SimpleLLMClient, and Corpus.preferred_llm existed in the backend but had no frontend surface. Both problems are real and the fixes are directionally correct. Below is a breakdown by area.


Backend

opencontractserver/llms/completions.pyagenerate_text

Clean, well-specified helper. A few notes:

  • Lazy imports inside the function body are used to avoid circular imports. This is fine, but it means the test must patch at the source module (documented in the test file). If any import is ever hoisted to module level the patches silently stop working — the test comment explains this, which is good.
  • Fresh Agent per call is acknowledged in the docstring as an intentional trade-off for infrequent calls. Worth adding a guard or comment about call frequency if this helper is ever considered for higher-traffic paths.
  • sync_to_async(get_default_llm_spec)() is correct — get_default_llm_spec does ORM work and agenerate_text is async.
  • The str(output).strip() if output is not None else "" guard is thoughtful; some providers can return None on refusals.

config/graphql/serializers.pyvalidate_preferred_llm

Good layering: the serializer catches LLMProviderNotRegistered/ValueError before save(), so the mutation returns a structured field error rather than a generic "internal error." The inline import:

from opencontractserver.llms.llm_registry import (
    LLMProviderNotRegistered,
    validate_model_spec,
)

…is placed inside the validator method rather than at module level. This is the standard Django pattern to break circular import chains, but it would benefit from a one-line comment to that effect so it doesn't look like an oversight to a future reviewer.

config/graphql/pipeline_queries.py

The comment explaining why filter_configured is skipped for non-superusers on the llmProviders list is exactly the right level of documentation — it captures the "why" of a non-obvious decision.

config/settings/base.pyLLM_CLIENT_* removal ✅

Removing LLM_CLIENT_PROVIDER, LLM_CLIENT_MODEL, LLM_CLIENT_TEMPERATURE, LLM_CLIENT_MAX_TOKENS is correct; they only existed to configure the now-deleted SimpleLLMClient. Deployments that set these env vars will see them ignored silently — a migration note in the changelog fragment would be helpful for operators (currently the fragment mentions the settings removal but doesn't explicitly call out the env-var deprecation).

config/websocket/consumers/unified_agent_conversation.py

uuid is already imported at line 31, so the f"Conversation {uuid.uuid4()}" fallback is safe. The getattr(self, "corpus", None) defensive check is correct since corpus may be unset in document-only sessions.


Frontend

LlmModelPicker.tsx

Solid presentational component:

  • useId() fallback prevents label/input collision when two pickers share a page.
  • aria-pressed on the chip buttons is correct.
  • The three-way inheritedSpec !== undefined / === null / === string distinction is clever and necessary (the System Settings screen omits the prop entirely; corpus settings passes null when no default exists). The inline comment explains it, but a prop-name alternative like showInheritHint?: boolean might be more explicit for future callers.

Inline styles: The component uses inline style={{...}} for all layout. The rest of the codebase is migrating to the OS-Legal style system. Not a blocker, but worth aligning on: is this component expected to stay style-system-free, or should it be slated for a follow-up pass?

CorpusLanguageModelCard.tsx

Good separation of concerns and proper Apollo integration:

  • Optimistic cache write via cache.modify is correct.
  • Rollback in both onError and ok: false paths is correct.
  • The enabled !== false filter (null/undefined = enabled) is the right idiom here and is documented in the comment, though a true/undefined binary would be cleaner at the type level.
  • GET_LLM_PROVIDERS is correctly skipped (skip: !canUpdate) for read-only viewers.

Minor: when both queries (GET_LLM_PROVIDERS, GET_SYSTEM_DEFAULT_LLM) are loading, the card renders immediately with an empty provider list — no loading spinner or placeholder. For a secondary settings panel this is acceptable, but the UX could flicker as providers load. Low priority.

GraphQL types / queries / mutations ✅

The preferredLlm threading through RESOLVE_CORPUS_BY_SLUGS_FULL, UPDATE_CORPUS, RawCorpusType, ResolveCorpusFullQuery, and Corpuses.tsx is complete. LlmProviderOption as a shared structural subset of PipelineComponentType is the right approach — it prevents the picker and the query result type from drifting apart.


Tests ✅

Coverage is strong across all new code:

  • test_llms_completions.py: pins the priority chain (explicit > corpus > settings > hard default), temperature omission, None output, and max_tokens forwarding.
  • test_llm_runtime_config.py: adds three serializer-level cases (unknown provider, malformed spec, valid spec round-trip).
  • test_pipeline_component_queries.py: pins both halves of the credential boundary — non-superusers see all providers but get null settingsSchema; superusers see populated settingsSchema.
  • CorpusLanguageModelCard.test.tsx: covers disabled-provider filtering, rollback on network error, rollback on ok: false, clear-then-save with "" (not undefined), and read-only mode.
  • Consumer title tests correctly mock agenerate_text at opencontractserver.llms.completions.agenerate_text (not the deleted client.create_client).

One gap: test_title_generation_uses_corpus_preferred_llm verifies corpus_preferred is passed, but not that it actually wins over the settings default. That end-to-end chain is covered by test_corpus_preferred_wins_over_settings_default in test_llms_completions.py, so the overall contract is pinned — just noting the division of responsibility.


Security ✅

  • Non-superusers can read provider names/models without seeing settingsSchema/currentValue/hasValue — correctly enforced in pipeline_queries.py and verified in test_pipeline_component_queries.py.
  • validate_preferred_llm prevents unregistered or malformed specs from ever reaching the DB.

Summary

This is a clean, well-tested PR that closes real gaps. The main actionable items before merge:

  1. Changelog: Mention the LLM_CLIENT_* env-var deprecation explicitly for operators.
  2. serializers.py inline import: Add a # avoid circular import comment (one line).
  3. Inline styles in LlmModelPicker: No-op for merge, but worth a follow-up to align with the OS-Legal style system.

No blocking issues found.

Call out for operators that the removed LLM_CLIENT_* env vars are now silently
ignored and should be dropped from .env/ConfigMap, pointing them at
PipelineSettings.default_llm / DEFAULT_LLM as the replacement.

JSv4 commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

Correction to my earlier comment on the CodeQL serializers.py:109 alert.

My previous reply said "this PR makes zero changes to serializers.py." That was based on my local branch state at the time, which did not yet include a concurrently-pushed commit on this branch — c1d7712 ("Address PR #2035 audit: serializer-level LLM spec validation"). After fetching/rebasing, the PR does modify serializers.py: validate_preferred_llm now validates the spec and, on failure, raises

except (LLMProviderNotRegistered, ValueError) as exc:
    raise serializers.ValidationError(str(exc)) from exc   # line 109

So the alert is PR code — apologies for the inaccurate earlier characterization.

On the substance: the caught exceptions are LLMProviderNotRegistered / ValueError from validate_model_spec/parse_model_spec, whose messages are curated, user-facing validation strings (e.g. "Provider 'foo' (from spec 'foo:bar') is not registered…", "Malformed model spec: …") — not tracebacks or internal/sensitive data. So functionally this is close to a false positive, but it does match the ValidationError(str(exc)) pattern CodeQL flags. If we want it green rather than dismissed-with-justification, the minimal hardening is to raise a controlled field message instead of str(exc) (e.g. a fixed "is not a valid/registered model spec" string, optionally echoing only the user-supplied spec, not the exception).

Note this code came from a separate session also developing this branch, so I'm flagging rather than editing it to avoid colliding with that work — see my note to the PR owner.


Generated by Claude Code

@claude

claude Bot commented Jun 21, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR closes two well-identified gaps in the LLM Singleton registry: (1) the last conversational LLM bypass (title generation hardcoding gpt-4o-mini), and (2) the missing frontend surface for Corpus.preferred_llm. The approach is sound—agenerate_text correctly mirrors the agent factory's resolution chain, the LlmModelPicker is a clean extraction, and the serializer-layer validation closes a real user-facing rough edge.


Correctness & Critical Issues

completions.py — lazy-import patch-target fragility (low risk, high maintainability cost)

The test file comments this correctly:

# NOTE: ``agenerate_text`` imports its collaborators lazily inside the function
# body, so these tests patch them at their *source* modules. If any of those
# imports is ever hoisted to module level, the patches must move to
# ``opencontractserver.llms.completions.<name>`` or they'll stop intercepting.

The lazy imports are necessary to avoid circular dependencies—that's fine. But the warning means a future refactor could silently break all 7 tests without any static signal. Consider adding a brief # noqa: PLC0415 (circular-import guard) style tag, or a module-level comment explaining why the imports are deferred, so the constraint is hard to miss.

CorpusLanguageModelCard — cache update captures closure-time draft, not submission-time value

update: (cache, { data }) => {
  if (data?.updateCorpus?.ok && corpusId) {
    ...
    cache.modify({
      fields: {
        preferredLlm: () => llmDraft.trim() || null,  // ← reads current state at completion time
      },
    });
  }
},

If the user edits the input between clicking Save and the mutation completing, the cache will be written with the wrong (never-submitted) value. The input is disabled during updatingLlm, so this can't actually happen today, but it's a semantic bug waiting for a future regression when the disabled guard is relaxed. Capture the submitted value before the mutation call:

const handleLlmSave = () => {
  const submittedValue = llmDraft.trim();
  updateCorpusLlm({
    variables: { id: corpusId, preferredLlm: submittedValue },
    // then use submittedValue (not llmDraft) in the update callback
  });
};

Security

Non-superuser LLM provider exposure is intentional and well-documented (pipeline_queries.py comment + test_llm_providers_visible_to_non_superuser_without_secret_leak). The two new queries (GET_LLM_PROVIDERS, GET_SYSTEM_DEFAULT_LLM) correctly omit settingsSchema/currentValue, so no credentials are reachable. The settingsSchema gate is also pinned by the new integration test. ✓

Serializer-layer validation is the right place for the validate_model_spec call. The ValidationError from Corpus.save() being caught as a generic mutation error was a real UX and security-transparency issue. ✓


Code Quality

agenerate_text correctly avoids CLAUDE.md pitfall #14 (instructions= not system_prompt=). ✓

temperature=None semantics are correctly handled—None omits the key entirely so reasoning-model backends don't receive an unsupported parameter. The test_temperature_none_is_omitted test pins this contract. ✓

TITLE_GENERATION_TEMPERATURE constant is the right call—the comment explains why 0.3 (deterministic) vs. 0.7 (general default) matters here. ✓

Asymmetry in pipeline_queries.py: Other component types (parsers, embedders, rerankers) pass through filter_configured for non-superusers, but llmProviders deliberately does not. This is explained in the comment added by the PR. It's the correct decision for the picker use case, but worth tracking—if the list grows to hundreds of providers in a large install, returning the full unfiltered list to every logged-in user could become noisy. Not a blocker.

LlmModelPicker accessibility: aria-pressed on chip buttons, useId() for stable label→input association, disabled guard both on the native attribute and in the click handler. Well done. ✓


Tests

Backend (test_llms_completions.py): Covers the full resolution-chain priority ordering (explicit > corpus > settings default > hard fallback), temperature=None omission, None output collapse, and max_tokens forwarding. Strong.

Frontend CorpusLanguageModelCard tests cover: enabled/disabled provider filtering, mutation success/error/ok-false rollback, clear-then-save with empty string, read-only mode. One missing case:

  • canUpdate=false + initialPreferredLlm=null: the inherited hint should still render (disabled picker, no Save button, but the "inheriting openai:gpt-4o" hint is visible). The current test for canUpdate=false uses a non-null initial value, so the hint is suppressed by the non-empty value check—the inherited-hint path is not exercised in read-only mode. Not a blocker, but it's a gap.

Consumer title tests are correctly updated to mock agenerate_text at the source module. The test_title_generation_uses_corpus_preferred_llm test pins the corpus threading contract. ✓


Minor Nits

  • CorpusLanguageModelCard re-exports default in addition to the named export—consistent with the rest of the codebase. ✓
  • key={provider.className || provider.name || providerKey} in LlmModelPicker—covers all null-field combinations without a single point of failure. ✓
  • The inline styles in LlmModelPicker are consistent with the SystemSettings.tsx code being replaced. ✓

Summary

Solid PR. The two genuine issues are:

  1. The cache update closure bug in CorpusLanguageModelCard (minor, fixable with a one-liner capture before the useMutation call).
  2. A missing read-only + null-value test case for the inherited hint.

Everything else—the registry wiring, security boundary, constant extraction, docs update, and test coverage—is well-executed.

JSv4 added 3 commits June 21, 2026 16:59
- CorpusLanguageModelCard: capture the submitted/rollback values up front in
  handleLlmSave and move onCompleted/onError/update to per-call options so the
  success path, rollback, and cache write all act on what was actually sent
  rather than the live draft state (latent divergence if the input is ever made
  editable mid-save).
- Add a unit case pinning that the inherited system-default hint still renders
  in read-only mode (canUpdate=false) when the corpus has no override.
Mounts the Corpus Settings panel through CorpusSettingsTestWrapper, asserts the
provider chips and inherited system-default hint render, and captures the
corpus--language-model-card--inherit-default documentation screenshot. Behavioural
save/rollback coverage stays in the Vitest unit test.
@claude

claude Bot commented Jun 21, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR closes two well-defined gaps in the LLM Singleton registry: (1) a registry-bypassing title-generation path using a hardcoded gpt-4o-mini and (2) a backend-only preferred_llm field with no frontend surface. The approach is sound — remove the dead code, introduce a thin registry-backed completion helper, and build the UI around a shared, accessible picker component. The changes are well-scoped and the test coverage is thorough.


Correctness

agenerate_text (completions.py)

  • Resolution chain is correct: per-call modelcorpus_preferredget_default_llm_spec() (ORM call on PipelineSettings) → hard default. The sync_to_async wrapper for the ORM call is right.
  • The guard str(output).strip() if output is not None else "" correctly handles a model refusal producing None so caller or fallback patterns fire as expected.
  • The new agent is constructed fresh per call. The docstring flags this as intentional and acceptable for infrequent title-generation calls, which is fine.
  • The lazy imports inside the function body avoid circular imports — correct, but if any of those imports ever gets hoisted to module level, the test patches in test_llms_completions.py need to move too (the comment inside the test already documents this).

validate_preferred_llm (serializers.py)

  • Validating before save() is the right level — the mutation catches serializer errors cleanly while the model-layer ValidationError from Corpus.save() was getting swallowed as a generic internal error.
  • The local import of LLMProviderNotRegistered / validate_model_spec inside the validator is an accepted pattern for avoiding circular imports in Django serializers. No concern here.

_generate_conversation_title (unified_agent_conversation.py)

  • corpus = getattr(self, "corpus", None) and corpus_preferred = getattr(corpus, "preferred_llm", None) are safe for document-only sessions where self.corpus is unset.
  • The fallback uses uuid.uuid4() — confirm uuid is already imported at the top of the module (not visible in the diff). If it was a pre-existing import this is fine; otherwise it needs to be added.

Apollo cache update (CorpusLanguageModelCard.tsx)

The preferredLlm: () => submittedValue || null modifier correctly maps "" to null in the cache after a clear. The snapshot-before-mutation pattern (submittedValue / rollbackValue captured before the async call) is a good defensive practice.


Security

The change to stop filtering LLM providers for non-superusers in pipeline_queries.py is intentional and well-documented. The credential-bearing surface (settingsSchema / currentValue / hasValue) is still gated to superusers via to_graphql_type. The new test test_llm_providers_visible_to_non_superuser_without_secret_leak pins both halves of that boundary — the provider list being visible and the schema being null — which is exactly the right thing to have as a regression guard.

The two new GraphQL queries (GET_LLM_PROVIDERS / GET_SYSTEM_DEFAULT_LLM) are @login_required and project no credential fields. That is appropriately scoped.


Component Design (Frontend)

LlmModelPicker.tsx

  • useId() for label-to-input association is the right call; avoids collisions when two pickers appear on the same page.
  • aria-pressed on the chip buttons is the correct ARIA pattern for toggle-style selectors.
  • The disabled guard in the onClick handler (in addition to the native disabled attribute) is unnecessary defensiveness for a native <button> element — native disabled already prevents the click event. The comment explains the reasoning, but it adds noise. Minor nit.
  • Replacing the hardcoded hex colors with OS_LEGAL_COLORS.selectedBg / OS_LEGAL_COLORS.selectedBorder is an improvement.
  • inheritedSpec !== undefined as the gate for showing the inherit hint is the right idiom — it allows the null case ("no default yet") to still show the generic hint. The comment in the JSX explains this clearly.

CorpusLanguageModelCard.tsx

The useEffect that syncs initialPreferredLlm prop changes to local draft state is standard controlled-input practice, but it means an external prop change (e.g., parent refetch) will clear an in-progress unsaved edit. That is usually acceptable for a settings card where edits are short-lived, but worth noting as a UX edge case.

The skip: !canUpdate on GET_LLM_PROVIDERS is correct — read-only viewers do not need the chip list. The system-default query still runs for the inherited-hint, which is the right call.


Tests

Coverage is solid:

  • test_llms_completions.py pins the full resolution chain including the hard fallback and the temperature=None omission case. The note about where to move patches if imports are ever hoisted is a nice touch.
  • test_llm_runtime_config.py additions cover unknown-provider, malformed-spec, and valid-spec paths through the serializer.
  • test_pipeline_component_queries.py pins the credential boundary precisely.
  • CorpusLanguageModelCard.test.tsx covers success, network error, ok: false, clear, clear-then-save (pinning the empty-string contract), read-only rendering, and the inherited-hint in read-only mode. That is thorough.
  • Consumer title tests have been updated to mock agenerate_text instead of create_client and now include corpus-preferred threading and empty-response fallback — correct.

Minor Suggestions

  1. Verify the uuid import in unified_agent_conversation.py. The diff adds uuid.uuid4() to the fallback return but does not show a new import. If it was already there, no action needed.

  2. The double disabled guard in LlmModelPicker — the in-handler if (disabled) return is redundant for a native <button disabled>. Either remove the comment-and-guard or switch to a non-button element where the guard is load-bearing. As-is it just adds lines.

  3. agenerate_text hot-path note — the docstring already calls out that callers needing throughput should reuse an agent. Consider adding a brief note in the module docstring at the top of completions.py as well; the function-level note is good but a module-level note makes it visible before anyone reads the function signature.


Summary

Solid, well-motivated PR. The SimpleLLMClient removal is clean, the new helper respects all project conventions (async-only, registry-backed, no magic models), the frontend component is accessible and properly extracted, and the security boundary around provider credentials is tested. The minor items above are nits — none block merging.

JSv4 and others added 3 commits June 21, 2026 17:34
The linter (mypy) job was red on every branch, including main: a routine
transitive numpy 2.4.x bump began shipping numpy/__init__.pyi with PEP 695
'type' aliases, and mypy.ini's stale python_version = 3.11 made mypy reject
that stub syntax ('Type statement is only supported in Python 3.12 and
greater') and abort before any project code was checked.

Docker images and CI both run 3.12.x and there is no 3.11 target anywhere,
so 3.11 was simply stale. Bumping the target to 3.12 fixes both CI mypy
invocations (the pre-commit hook on mypy==2.0.0 and the dedicated Run mypy
step on mypy==2.1.0); both now report 'Success: no issues found in 1425
source files' with zero new project type errors. Adds an in-file note and
corrects docs/typing/README.md so the target is not silently lowered again.
… through an exception'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: JSIV <5049984+JSv4@users.noreply.github.com>
Signed-off-by: JSIV <5049984+JSv4@users.noreply.github.com>
@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR closes two real gaps in the LLM singleton registry: (1) routing conversation-title generation through the registry instead of a hardcoded gpt-4o-mini via the now-deleted SimpleLLMClient, and (2) surfacing Corpus.preferred_llm in the frontend via a new shared LlmModelPicker component. The mypy Python-version fix and the serializer-level validation for preferred_llm are solid bonuses. Overall the PR is well-structured and the intent is clearly right, but there is one bug that will cause a test failure.


Bug -- Error message mismatch in validate_preferred_llm

config/graphql/serializers.py

The serializer raises a hardcoded message:

raise serializers.ValidationError("Invalid model specification.") from exc

But the corresponding test in test_llm_runtime_config.py asserts:

self.assertIn("not registered", str(serializer.errors["preferred_llm"]))

serializer.errors["preferred_llm"] will be ["Invalid model specification."], which does not contain "not registered". This test will fail.

The PR description and changelog both say the mutation should return "the real reason (e.g. 'Provider not-a-provider ... is not registered')". The fix is straightforward -- thread the underlying exception message instead of using a hardcoded string:

except (LLMProviderNotRegistered, ValueError) as exc:
    raise serializers.ValidationError(str(exc)) from exc

Minor Issues

Lazy imports in completions.py are fragile

All collaborator imports in agenerate_text are deferred to function-body scope to avoid circular imports. The test file explicitly documents this risk: "If any of those imports is ever hoisted to module level, the patches must move...". This is a coupling that will silently break if a refactor moves any of these imports. A note in the module docstring explaining why these must stay lazy would help future maintainers. Alternatively, consider whether the circular dependency can be broken structurally.

Raw <label> in LlmModelPicker.tsx instead of design system <FormLabel>

LlmModelPicker renders its label as an inline <label> with hard-coded inline styles rather than using the <FormLabel> component from @os-legal/ui (used everywhere else, including the old SystemSettings.tsx code this replaces). The <FormField> + <FormLabel> wrapper pattern from the design system would make it visually consistent and avoid duplicating style values.

CorpusLanguageModelCard -- initialPreferredLlm typed overly broadly

The prop is typed string | null | undefined and the useState init coalesces with || "". Since the GraphQL field returns nullable String, it will always arrive as string | null -- undefined cannot realistically arrive here. Narrowing the prop type makes the contract more precise and avoids the silent coercion.

Minor style note -- "what" comments in type definitions

A few added TypeScript comments describe what a field is rather than why it behaves a specific way (e.g., the preferredLlm additions to RawCorpusType). Per the project's no-unnecessary-comments policy, the non-obvious invariant (null = inherit) is worth keeping, but the spec-format reminder ("provider:model") duplicates what the field name already implies.


What's Working Well

  • The resolve_model_spec resolution chain is correctly threaded all the way from per-call to corpus to install-wide to Django settings, and the priority tests in test_llms_completions.py pin this contract clearly.
  • The disabled state on LlmModelPicker chips is correctly guarded in both the native disabled attribute and the onClick handler; aria-pressed is included for accessibility.
  • The Apollo cache update callback in CorpusLanguageModelCard correctly snapshots submittedValue before the mutation completes -- the right pattern for preventing stale-closure bugs.
  • enabled !== false (rather than !!enabled) correctly treats null/undefined as "not explicitly disabled".
  • The defensive getattr(self, "corpus", None) in _generate_conversation_title is the right guard for document-only sessions.
  • useId() in LlmModelPicker prevents label/input ID collisions when two pickers render on the same page.
  • The provider-exposure boundary test in test_pipeline_component_queries.py pins exactly the right invariant: corpus editors see all registered providers, but settingsSchema (which can carry credentials) stays superuser-only.
  • The mypy python_version = 3.12 fix is the correct long-term solution rather than pinning numpy.
  • Changelog fragments are thorough and follow the project fragment conventions correctly.

Summary

One test will fail as-is (the "not registered" assertion on the serializer error). Fix validate_preferred_llm in config/graphql/serializers.py to forward str(exc) instead of the hardcoded "Invalid model specification." message and the test becomes correct. The rest of the PR is solid -- the gap closures, new agenerate_text helper, and LlmModelPicker extraction are all well-reasoned and well-tested.

JSv4 added 3 commits June 22, 2026 20:51
…g internal path

The CodeQL info-exposure auto-fix flattened the LLMProviderNotRegistered
reason to a generic 'Invalid model specification.', which dropped the
user-actionable 'not registered' signal and broke
test_unknown_provider_rejected_with_clean_field_error. Construct a
type-specific message from the user's own input (the typed provider
prefix) so the real reason surfaces without echoing str(exc), whose
text embeds an internal source path.
The collaborator imports in agenerate_text are deferred to function scope
to break an import cycle; add a WHY comment so a future refactor doesn't
silently hoist them and break the source-module test patches in
test_llms_completions.py.
@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown

test

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown

PLACEHOLDER - full review body to follow

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown

Unable to post full review body — sandbox write permissions are blocked. The full review body was provided in the task instructions. Please post the content of /tmp/pr_review_body.md manually if needed.

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown

PR Review — Consume LLM Singleton registry everywhere + per-corpus LLM GUI

This PR closes two real gaps in the LLM registry: it replaces the OpenAI-only SimpleLLMClient (which hardcoded gpt-4o-mini and blocked the async event loop) with a properly async, registry-backed agenerate_text helper, and it surfaces the previously-ghost Corpus.preferred_llm field through a new LlmModelPicker component shared between per-corpus settings and the admin system-settings screen. The serializer-layer validation for bad model specs (surfacing clean field errors instead of 500s) is a nice defensive addition. Overall the design is solid and the test coverage is thorough. Three things worth addressing:


1. if instructions: should be if instructions is not None: — API contract defect

File: opencontractserver/llms/completions.py, line ~88

The parameter is typed str | None = None, meaning None = "not provided" and "" = "explicitly pass an empty system instruction". The guard if instructions: conflates the two, silently suppressing instructions="". No current caller passes an empty string so there is no regression today, but the public API contract is wrong. Fix: if instructions is not None:.


2. Duplicate constant value — minor DRY violation

File: opencontractserver/constants/llm.py, lines 26 and 30

EXTRACT_DEFAULT_TEMPERATURE = 0.3 (existing) and TITLE_GENERATION_TEMPERATURE = 0.3 (new) are silently equal with no comment explaining why they must be independently tunable. CLAUDE.md requires maximal dryness. If they are semantically equivalent today, alias one to the other (TITLE_GENERATION_TEMPERATURE = EXTRACT_DEFAULT_TEMPERATURE) with a comment that they are kept separate to allow future divergence.


3. Redundant parse_model_spec call in the exception handler

File: config/graphql/serializers.py, line ~116

In the except LLMProviderNotRegistered handler, parse_model_spec(cleaned) is called a second time to extract the provider name. The comment correctly explains why str(exc) is avoided (CodeQL path-exposure). The second parse is safe in practice — LLMProviderNotRegistered is only raised after a successful parse inside validate_model_spec, so the same string cannot fail again. But the cleaner fix is to add a provider_key attribute to LLMProviderNotRegistered so callers can read it directly without re-parsing:

class LLMProviderNotRegistered(ValueError):
    def __init__(self, provider_key: str, spec: str) -> None:
        super().__init__(f"Provider {provider_key!r} (from spec {spec!r}) is not registered.")
        self.provider_key = provider_key

Then the serializer handler becomes provider = exc.provider_key — no second parse, no CodeQL concern. Not blocking, but the comment already flags the awkwardness and this resolves it cleanly.


Minor notes (non-blocking)

LlmModelPicker.tsx scatters raw rem/px literals ("0.8125rem", "0.375rem", "0.625rem") across ~22 inline style props — frontend/src/assets/configurations/constants.ts is the project home for these per CLAUDE.md's no-magic-numbers rule.

AgenerateTextTests patches collaborators at their source modules (e.g. opencontractserver.pipeline.utils.get_default_llm_spec), which is correct for lazy function-body imports but would silently break if those imports were ever hoisted. The in-code comment documents this — flagging for reviewers who might be surprised by the pattern.


Verified as non-issues: GET_SYSTEM_DEFAULT_LLM is @login_required only and defaultLlm is returned to all authenticated users (not superuser-gated); the enabled !== false provider filter handles null/undefined gracefully; the Apollo cache update callback is correctly gated on ok: true so no stale writes on mutation failure; exception handler order (LLMProviderNotRegistered before ValueError) is correct; uuid was already imported in the consumer. The removed SimpleLLMClient leaves zero dangling callers.

@JSv4 JSv4 merged commit 4cbd054 into main Jun 23, 2026
22 checks passed
@JSv4 JSv4 deleted the claude/llm-singleton-registry-config-4rei39 branch June 23, 2026 04:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants