Commit f742e32

UN-2946 [FEAT] Prompt Studio lookups bridge, executor hook, and IDE wiring (OSS side) (#1929)
* UN-2946 [FEAT] Add lightweight list serializer for Prompt Studio and prompt list endpoint - Add CustomToolListSerializer for the list action to avoid N+1 queries (profile lookups, prompt fetching, coverage calculation per tool) - Add ToolStudioPromptListSerializer with only prompt_id, prompt_key, enforce_type, sequence_number - Add GET /prompt-studio/prompt/?tool_id={uuid} list endpoint - List action uses select_related and Subquery annotation for prompt_count - Detail endpoint unchanged (still uses full CustomToolSerializer) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-2946 [FEAT] Add Look-Ups plugin integration in sidebar nav and routes - Add lookup-studio plugin detection with dynamic import - Add PromptStudioPopoverContent for hover submenu (Projects / Look-Ups) following the same Popover pattern as HITL and Platform Settings - Register lookups/* route in useMainAppRoutes.js Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-2946 [FIX] Use .get() fallback for prompt_studio_tool in create_profile_manager Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-2946 [FEAT] Add Lookups V2 OSS integration hooks for post-extraction enrichment - Add lookup_config export in prompt_studio_helper and registry_helper via cloud plugin guard (try/except ImportError) - Store raw output in PromptStudioOutputManager, enriched in cloud LookupOutputResult — preserving both for UI tab display - Add LookupEnrichmentProtocol and plugin call in post-extraction pipeline using ExecutorPluginLoader (no-op in OSS) - Track lookup LLM usage via standard metrics pipeline (usage_kwargs with run_id/execution_id, capture_metrics) - Move webhook postprocessing from answer_prompt to pipeline - Frontend: dynamic plugin imports for LookupMenuItem, LookupIndicator, LookupOutputTabs in prompt cards; fetch lookup outputs on page load - Add scroll-to-prompt support via query param in DocumentParser Co-Authored-By: Claude Opus 4.6 
(1M context) <noreply@anthropic.com> * UN-2946 Removed unnecessary gitignore * UN-2946 [REFACTOR] Deduplicate lookup config helper and add lookup usage reason - Extract get_lookup_config() to prompt_studio/lookup_utils.py, replacing 4 identical try/except ImportError blocks across prompt_studio_helper and prompt_studio_registry_helper - Add LOOKUP to LLMUsageReason choices (was missing, causing invalid choice on usage records from lookup enrichment LLM calls) - Migration: usage_v2/0004_add_lookup_usage_reason Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-2946 [FEAT] Add get_last_usage() to SDK1 LLM for token tracking Store prompt/completion/total token counts from the most recent complete() call on the LLM object itself, making usage data queryable without relying on the Audit pipeline roundtrip. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-2946 [REFACTOR] Split post-extraction pipeline into lookup and webhook methods Move lookup result-application logic to the cloud plugin, matching the challenge plugin pattern where the plugin owns metadata mutation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-2946 [FEAT] Generic async extraction callbacks and WebSocket transport fallback Add reusable extraction_complete/extraction_error callback tasks to the ide_callback worker, replacing the need for Django-based celery workers for text extraction. Add ExtractionAPIClient for internal API calls. Add polling fallback to WebSocket transport for local dev reliability. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Reduce success notification duration from 2s to 1s for less intrusive UX * Revert "Reduce success notification duration from 2s to 1s for less intrusive UX" This reverts commit d6e136d. * UN-2946 [REFACTOR] Pluggable lookup export validation Replace inline DRAFT lookup check with pluggable cloud-only hook. 
Uses try/except ImportError pattern — zero lookup code in OSS. Collects all DRAFT lookups in one pass with markdown-linked error messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-2946 [REFACTOR] Lookups V2 review cleanup - Consolidate cloud imports into lookup_utils.py with persist_lookup_output() and validate_lookups_for_export() wrappers - Fix LookupEnrichmentProtocol.run() return type to None matching challenge/evaluation pattern - Revert logger.info to logger.debug in websocket_views.py - Eliminate duplicated LookupOutputTabs ternary with renderWithLookupWrapper helper - Move lookups menu constants from SideNavBar.jsx to cloud plugin - Harden DocumentParser.jsx scrollTo with UUID validation and fix useEffect dependency - Revert SocketContext transport to ["websocket"] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-2946 [UI] Replace sidebar popover with in-page tabs for Lookups Move Prompt Studio / Look-Ups navigation from a hover popover on the sidebar into a Segmented control within the ToolNavBar. CustomTools dynamically imports LookupList from the plugin and renders tabs when available, falling back to projects-only view in OSS mode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-2946 [FEAT] Deferred batch usage tracking with operation metrics Switch from eager per-call Audit HTTP push to a deferred batch write pattern for adapter usage. LLM/embedding calls stash records in-memory; the executor flushes them into ExecutionResult metadata; the Celery task batch-writes via a new internal endpoint. Adds 5 nullable columns to Usage (reference_id, reference_type, execution_time_ms, status, error_message) and a composite index for lookup dashboard queries. Extensible choice lists allow cloud plugins to register additional usage reasons and reference types. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-2946 [FEAT] Add plugin hook for lookup output enrichment in serializer Bridge function in lookup_utils.py lets cloud plugins enrich PromptStudioOutputSerializer with lookup data (enriched output, lookup name). Enables real-time lookup results via WebSocket without page refresh. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-2946 [FEAT] Add lookup usage observability with error handling and metadata passthrough - Wire usage_kwargs_extra from lookup config into LLM usage_kwargs for execution observability - Add error handling around enricher.run() with explicit ERROR usage records - Generic passthrough of _usage_kwargs into usage records for arbitrary metadata (e.g. reference_id) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-2946 [FEAT] Support enriched output copy and lookup drawer plugin hooks Add dynamic import of getEnrichedCopyText so the copy button copies enriched lookup output when the Enriched tab is active. Applied to both single-pass and multi-profile output paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-2946 [FEAT] Lookup export validation gate, raw-latest helper, and modified_at fix - Add /prompt-studio/<pk>/lookup-validation/ endpoint backing the FE Export/Deploy gate; multi-var block check accepts prompt_ids so a single prompt run isn't blocked by an unrelated multi-var lookup. - Add /prompt-output/latest-by-keys/ endpoint that returns the most recent raw output per prompt_key for the test panel's "Use Latest Outputs" helper. - Fix prompt output modified_at not refreshing on re-runs (QuerySet.update bypasses auto_now); set timezone.now() explicitly in the update args. - lookup_utils: bridge get_lookup_validation_for_tool and get_multi_var_lookups_for_tool with prompt_ids scoping. - Header wires useLookupExportGate via try-import (no-op stub in OSS). 
- TokenUsage treats all-null Usage rows as empty. - CombinedOutput / JsonView build enriched dict from metadata.lookup_outputs to back the Raw|Enriched output toggle. - .gitignore: widen docker/compose.*.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-2946 [UI] Fix combined output pill overlap and preserve Look-Ups tab on back - ProfileInfoBar: swap Row/Col for plain flex-wrap div — kills Ant Row negative-margin quirk that overlapped wrapped pills in combined output. - CustomTools: honor location.state.activeTab so back navigation from lookup detail lands on the Look-Ups tab instead of defaulting to Projects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-2946 [FEAT] Add last_exported_at and wire lookup staleness bridge Introduces nullable last_exported_at on CustomTool (populated on first successful export) so staleness checks can compare against downstream mutations without a data backfill. NULL is treated as "unknown" and suppresses the lookup-dirty flag to avoid false alarms on pre-feature projects. Adds the get_latest_lookup_mutation_for_tool bridge in lookup_utils so OSS stays decoupled from the cloud plugin. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-2946 [FEAT] Stream lookup enrichment failures to workflow logs When enricher.run() raises, surface a user-visible ERROR log line in the workflow execution log alongside the existing usage record. Keeps lookup failures observable next to the other pre/post lookup lines we already emit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-2946 [UI] Wire lookup dirty-seed and export gate into ToolIde Loads useLookupDirtySeed (server-side is_lookup_dirty) and useLookupExportGate from the cloud plugin via dynamic imports so the reminder banner reflects lookup changes across page reloads and the banner's Export flow goes through the same validation modal as the main buttons. 
Also adds a titleAdornment slot on ToolNavBar for rendering the onboarding tooltip and relaxes EmptyState.text to accept nodes for the tagline + link composition. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-2946 [FEAT] Share lookup test wrapper + generic ExecutionLogs back state - Delegate production lookup enrichment to LookupEnrichment.run_with_metrics so the executor and the IDE test path share LLM construction, error handling, and usage-record emission. - Let ExecutionLogs callers pass an arbitrary backRouteState via location state so nested UI restore (e.g. a sub-tab) no longer needs special casing in this component. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-2946 [REFACTOR] Round-2 review fixes — OSS side Bucket of hardening fixes driven by a staff-level PR re-review: - Org-scope latest_outputs_by_keys (was cross-tenant readable via raw .objects.filter() that bypassed OrganizationFilterBackend). - Hide lookup payload shape from OSS: three new opaque bridge helpers (get_original_value_if_enriched, attach_combined_output_enrichment, extract_prompt_output_enrichment) replace direct reads of metadata["lookup_outputs"] / _lookup_outputs / lookup_outputs in output_manager_helper, CombinedOutput.jsx, and usePromptOutput.js. - Split usage_v2 index into a new 0005 migration that uses AddIndexConcurrently + atomic=False so prod doesn't lock the billing table during build. - Delete stale workers/tests/test_usage.py that imported the removed UsageHelper module. - SDK1 LLM gains public get_last_usage_record() so downstream code stops reaching into _pending_usage across plugin boundaries. - legacy_executor stamps metadata["lookup_errors"][prompt] on a failed lookup outcome for dashboards that surface partial-failure runs. - extraction_client docstring notes the cloud-only endpoint contract. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-2946 [REFACTOR] Round-3 review fixes — OSS side - Module-level probe in prompt_studio/lookup_utils.py — swap per-function try/except ImportError for a single LOOKUPS_AVAILABLE flag. Add attach_lookup_config / attach_lookup_configs_to_tool_settings helpers so the direct metadata["lookup_errors"] write and the lookup_config key stamping both route through the bridge. - Reject org=None in UsageBatchCreateView (usage_v2/internal_views.py). - Lift useLookupExportGate to a single mount in ToolIde.jsx; thread checkLookups down into custom-tools/header/Header.jsx (eliminates the double modal-portal risk). - Delete the direct metadata["lookup_errors"] write from workers/executor/executors/legacy_executor.py — flat summary is now stamped by LookupEnrichment.write_lookup_error in the cloud plugin. Replace hardcoded "lookup_llm" metrics key with lookup_cls.METRICS_KEY. - Trim boilerplate comments across CombinedOutput, PromptCardItems, PromptOutput, usePromptOutput, prompt-card/Header, CustomToolsHelper, SideNavBar — keep the why-comments, drop the what-comments. * UN-2946 [FEAT] Reference prompts by UUID + missing-file gate — OSS side OSS counterpart to the cloud-side data-model change. Wires the prompt studio runtime to the new wire shape, surfaces lookup runnability state in the prompt card, and adds the usage_v2 enum entries the cloud side records against (lookup as an LLM usage reason, lookup_version as a reference type). Partially working — known follow-up: TODO: rework lookup input UX. The current variable-mapping flow is awkward (separate rows, manual prompt selection per variable); needs a redesign that mirrors how users actually compose a lookup template. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-2946 [FIX] Surface skipped lookups when source prompt has no value When a configured lookup runs but extraction returned None for the source prompt, _run_lookup_enrichment used to fall through silently — leaving users wondering why enrichment didn't appear. Stream a one-line workflow log via shim.stream_log so the skip is visible alongside other tool-run events. * UN-2946 [REFACTOR] Address Sonar findings on lookups V2 PR - Extract _init_llm_and_retrieval and _flush_per_prompt_metrics from LegacyExecutor._execute_single_prompt to drop cognitive complexity below the gate - Extract per-prompt helpers from OutputManagerHelper.fetch_default_output_response - Extract buildDefaultProfileOutputs / buildSelectedProfileOutputs from CombinedOutput.fetchCombinedOutput - Give OSS lookup-plugin stub fns matching parameter lists so static analysis stops flagging call sites as arity mismatches - Define _UNKNOWN_EXECUTOR_ERROR constant in ide_callback.tasks for the thrice-duplicated literal - Use splice(-1, 0, ...) idiom for LookupMenuItem insertion in prompt card Header Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-2946 [FIX] Preserve usage records on executor failure paths - LegacyExecutor.execute attaches collected usage_records to failure metadata in the LegacyExecutorError branch - ExecutionOrchestrator broad-except pulls usage_records off the executor before wrapping the unhandled exception, so Celery autoretry doesn't drop billing rows from a partially-completed run - lookup_utils.get_lookup_validation_for_tool OSS stub returns incomplete_lookups: [] to match the cloud schema Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-2946 [REFACTOR] Squash usage_v2 migrations 3 → 2 for lookups V2 Folds 0006 (choices-only AlterField, no SQL) into 0004 by baking the final llm_usage_reason and reference_type choice lists directly. 
Keeps 0005 separate to preserve CREATE INDEX CONCURRENTLY safety on the billing-critical usage table. Existing envs that already ran 0006 need a one-time DELETE on its django_migrations row. * UN-2946 [REFACTOR] Update OSS↔cloud lookup bridge for app rename Cloud's lookup_v1 plugin is being renamed to lookups; update the OSS try-import paths (prompt_studio/lookup_utils.py, usage_v2/models.py) and the related comment in workers/shared/clients/extraction_client.py. Bridge contract is unchanged — the cloud module just lives at a new import path. * UN-2946 [FIX] Address greptile review on lookups V2 OSS - usage_v2/internal_views.py: wrap bulk_create in try/except so a flush failure logs ERROR with full context instead of returning a silent 500; chunk with batch_size=500 to bound transaction size on the billing-critical usage table. - legacy_executor: extend the lookup empty-value guard to also skip on empty strings/lists/dicts so a "" extraction doesn't reach the LLM as an empty input. Boolean/number 0/False remain valid. * UN-2946 [FIX] Static usage choices to fix migration drift in OSS CI Greptile P1: migration 0004_usage_metrics_fields hard-codes the cloud values "lookup" / "lookup_version" into the AlterField operations, but the model resolved its choices via a try-import that only fired when pluggable_apps.lookups was present. In OSS-only builds the import raised, the model carried 3 choices, and ``makemigrations --check`` saw drift against the migration's 4 choices — turning OSS CI red on every run. Carry the union of OSS + cloud values statically in usage_v2/models.py and drop the try-import. Cloud-only entries are unused on OSS but make migration state and model state agree in both builds. The DB is unaffected (choices is a Django-validator concern, not DDL). The now-unused CLOUD_LLM_USAGE_REASON_CHOICES / CLOUD_REFERENCE_TYPE_CHOICES exports are removed from the cloud plugin's constants module in a separate commit. 
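The lookup empty-value guard described in the greptile fix above (skip on None and on empty strings/lists/dicts, while 0 and False stay valid) can be sketched in plain Python. This is a minimal illustration only; the name `is_blank` follows the helper mentioned elsewhere in this commit message, but the body is an assumption rather than the repository's code.

```python
from typing import Any


def is_blank(value: Any) -> bool:
    """Return True when an extraction result carries nothing to enrich.

    Hypothetical sketch: the real helper may differ in detail.
    """
    if value is None:
        return True
    # Empty strings, lists and dicts would reach the LLM as empty input.
    if isinstance(value, (str, list, dict)):
        return len(value) == 0
    # Falsy-but-valid results such as 0 and False must still be enriched.
    return False
```

A plain truthiness check (`if not value`) would wrongly skip 0 and False, which is why the sketch tests types explicitly.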
* UN-2946 [FEAT] Block enforce_type switch via lookup plugin gate When a prompt has a lookup configured, switching its enforce_type to a complex type (table / line-item / agentic_table) is blocked: the cloud plugin exposes useEnforceTypeSwitchGate, which OSS dynamically imports with a no-op fallback. Blocked switches surface an alert and keep the previous enforce_type. * UN-2946 [FIX] Harden billing/usage paths against silent drops - Guard embedding flush on callback_manager is None so public-adapter embedding usage stops being silently swallowed by the broad except. - Initialise LegacyExecutor._usage_records in __init__ so an early-execute crash can't leave the orchestrator's getattr fallback reading None. - Wrap challenge and summarisation completion calls in try/finally so flush_pending_usage() runs even on exception — transient errors no longer drop those LLMs' billing rows. - Replace bare except: pass on the FE error stream with logger.debug; the secondary failure is now recoverable. - Tighten UsageAPIClient.bulk_create_usage success heuristic against future partial-body contracts; capture and log the bool return at the Celery task with run_id + organization_id. - Drop redundant organization_id kwarg passed alongside set_organization_context in tasks.py. - SDK records: spread _usage_kwargs first so explicit billing fields win; write run_id/llm_usage_reason as None instead of "" so UUIDField/choice columns don't reject the row. - Use rsplit('/', 1)[-1] for display_model so multi-segment IDs (e.g. bedrock/anthropic/claude) collapse to the trailing segment. - Log the litellm cost_per_token failure for embeddings (matches the LLM path). * UN-2946 [FIX] Cross-cutting hygiene around lookup enrichment & webhook - Tighten LookupEnrichmentProtocol to declare run_with_metrics and METRICS_KEY so it actually matches the plugin call site — run() -> None was structurally identical to the no-op protocols. 
- Wrap the cloud run_with_metrics call in _run_lookup_enrichment in a defensive try/except that logs + streams a WARN to the IDE log, so plugin contract drift degrades the lookup gracefully instead of aborting the answer-prompt batch. - Hoist llm_cls out of the per-prompt hot loop — it was being unpacked from a 7-tuple inside _run_lookup_enrichment on every prompt; the caller already has it. - Extract the is_empty ladder into a module-level _is_blank helper so the predicate (and the falsy-but-valid 0/False rationale comment) lives next to the predicate, not the executor. - _run_webhook_postprocessing: when webhook_enabled and output is non-JSON, log + shim.stream_log so the user sees the skip in the IDE panel instead of silently never receiving the webhook call. - Narrow persist_lookup_output catch to (IntegrityError, ValidationError) and promote to logger.error — broad catches were hiding plugin schema drift while reporting success. - Wrap enrich_prompt_output in the prompt-output serializer's to_representation so an enrichment exception no longer 500s the list endpoint; matches the surrounding log-and-continue policy. * UN-2946 [FIX] Tighten Usage choices & lookup_utils contracts - UsageStatus(TextChoices) so the status field has an enforced domain instead of free-text — producers (llm.py, usage_handler.py) already write the canonical "SUCCESS" string so no producer changes needed. - Add usage_reference_pair_consistent CheckConstraint so a row with reference_id set but reference_type NULL (or vice versa) is rejected at the DB. Cheap to validate on apply because both fields landed together in lookups V2 — legacy rows have both NULL. The sibling (usage_type, llm_usage_reason) constraint is deferred to a follow-up issue: legacy embedding rows have llm_usage_reason='' from the old SDK default, and a full-table backfill or default ADD CONSTRAINT scan would lock the billing table for too long in prod. 
- internal_views.py: write llm_usage_reason as None instead of "" when missing so the choice column doesn't store an out-of-domain value. - lookup_utils.py: narrow the ImportError catch to the four cloud lookup modules so a failing transitive import inside the cloud plugin re-raises instead of silently degrading the whole feature to a no-op. Annotate get_original_value_if_enriched return as tuple[Any, dict] | None and rephrase the docstring to match the actual shape callers tuple-unpack. - Drop the attach_lookup_config / attach_lookup_configs_to_tool_settings / get_lookup_config_from_output wrappers — they were trivial dict ops over a hardcoded key already used directly by the executor and single-pass plugin, so the "key owned by the bridge" framing was misleading. Inline at all five callsites. - Drop the "future prompt_studio" forward-looking comment in ide_callback/tasks.py — only source='lookup' is wired up, and the cloud lookups plugin is the only registrant of the underlying endpoints. * UN-2946 [PERF] Push Combined Output queries into SQL - latest_outputs_by_keys: switch the per-prompt latest pick to order_by('prompt_id', '-modified_at').distinct('prompt_id') so Postgres returns at most one row per prompt instead of materialising every historical run + relying on a Python break. - fetch_default_output_response: collapse the previous N+1 (exists() + for output in queryset per prompt) into a single DISTINCT ON (prompt_id, profile_manager_id) query, plus a per-tool cache for the default-profile lookup. Combined Output is a hot path — the old shape made every panel switch O(prompts × runs) with a plugin invocation per matching row. - Drop the unused _resolve_profile_for_prompt and _collect_default_output_for_prompt helpers, and the dead except ObjectDoesNotExist: return '' (neither .exists() nor the queryset iteration ever raised that exception). 
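As an illustration of what the `order_by('prompt_id', '-modified_at').distinct('prompt_id')` pick in the PERF change above returns, the same "latest row per prompt" selection can be written in plain Python. The row shape (dicts with `prompt_id`, `modified_at`, `output`) and the function name are assumptions for this sketch; the real code runs the selection in Postgres rather than in Python.

```python
from datetime import datetime
from typing import Iterable


def latest_per_prompt(rows: Iterable[dict]) -> dict:
    """Keep only the most recently modified row for each prompt_id."""
    latest: dict = {}
    for row in rows:
        current = latest.get(row["prompt_id"])
        # Replace the stored row only when this one is strictly newer.
        if current is None or row["modified_at"] > current["modified_at"]:
            latest[row["prompt_id"]] = row
    return latest


# Hypothetical sample data: two runs of prompt p1, one run of p2.
rows = [
    {"prompt_id": "p1", "modified_at": datetime(2024, 1, 1), "output": "old"},
    {"prompt_id": "p1", "modified_at": datetime(2024, 1, 2), "output": "new"},
    {"prompt_id": "p2", "modified_at": datetime(2024, 1, 1), "output": "only"},
]
result = latest_per_prompt(rows)
```

Doing this in SQL means the database returns at most one row per prompt, instead of the application loading every historical run and discarding most of them.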
* UN-2946 [FIX] Frontend & callback hygiene around lookup hooks
  - PromptOutput.jsx: replace empty catch {} on the lookup dynamic-import sites with console.warn so unexpected chunk-load failures don't masquerade as OSS-mode behaviour. Add a resolveCopyText helper that wraps getEnrichedCopyText in try/catch + fallback so a plugin-side throw can't break the Copy button at either of the two call sites.
  - usePromptOutput.js: same catch (error) -> console.warn for the two existing dynamic-import sites; wrap the per-item handleLookupOutput call in try/catch so a single bad enrichment payload no longer aborts the surrounding forEach and skips the prompt-output state update.
  - prompt_studio_core_v2/views.py: validate prompt_id before the multi-var lookup gate so a missing field returns a clean 400 instead of a lookup-related error.
  - ide_callback/tasks.py: (result_dict.get('data') or {}) so an explicit data=None from the executor doesn't AttributeError into a generic ERROR callback. Replace the inner except: pass swallow with logger.debug so a secondary WS-emit failure during the outer extraction_complete fallback is recoverable from logs.
* UN-2946 [FIX] Skip webhook on JSON parse failure & re-include compose.debug
  - _run_webhook_postprocessing: skip when structured_output[prompt_name] is empty or non-iterable. Pre-refactor, the webhook lived inside handle_json after its parse-failure early-return (which sets the output to {}), so a malformed JSON answer never dispatched the webhook; the new explicit gate restores that behaviour. Subscribers no longer receive empty-payload calls they didn't see before.
  - .gitignore: re-include docker/compose.debug.yaml after the broader docker/compose.*.yaml rule so a delete + recreate doesn't make the tracked file look untracked, and so teammate-added compose files aren't silently masked.
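The `(result_dict.get('data') or {})` change in ide_callback/tasks.py above guards against a key that is present but explicitly None, a case the default argument of `dict.get` does not cover. A minimal sketch follows; the function name `extract_data` and the payload shape are hypothetical.

```python
def extract_data(result_dict: dict) -> dict:
    """Return the executor payload, treating a missing or None 'data' as empty."""
    # .get("data", {}) would still return None when the key exists with a
    # None value; "or {}" covers both the missing and the explicit-None case.
    return result_dict.get("data") or {}


explicit_none = {"success": False, "data": None}

# The default-argument form leaves None in place ...
unsafe = explicit_none.get("data", {})
# ... while the guarded form always yields a dict that supports .get().
safe = extract_data(explicit_none)
```

With the guarded form, a follow-up call such as `safe.get("output")` returns None instead of raising AttributeError.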
* UN-2946 [FIX] Return 400 for missing tool_id (was 500)
  `APIException(code=400)` only sets `detail.code` in the JSON body; `status_code` is hardcoded to HTTP_500_INTERNAL_SERVER_ERROR. Switch to `ValidationError` so the missing-tool_id and tool-not-found branches in `latest_outputs_by_keys` and `get_output_for_tool_default` actually respond with 400 as the comment / docstring imply.
* UN-2946 [FIX] Address remaining post-disposition review comments (OSS)
  Five threads on PR #1929 raised against the latest disposition push:
  - prompt_studio_core_v2/views.py: drop the ``single_pass_extraction_mode`` bypass from ``_multi_var_lookup_block_response``; fetch_response / bulk_fetch_response are always non-SP, so the tool-setting check just let multi-var lookups slip past the gate when the tool happened to be configured for SP.
  - prompt_studio_registry_helper.py: filter out NOTES + inactive prompts *before* calling ``validate_lookups_for_export`` so an incomplete lookup on a non-exportable prompt no longer fails the whole export.
  - unstract/sdk1/usage_handler.py: guard ``self.token_counter is None`` in the embedding-end branch; degrade with a warning instead of an AttributeError on early callbacks.
  - workers/executor/legacy_executor.py: move the ``outcome.usage_records`` / ``outcome.llm_metrics`` access inside the ``try`` so plugin contract drift hits the same graceful-degrade branch as a thrown ``run_with_metrics``.
  - backend/prompt_studio/lookup_utils.py: include ``pluggable_apps`` itself in ``_CLOUD_LOOKUP_MODULES``. Pure OSS images don't have the parent package, so ``ImportError.name`` surfaces as ``"pluggable_apps"`` and the previous filter re-raised instead of setting LOOKUPS_AVAILABLE=False.
* UN-2946 [DOCS] Tighten comments across lookups V2 OSS surface
  Drop archaeology / "previously / before-X-now-Y" framing; collapse multi-line WHAT walkthroughs to single-line WHY. No logic changes.
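The narrowed ImportError handling described for lookup_utils.py above can be sketched as follows: a missing cloud module (or its missing parent package) turns the feature off, while any other failed import is re-raised. This is a sketch under stated assumptions, not the repository's code. The submodule names in `_CLOUD_LOOKUP_MODULES` and the helper `probe_cloud_module` are hypothetical; only `pluggable_apps`, `pluggable_apps.lookups`, `_CLOUD_LOOKUP_MODULES` and `LOOKUPS_AVAILABLE` come from the commit message.

```python
import importlib

# Hypothetical module list; the real set names four cloud lookup modules
# plus the parent package itself.
_CLOUD_LOOKUP_MODULES = frozenset(
    {
        "pluggable_apps",  # parent package, absent in pure OSS images
        "pluggable_apps.lookups",
        "pluggable_apps.lookups.services",  # assumed name
    }
)


def probe_cloud_module(module_path: str) -> bool:
    """Return True if the module imports, False if a known cloud module is absent."""
    try:
        importlib.import_module(module_path)
    except ImportError as exc:
        # exc.name is the module that could not be found. When the parent
        # package is missing, it is the parent's name, not module_path.
        if exc.name in _CLOUD_LOOKUP_MODULES:
            return False
        # A broken transitive import must not silently disable the feature.
        raise
    return True


LOOKUPS_AVAILABLE = probe_cloud_module("pluggable_apps.lookups")
```

In an environment without the cloud plugin, `LOOKUPS_AVAILABLE` ends up False, whereas a missing third-party dependency inside the plugin would still surface as an ImportError.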
* UN-2946 [REFACTOR] Drop unused token_count param from ExtractionAPIClient Pairs with the cloud-side removal of the lookup token_count / estimated_tokens fields — the worker no longer computes a value to send. * UN-2946 [REFACTOR] DRF-ify Usage internal batch endpoint + squash migration - UsageBatchCreateView raises DRF exceptions (drf-standardized-errors envelope) instead of hand-rolled JsonResponse — serializer validation via raise_exception=True, dedicated UsagePersistError(APIException) for the bulk_create failure path. - Records validated through UsageBatchCreateSerializer / nested UsageRecordCreateSerializer so adapter_instance_id, model_name, usage_type are required and the rest get explicit defaults. - Fold 0006_alter_usage_status_and_more (UsageStatus choices + reference_pair CheckConstraint) into 0004 — branch hasn't merged to main, so squashing avoids an extra ALTER on deploy. * UN-2946 [REFACTOR] Harden SDK / worker billing path + extract lookup helpers - llm.py: token_counter fallback when prompt_tokens=0; rsplit('/',1) for multi-segment provider IDs; spread _usage_kwargs first so explicit billing fields win. - utils/common.py: stamp every record appended during the call window (was clobbering only the last entry). - legacy_executor.py: extract run_lookup_enrichment / run_webhook_postprocessing / is_blank into workers/executor/executors/lookup_enrichment.py — caller passes shim/state in, plugin returns usage_records for the caller to extend its billing batch. Orchestrator stays pure dispatch. - ide_callback/tasks.py: drop the char-÷4 token estimate heuristic; context-manage the API clients so HTTP sessions don't leak. * UN-2946 [FIX] Lookup-related FE polish + DRF error envelope - prompt_studio/views.py: _multi_var_lookup_block_response uses ``detail`` instead of ``error`` to match drf-standardized-errors. - CombinedOutput.jsx: hoist build helpers to module scope (selectedProfile passed as arg) — no per-render allocation, fewer useEffect closures. 
- DocumentParser.jsx: hoist UUID_RE to module scope (no per-render compile).
- PromptOutput.jsx: silent ``catch {}`` on plugin dynamic-import failures so OSS doesn't surface noisy warnings for cloud-only modules.
- SideNavBar.jsx: hide the lookups submenu item when the lookup-studio plugin isn't loaded (keeps OSS nav clean).

* UN-2946 [FIX] Type run_id / reference_id as UUIDField in batch serializer

Both columns are UUIDField on the Usage model — leaving them as CharField in the serializer let invalid UUIDs slip through to bulk_create and surface as a 500. UUIDField catches them at validation with a standard DRF 400.

* UN-2946 [DOCS] Tighten extraction_complete docstring

Drops the stale "Computes token count" line — the callback no longer derives a token count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-2946 [FIX] Pre-bind validated_file_execution_id in usage client

If _validate_file_execution_id raised, the except handler hit UnboundLocalError and masked the original ValueError. Pre-bind a str(file_execution_id) fallback so the error response carries the real cause.

Also gitignore Codex's AGENTS.md scratchpad.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [CHORE] Ignore .pi/ tooling directory

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3494 [REFACTOR] Replace polymorphic Usage attribution with typed columns + post-write hooks

Drop reference_id / reference_type from Usage in favour of typed project_id / prompt_id columns indexed CONCURRENTLY for dashboard rollups. Cloud-only attribution (e.g. lookup_version_id) now flows through an opaque cloud_extras carrier on the batch endpoint, which forwards it verbatim to plugin-registered post-write hooks invoked inside the same atomic transaction as bulk_create — a hook failure rolls back the Usage rows so attribution stays consistent or nothing is written.

Removes the need for cloud to subclass UsageBatchCreateView or prepend a URL override; the hook seam is generic for any future cloud feature.

* UN-3494 [REVIEW] Idempotent hook registration + lookup-usage partial index

- register_post_write_hook now dedupes by identity so AppConfig.ready() re-firing under test reloads or dev autoreload can't queue a second LookupUsage write that would IntegrityError and roll back the batch.
- usage_objects builder collapsed to a comprehension (review polish).
- New 0005 migration step adds a partial index idx_usage_lookup_recent on (organization_id, created_at DESC) WHERE llm_usage_reason='lookup', so the per-(run x prompt) dashboard aggregation stops heap-scanning all Usage rows when filtering by organisation + reason.

* UN-3494 [FIX] Forward answer-step metadata in structure pipeline

The SP cloud plugin returns its usage_records via ExecutionResult.metadata; the non-SP path recovers them via self._usage_records in LegacyExecutor.execute(). _handle_structure_pipeline only honoured the second carrier — every SP-mode API deployment lost its extraction + lookup billing rows silently because tasks.py guards the flush behind ``if usage_records:`` and the empty list short-circuits the post. Forwarding answer_result.metadata closes the gap.

Surface drift here deserves a follow-up to consolidate to a single carrier; tracked separately.

* UN-3494 [REFACTOR] Unify usage-records carrier and propagate execution_id

Collapses the dual-carrier pattern in the executor worker so every handler returns its rows via ExecutionResult.metadata["usage_records"]. LegacyExecutor's instance attribute and recovery hook are removed; each helper returns its records, orchestrators absorb child metadata into a single list, and partial rows survive a mid-pipeline failure via LegacyExecutorError.partial_usage_records. tasks.py now logs an INFO line when an LLM-bearing op succeeds with zero rows so future drift surfaces immediately.

Also fixes the long-standing dispatcher gap where structure_tool_task omitted execution_id and file_execution_id when constructing ExecutionContext for structure_pipeline / table_extract. The fields were only stuffed inside executor_params, so context.execution_id was None for every downstream handler. The legacy answer-prompt handler dug into executor_params and worked, but SP plugin and summarize handlers fell back to "" — and the dashboard's classifier mapped empty execution_id to the IDE bucket. Setting the dataclass field at dispatch plus reading context.execution_id in _handle_summarize lets workflow rows classify as WF/API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3494 [REVIEW] Trim verbose code comments

Drop WHAT-comments, references to PR/conversation context, and multi-line explanations that didn't add WHY. Comments now describe behavior generically so they make sense without prior context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0538fdc commit f742e32

63 files changed

Lines changed: 2539 additions & 845 deletions

.gitignore

Lines changed: 11 additions & 1 deletion
@@ -653,7 +653,11 @@ docker/*.env
 !docker/sample*.env
 docker/public_tools.json
 docker/proxy_overrides.yaml
-docker/compose.override.yaml
+docker/compose.*.yaml
+# ``docker/compose.debug.yaml`` is checked-in tooling — keep it out of the
+# broader ``compose.*.yaml`` ignore so a delete + recreate doesn't make it
+# look untracked, and so teammates can spot it.
+!docker/compose.debug.yaml
 docker/workflow_data/
 
 # Tool development
@@ -696,6 +700,12 @@ CLAUDE.md
 CONTRIBUTION_GUIDE.md
 .mcp.json
 
+# Codex
+AGENTS.md
+
+# Pi
+.pi/
+
 # Windsurf
 .qodo
 .windsurfrules
Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
+"""Shared utility for lookup operations. No-ops in OSS.
+
+Only the absence of ``pluggable_apps.lookups`` itself is treated as
+"cloud not installed"; an ImportError from a transitive dependency
+re-raises so we don't silently degrade to a no-op on a real bug.
+"""
+
+import logging
+from typing import Any
+
+logger = logging.getLogger(__name__)
+
+_CLOUD_LOOKUP_MODULES = {
+    # OSS images lack the parent ``pluggable_apps`` package, so include it.
+    "pluggable_apps",
+    "pluggable_apps.lookups",
+    "pluggable_apps.lookups.execution",
+    "pluggable_apps.lookups.output_enrichment",
+    "pluggable_apps.lookups.staleness",
+    "pluggable_apps.lookups.validation",
+    "pluggable_apps.lookups.models",
+}
+
+try:
+    from pluggable_apps.lookups import execution as _execution
+    from pluggable_apps.lookups import output_enrichment as _output_enrichment
+    from pluggable_apps.lookups import staleness as _staleness
+    from pluggable_apps.lookups import validation as _validation
+    from pluggable_apps.lookups.models import LookupOutputResult as _LookupOutputResult
+
+    LOOKUPS_AVAILABLE = True
+except ImportError as e:
+    if e.name not in _CLOUD_LOOKUP_MODULES:
+        raise
+    LOOKUPS_AVAILABLE = False
+
+
+def get_lookup_config(prompt) -> dict | None:
+    """Return lookup config for a prompt, or None if lookups are unavailable."""
+    if not LOOKUPS_AVAILABLE:
+        return None
+    return _execution.build_lookup_config_for_prompt(prompt)
+
+
+def get_lookup_configs_for_tool(tool, prompts=None) -> list[dict] | None:
+    """Return lookup configs for a tool (single pass), or None in OSS.
+
+    ``prompts`` scopes validation to the run's prompts so unrelated
+    incomplete assignments on the tool don't block it.
+    """
+    if not LOOKUPS_AVAILABLE:
+        return None
+    return _execution.build_lookup_configs_for_tool(tool, prompts=prompts)
+
+
+def get_multi_var_lookups_for_tool(tool, prompt_ids=None) -> list[str]:
+    """Return names of multi-variable lookups linked to the tool, [] in OSS.
+
+    ``prompt_ids`` scopes the check so a run is only blocked when the
+    multi-var lookup is actually used by it.
+    """
+    if not LOOKUPS_AVAILABLE:
+        return []
+    _, names = _execution.has_multi_var_lookups(tool, prompt_ids=prompt_ids)
+    return names
+
+
+def persist_lookup_output(prompt_output, prompt_lookup: dict) -> None:
+    """Persist lookup enrichment result. No-op in OSS."""
+    if not LOOKUPS_AVAILABLE:
+        return
+    lookup_meta = prompt_lookup.get("meta", {})
+    lookup_id = lookup_meta.get("lookup_id")
+    if not lookup_id:
+        return
+    defaults = {
+        "lookup_definition_id": lookup_id,
+        "output": prompt_lookup.get("enriched", ""),
+    }
+    version_id = lookup_meta.get("version_id")
+    if version_id:
+        defaults["version_id"] = version_id
+    _LookupOutputResult.objects.update_or_create(
+        prompt_output=prompt_output,
+        defaults=defaults,
+    )
+
+
+def enrich_prompt_output(prompt_output, data: dict) -> dict:
+    """Let cloud plugins enrich serialized prompt output with lookup data.
+
+    No-op in OSS.
+    """
+    if not LOOKUPS_AVAILABLE:
+        return data
+    return _output_enrichment.enrich_with_lookup_output(prompt_output, data)
+
+
+def validate_lookups_for_export(prompts) -> tuple[dict, str | None]:
+    """Validate lookup assignments before export. Returns ({}, None) in OSS."""
+    if not LOOKUPS_AVAILABLE:
+        return {}, None
+    return _validation.validate_lookups_for_export(prompts)
+
+
+def get_latest_lookup_mutation_for_tool(tool):
+    """Max ``modified_at`` across lookup-related records linked to the tool
+    (version, reference file, assignment) — feeds the staleness banner.
+    None if unavailable or nothing linked.
+    """
+    if not LOOKUPS_AVAILABLE:
+        return None
+    return _staleness.get_latest_lookup_mutation_for_tool(tool)
+
+
+def get_original_value_if_enriched(
+    metadata: dict, prompt_key: str
+) -> tuple[Any, dict] | None:
+    """Return ``(original_value, prompt_lookup_dict)`` if ``prompt_key`` was
+    enriched, or ``None`` otherwise.
+
+    Pure metadata-shape check — safe to call even when LOOKUPS_AVAILABLE
+    is False (returns None because the shape won't match).
+    """
+    if not isinstance(metadata, dict):
+        return None
+    lookup_outputs = metadata.get("lookup_outputs") or {}
+    prompt_lookup = lookup_outputs.get(prompt_key)
+    if isinstance(prompt_lookup, dict) and "original" in prompt_lookup:
+        return prompt_lookup.get("original"), prompt_lookup
+    return None
+
+
+def attach_combined_output_enrichment(result: dict, enriched_by_key: dict) -> None:
+    """Stamp the combined-output payload with enriched-output metadata.
+
+    Key name stays cloud-side so the FE-plugin shape can evolve without
+    coordinating with OSS.
+    """
+    if not LOOKUPS_AVAILABLE:
+        return
+    _output_enrichment.attach_combined_output_enrichment(result, enriched_by_key)
+
+
+def extract_prompt_output_enrichment(item) -> dict | None:
+    """Pick enriched-output data off a serialized prompt-output row.
+
+    Returns a plugin-opaque dict (FE-only) or None when no enrichment
+    is present / plugin missing.
+    """
+    if not LOOKUPS_AVAILABLE:
+        return None
+    return _output_enrichment.extract_prompt_output_enrichment(item)
+
+
+def get_lookup_validation_for_tool(tool) -> dict:
+    """Pre-emptive lookup validation for FE Export / Deploy gating.
+
+    Returns an "always ok" payload in OSS so the FE gate is a no-op.
+    """
+    if not LOOKUPS_AVAILABLE:
+        return {
+            "ok": True,
+            "draft_lookups": [],
+            "multi_var_lookups": [],
+            "incomplete_lookups": [],
+            "single_pass_enabled": bool(
+                getattr(tool, "single_pass_extraction_mode", False)
+            ),
+        }
+    return _validation.get_lookup_validation_for_tool(tool)
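The import guard at the top of this module relies on `ImportError.name` to tell "the optional package is not installed" apart from "the package is installed but one of its own imports is broken". A minimal standalone sketch of that pattern follows; the module names `acme_plugins` and `acme_plugins.lookups` and the helper `load_optional` are hypothetical stand-ins, not part of the commit.

```python
import importlib

# Hypothetical names standing in for the optional plugin package.
# The parent package is listed too: importing "acme_plugins.lookups" when
# "acme_plugins" is absent reports the parent as the missing module.
_OPTIONAL_MODULES = {"acme_plugins", "acme_plugins.lookups"}


def load_optional(module_name: str, importer=importlib.import_module):
    """Return the module, or None only when the optional package itself is absent."""
    try:
        return importer(module_name)
    except ImportError as e:
        # e.name is the module that could not be found. Anything outside the
        # allow-list is a genuine failure in a dependency and must surface.
        if e.name not in _OPTIONAL_MODULES:
            raise
        return None
```

Calling `load_optional("acme_plugins.lookups")` on a machine without that package returns `None`, while an `ImportError` naming any other module propagates unchanged. The `importer` parameter exists only so the second case can be exercised without installing anything.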
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# Generated by Django 4.2.1 on 2026-04-21 20:20
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+    dependencies = [
+        ("prompt_studio_core_v2", "0006_add_custom_data_to_customtool"),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name="customtool",
+            name="last_exported_at",
+            field=models.DateTimeField(
+                blank=True,
+                db_comment="Timestamp of the last successful export; NULL if never exported since the field was introduced.",
+                null=True,
+            ),
+        ),
+    ]

backend/prompt_studio/prompt_studio_core_v2/models.py

Lines changed: 9 additions & 0 deletions
@@ -161,6 +161,15 @@ class CustomTool(DefaultOrganizationMixin, BaseModel):
         db_comment="Flag to share this custom tool with all users in the organization",
     )
 
+    # NULL on pre-feature tools; populated on first successful export.
+    # Drives staleness checks (e.g. lookup-change banner) without requiring
+    # a data backfill.
+    last_exported_at = models.DateTimeField(
+        null=True,
+        blank=True,
+        db_comment="Timestamp of the last successful export; NULL if never exported since the field was introduced.",
+    )
+
     objects = CustomToolModelManager()
 
     def delete(self, organization_id=None, *args, **kwargs):
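One way a staleness check could consume the nullable `last_exported_at` column is sketched below. This is an assumption-heavy illustration: the helper name `is_export_stale` and the rule "stale only when a lookup record changed after a recorded export" are hypothetical, and the comparison the commit actually uses for the banner is not shown in this diff.

```python
from datetime import datetime, timezone


def is_export_stale(last_exported_at, latest_lookup_mutation) -> bool:
    """Hypothetical rule: stale only if something changed after a recorded export."""
    # A NULL export timestamp (pre-feature tool) or no linked lookup records
    # leaves nothing to compare, so this sketch reports "not stale".
    if last_exported_at is None or latest_lookup_mutation is None:
        return False
    return latest_lookup_mutation > last_exported_at


exported = datetime(2026, 4, 1, tzinfo=timezone.utc)
changed = datetime(2026, 4, 2, tzinfo=timezone.utc)
stale_after_change = is_export_stale(exported, changed)
stale_without_export = is_export_stale(None, changed)
```

Treating NULL as "not stale" is what lets existing rows skip a backfill in this sketch; a different product decision (for example, always prompting a first export) would flip that branch.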

backend/prompt_studio/prompt_studio_core_v2/prompt_studio_helper.py

Lines changed: 23 additions & 0 deletions
@@ -14,12 +14,17 @@
 from django.db import transaction
 from django.db.models.manager import BaseManager
 from plugins import get_plugin
+from rest_framework.exceptions import APIException
 from rest_framework.request import Request
 from utils.file_storage.constants import FileStorageKeys
 from utils.file_storage.helpers.prompt_studio_file_helper import PromptStudioFileHelper
 from utils.local_context import StateStore
 
 from backend.celery_service import app as celery_app
+from prompt_studio.lookup_utils import (
+    get_lookup_config,
+    get_lookup_configs_for_tool,
+)
 from prompt_studio.prompt_profile_manager_v2.models import ProfileManager
 from prompt_studio.prompt_profile_manager_v2.profile_manager_helper import (
     ProfileManagerHelper,
@@ -387,6 +392,9 @@ def _build_prompt_output(
         if webhook_enabled:
             output[TSPKeys.POSTPROCESSING_WEBHOOK_URL] = webhook_url
 
+        if lookup_config := get_lookup_config(prompt):
+            output["lookup_config"] = lookup_config
+
         output[TSPKeys.EVAL_SETTINGS] = {}
         output[TSPKeys.EVAL_SETTINGS][TSPKeys.EVAL_SETTINGS_EVALUATE] = prompt.evaluate
         output[TSPKeys.EVAL_SETTINGS][TSPKeys.EVAL_SETTINGS_MONITOR_LLM] = [monitor_llm]
@@ -798,6 +806,9 @@ def build_fetch_response_payload(
         if webhook_enabled:
             output[TSPKeys.POSTPROCESSING_WEBHOOK_URL] = webhook_url
 
+        if lookup_config := get_lookup_config(prompt):
+            output["lookup_config"] = lookup_config
+
         output[TSPKeys.EVAL_SETTINGS] = {}
         output[TSPKeys.EVAL_SETTINGS][TSPKeys.EVAL_SETTINGS_EVALUATE] = prompt.evaluate
         output[TSPKeys.EVAL_SETTINGS][TSPKeys.EVAL_SETTINGS_MONITOR_LLM] = [monitor_llm]
@@ -1166,6 +1177,10 @@ def build_single_pass_payload(
             TSPKeys.SIMILARITY_TOP_K: default_profile.similarity_top_k,
         }
 
+        lookup_configs = get_lookup_configs_for_tool(tool, prompts=prompts)
+        if lookup_configs:
+            tool_settings["lookup_configs"] = lookup_configs
+
         for p in prompts:
             if not p.prompt:
                 raise EmptyPromptError()
@@ -1607,6 +1622,9 @@ def _execute_single_prompt(
                 is_single_pass=False,
                 profile_manager_id=profile_manager_id,
             )
+        except APIException:
+            # Validation responses are user-facing; DRF renders them as-is.
+            raise
         except Exception as e:
             logger.error(
                 f"[{tool.tool_id}] Error while fetching response for "
@@ -1672,6 +1690,9 @@ def _execute_prompts_in_single_pass(
                 document_id=document_id,
                 is_single_pass=True,
             )
+        except APIException:
+            # Validation responses are user-facing; DRF renders them as-is.
+            raise
         except Exception as e:
             logger.error(
                 f"[{tool.tool_id}] Error while fetching single pass response: {e}"
@@ -1911,6 +1932,8 @@ def _fetch_response(
         output[TSPKeys.ENABLE_POSTPROCESSING_WEBHOOK] = webhook_enabled
         if webhook_enabled:
             output[TSPKeys.POSTPROCESSING_WEBHOOK_URL] = webhook_url
+        if lookup_config := get_lookup_config(prompt):
+            output["lookup_config"] = lookup_config
         # Eval settings for the prompt
         output[TSPKeys.EVAL_SETTINGS] = {}
         output[TSPKeys.EVAL_SETTINGS][TSPKeys.EVAL_SETTINGS_EVALUATE] = prompt.evaluate
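The two `except APIException: raise` clauses added above work because Python tries handlers in order: placing the specific handler before the broad `except Exception` lets user-facing validation errors pass through untouched. The sketch below shows that ordering in isolation. The local `APIException` class is a stand-in for `rest_framework.exceptions.APIException`, and `WrappedError` and `run_step` are hypothetical; what the real broad handler does after logging is not visible in this diff.

```python
import logging

logger = logging.getLogger(__name__)


class APIException(Exception):
    """Stand-in for rest_framework.exceptions.APIException."""


class WrappedError(Exception):
    """Hypothetical generic error raised by the broad handler."""


def run_step(step):
    """Run a callable, passing API errors through and wrapping everything else."""
    try:
        return step()
    except APIException:
        # Must precede the broad handler; otherwise this clause is unreachable
        # and the validation message would be replaced by the generic one.
        raise
    except Exception as e:
        logger.error("step failed: %s", e)
        raise WrappedError(str(e)) from e
```

Reversing the two `except` clauses would still run, but every `APIException` would then be caught by the broad handler and wrapped.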

backend/prompt_studio/prompt_studio_core_v2/urls.py

Lines changed: 9 additions & 0 deletions
@@ -66,6 +66,10 @@
 
 prompt_studio_task_status = PromptStudioCoreView.as_view({"get": "task_status"})
 
+prompt_studio_lookup_validation = PromptStudioCoreView.as_view(
+    {"get": "lookup_validation"}
+)
+
 
 urlpatterns = format_suffix_patterns(
     [
@@ -165,5 +169,10 @@
             prompt_studio_task_status,
             name="prompt-studio-task-status",
         ),
+        path(
+            "prompt-studio/<uuid:pk>/lookup-validation/",
+            prompt_studio_lookup_validation,
+            name="prompt-studio-lookup-validation",
+        ),
     ]
 )
