[Feature]: Term Mapping — LLM-as-judge suggester engine

## Background

Follow-up from the Term Mapping v1 work (PR #483, PRD #469). The PRD listed multiple suggester engines, but only `HeuristicSuggester` ships in v1. The `engines/` directory and `engine_metadata` JSON column on `term_mapping_suggestions` are already scaffolded for additional engines.

## Goal

Add an **LLM-as-judge** re-rank engine that runs on top of the heuristic suggester to disambiguate mid-confidence candidates.

## Design (from original PRD)

- Fires **only** for heuristic confidences in the mid-range (configurable threshold) and **only** when explicitly enabled on the run.
- Uses the existing Databricks Foundation Model client (no new SDK dependency).
- Output: confidence ∈ [0, 1] plus a short rationale.
- The judge's confidence **multiplies** the heuristic confidence; the product is rounded to preserve the high-confidence auto-mark threshold semantics.
- Engine name `llm_judge` (or similar); rationale stored on the existing `reason` field; engine-specific bookkeeping in `engine_metadata`.

## Scope

- New `engines/llm_judge.py` implementing the `Suggester` protocol (or a complementary `Reranker` protocol — TBD during design).
- Wire into `TermMappingManager.create_run` orchestration: heuristic runs first, then the judge re-ranks the mid-confidence slice.
- Per-run toggle in the run-config dialog (`engines: ['heuristic', 'llm_judge']`).
- LLM consent flow reuses the existing app-wide one-time consent dialog.
- Cost guardrails: hard cap on number of judge calls per run; configurable in Settings.
- Unit tests for engine glue (mock the FM client).

## Out of scope (separate follow-ups)

- Embedding-based semantic engine.
- Per-engine confidence calibration UI.

## Acceptance

- Run-config dialog exposes the engine toggle.
- A run with `llm_judge` enabled produces suggestions whose `engine_metadata` records both heuristic and judge confidences plus the judge's rationale.
- Auto-apply threshold behavior is preserved (judge can only **lower** the final confidence, never raise it above 1.0).
- Cost cap is enforced; runs that hit the cap surface a per-run warning rather than failing.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature]: Term Mapping — LLM-as-judge suggester engine #485

Background

Goal

Design (from original PRD)

Scope

Out of scope (separate follow-ups)

Acceptance

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Feature]: Term Mapping — LLM-as-judge suggester engine #485

Description

Background

Goal

Design (from original PRD)

Scope

Out of scope (separate follow-ups)

Acceptance

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions