Add confidence breakdown API for proposal review UI #1036

Merged
Chris0Jeky merged 17 commits into main from paper/1021-confidence-breakdown
Apr 27, 2026

@Chris0Jeky
Owner

Summary

Implements the backend confidence breakdown system (#1021) so the paper deep-review header dial and right-rail bars can show a 4-component explanation of why a proposal is or isn't above the apply threshold.

  • Domain layer: ConfidenceComponent and ConfidenceBreakdown value objects with full [0..1] range validation, NaN/Infinity rejection, equality, and defensive copy semantics
  • Application layer: IConfidenceBreakdownService / ConfidenceBreakdownService computing 4 components (Pattern match, Reach, Reversibility, Recency) from proposal metadata with weighted overall score and threshold-relative note generation
  • API layer: GET /api/automation/proposals/{id}/confidence endpoint with board-scoped authorization
  • Tests: 30 domain tests + 33 service tests covering validation, edge cases, and integration scenarios

Component computation

| Component | Weight | Signal |
| --- | --- | --- |
| Pattern match | 0.30 | Proportion of operations using well-known action types |
| Reach | 0.20 | Inverse log of distinct affected entities (focused = high) |
| Reversibility | 0.35 | Blend of risk level baseline + action-type destructiveness |
| Recency | 0.15 | Remaining fraction of proposal expiry window |
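
As a rough illustration of the Pattern match signal, the Python sketch below computes the proportion of operations whose action type belongs to a well-known set. It is not the C# service code: the set holds only the seven actions named later in this thread, and the `pattern_match` name and the 0.0 fallback for an empty proposal are assumptions.

```python
from __future__ import annotations

# Illustrative subset only; the full WellKnownActions set is not shown in this PR.
WELL_KNOWN_ACTIONS = frozenset(
    {"move", "reorder", "assign", "attach", "restore", "unarchive", "unblock"}
)


def pattern_match(action_types: list[str]) -> float:
    """Proportion of operations whose action type is well known, in [0, 1]."""
    if not action_types:
        # Assumption: an empty proposal scores 0.0; the service may differ.
        return 0.0
    known = sum(1 for action in action_types if action.lower() in WELL_KNOWN_ACTIONS)
    return known / len(action_types)
```

A proposal with one known and one unknown action type would score 0.5 under this sketch.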

Response shape

{
  "overall": 0.89,
  "components": [
    { "key": "Pattern match", "value": 1.0 },
    { "key": "Reach", "value": 0.67 },
    { "key": "Reversibility", "value": 0.9 },
    { "key": "Recency", "value": 0.95 }
  ],
  "note": null,
  "threshold": 0.7,
  "meetsThreshold": true
}
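
For a consumer of this endpoint, a minimal Python sketch of mapping the response body onto a typed record might look like the following. The `Breakdown` and `parse_breakdown` names are hypothetical, and the consistency check on `meetsThreshold` is an extra client-side assumption, not something the API is stated to require.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Breakdown:
    overall: float
    components: dict[str, float]  # component key -> value
    note: Optional[str]
    threshold: float
    meets_threshold: bool


def parse_breakdown(payload: dict[str, Any]) -> Breakdown:
    """Map the JSON body of the confidence endpoint onto a typed record."""
    parsed = Breakdown(
        overall=float(payload["overall"]),
        components={c["key"]: float(c["value"]) for c in payload["components"]},
        note=payload.get("note"),
        threshold=float(payload["threshold"]),
        meets_threshold=bool(payload["meetsThreshold"]),
    )
    # Client-side sanity check: the flag should agree with overall >= threshold.
    if parsed.meets_threshold != (parsed.overall >= parsed.threshold):
        raise ValueError("meetsThreshold disagrees with overall and threshold")
    return parsed
```

The payload would typically come from `json.loads` on the body returned by GET /api/automation/proposals/{id}/confidence.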

Test plan

  • Domain: ConfidenceComponent validation (0..1 range, NaN, Infinity, empty key)
  • Domain: ConfidenceBreakdown validation (overall, threshold, null components, boundary values, MeetsThreshold)
  • Domain: Equality and hash code contracts
  • Service: Breakdown computation for different proposal types
  • Service: Each component computes in [0, 1] range
  • Service: Pattern match for known/unknown action types
  • Service: Reach for single vs multi-target proposals
  • Service: Reversibility for different risk levels and action types
  • Service: Overall weighted average correctness
  • Service: Note generation near/far from threshold
  • Service: Not-found proposal returns proper error
  • dotnet build backend/Taskdeck.sln -c Release passes
  • 63 new tests all pass

Closes #1021

Immutable value object representing a single named component of a
confidence breakdown (e.g. Pattern match, Reach). Validates key is
non-empty and value is finite and in [0.0, 1.0]. Includes equality,
hash code, and operator overloads.

Closes part of #1021

Multi-component confidence breakdown with overall score, component
list, optional note, and threshold. Validates all scores are finite
and in [0.0, 1.0]. Defensive-copies component list. Exposes
MeetsThreshold computed property.

Closes part of #1021

Application-layer DTOs for the confidence breakdown API response.
ConfidenceBreakdownDto includes overall, components, note, threshold,
and a computed MeetsThreshold property.

Closes part of #1021

Computes a 4-component confidence breakdown for proposals:
- Pattern match: proportion of operations using well-known actions
- Reach: inverse log of distinct affected entities
- Reversibility: blend of risk level baseline and action-type risk
- Recency: remaining fraction of the proposal expiry window

Overall is a weighted average (Reversibility 0.35, Pattern match 0.30,
Reach 0.20, Recency 0.15). Generates explanatory note when score is
near the default 0.7 threshold.

Closes part of #1021

Wire IConfidenceBreakdownService into AutomationProposalsController
and register it in DI. Endpoint uses existing authorization flow to
verify the caller has read access to the proposal's board.

Closes part of #1021

30 tests covering value validation (0..1 range, NaN, Infinity,
empty key, null components), boundary values, equality/hash code,
MeetsThreshold, defensive copy, and ToString formatting.

Closes part of #1021

33 tests covering breakdown computation, component logic (pattern
match, reach, reversibility, recency), overall weighted average,
note generation, and proposal-type integration scenarios.

Closes #1021

Adversarial review found that move, reorder, assign, attach, restore,
unarchive, and unblock were in WellKnownActions but not classified as
safe or destructive, causing them to fall through to a neutral 0.6
action factor. These are all reversible operations and should be
classified as safe for the reversibility component.

@Chris0Jeky
Owner Author

Adversarial Self-Review

Checked areas (per issue requirements)

1. Value range validation -- PASS

  • ConfidenceComponent and ConfidenceBreakdown both reject NaN, Infinity, negative, and >1.0 values at construction time
  • All service computation methods clamp outputs to [0.0, 1.0] via Math.Clamp before passing to domain constructors
  • Double safety net: clamp in service + validation in domain constructor
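
The construction-time checks described above can be sketched in Python as follows. This is an illustrative analogue of the C# value objects, not their actual code: the field order, the default threshold argument, and the `_require_unit_interval` helper are assumptions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


def _require_unit_interval(name: str, value: float) -> None:
    # NaN fails every comparison, so finiteness is checked explicitly first.
    if not math.isfinite(value) or not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be a finite number in [0, 1], got {value!r}")


@dataclass(frozen=True)
class ConfidenceComponent:
    key: str
    value: float

    def __post_init__(self) -> None:
        if not self.key:
            raise ValueError("key must be non-empty")
        _require_unit_interval("value", self.value)


@dataclass(frozen=True)
class ConfidenceBreakdown:
    overall: float
    components: tuple[ConfidenceComponent, ...]
    threshold: float = 0.7
    note: str | None = None

    def __post_init__(self) -> None:
        if self.components is None:
            raise ValueError("components must not be None")
        _require_unit_interval("overall", self.overall)
        _require_unit_interval("threshold", self.threshold)
        # Defensive copy: later changes to the caller's list cannot leak in.
        object.__setattr__(self, "components", tuple(self.components))

    @property
    def meets_threshold(self) -> bool:
        return self.overall >= self.threshold
```

Because the constructor rejects bad values outright, a clamp in the calling service acts as a first line of defence rather than the only one.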

2. NaN/Infinity handling -- PASS

  • ComputeOverall divides by totalWeight but guards totalWeight <= 0.0 first
  • ComputeReach uses Math.Log2(distinctTargets) where distinctTargets >= 1, so Log2(1) = 0 is safe (result is 1.0 / 1.0 = 1.0)
  • ComputeRecency divides by totalWindow but guards totalWindow <= 0.0 first
  • No code path can produce NaN or Infinity
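
To make the guard-before-divide pattern concrete, here is a hedged Python sketch of a recency computation. The parameter names, the explicit `now` argument (the service is described as reading DateTime.UtcNow), and the 0.0 result for a degenerate window are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def compute_recency(created_at: datetime, expires_at: datetime, now: datetime) -> float:
    """Remaining fraction of the expiry window, clamped to [0, 1]."""
    total_window = (expires_at - created_at).total_seconds()
    if total_window <= 0.0:
        # Degenerate window: return early so the division below never sees zero.
        # Assumption: such a proposal scores 0.0; the service's fallback is not shown.
        return 0.0
    # Negative elapsed time (clock skew) is treated as no time having passed.
    elapsed = max(0.0, (now - created_at).total_seconds())
    remaining = 1.0 - elapsed / total_window
    return min(1.0, max(0.0, remaining))


created = datetime(2026, 4, 27, tzinfo=timezone.utc)
expires = created + timedelta(hours=4)
print(compute_recency(created, expires, created + timedelta(hours=1)))  # 0.75
```

Passing the clock in as a parameter also allows exact-value assertions, at the cost of a slightly wider signature.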

3. Interaction with existing ConfidenceScore system -- PASS (no conflict)

  • New system is deliberately parallel: ConfidenceScore/ConfidenceAggregator/FieldConfidence aggregate per-source signals (Verbalized, ProviderLogprob, etc.)
  • New ConfidenceBreakdown decomposes per-proposal characteristics (Pattern match, Reach, Reversibility, Recency)
  • Different concern, no shared state, no coupling risk

4. Component weight normalization -- PASS

  • Weights sum to exactly 1.0 (0.30 + 0.20 + 0.35 + 0.15)
  • ComputeOverall normalizes by totalWeight so it handles partial component sets correctly
  • Components are independent signals, not constrained to sum to overall

5. Floating-point comparison edge cases -- PASS

  • MeetsThreshold uses >= (standard double comparison, correct at boundary)
  • GenerateNote uses strict < for band detection, consistent behavior
  • No epsilon comparisons needed since scores are clamped and well-bounded
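
A small Python sketch of the two comparisons, for illustration only: the inclusive `>=` for the threshold check and a strict `<` for the near-threshold band. The band width of 0.05 and the note wording are hypothetical, since neither is stated in this PR.

```python
from __future__ import annotations

DEFAULT_THRESHOLD = 0.7
NOTE_BAND = 0.05  # Assumption: the real band width is not stated in this PR.


def meets_threshold(overall: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    # Inclusive comparison: a score exactly at the threshold passes.
    return overall >= threshold


def generate_note(overall: float, threshold: float = DEFAULT_THRESHOLD) -> str | None:
    """Return a short explanation only when the score is close to the threshold."""
    distance = abs(overall - threshold)
    if distance < NOTE_BAND:  # strict <, so the band edge itself gets no note
        side = "above" if meets_threshold(overall, threshold) else "below"
        # Hypothetical wording; the service's actual note text is not shown.
        return f"Score is just {side} the {threshold:.2f} apply threshold."
    return None
```

Under these assumptions a score of 0.72 gets an "above" note, 0.68 gets a "below" note, and 0.90 gets none.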

Found and fixed

6. Missing action type classification -- FIXED in 28cfb5de

  • move, reorder, assign, attach, restore, unarchive, unblock were in WellKnownActions but not in SafeActions or DestructiveActions
  • They fell through to a neutral actionFactor = 0.6, undervaluing reversibility for common safe operations
  • Fix: added all reversible actions to SafeActions
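
The fall-through can be illustrated with a short Python sketch. Only the neutral 0.6 factor and the seven action names come from this review; the safe and destructive factor values, the `delete` entry, and the empty "before" set are hypothetical stand-ins, and the blend with the risk-level baseline is left out.

```python
from __future__ import annotations

# Actions reclassified as safe by the fix (names taken from this review).
NEWLY_SAFE = {"move", "reorder", "assign", "attach", "restore", "unarchive", "unblock"}

# Hypothetical values: only the neutral 0.6 factor is stated in this PR.
SAFE_FACTOR = 1.0
DESTRUCTIVE_FACTOR = 0.2
NEUTRAL_FACTOR = 0.6


def action_factor(action: str, safe: set[str], destructive: set[str]) -> float:
    """Classify one action type; unclassified actions fall through to neutral."""
    name = action.lower()
    if name in safe:
        return SAFE_FACTOR
    if name in destructive:
        return DESTRUCTIVE_FACTOR
    return NEUTRAL_FACTOR


destructive = {"delete"}  # hypothetical member, for illustration
print(action_factor("move", set(), destructive))       # 0.6 (fell through)
print(action_factor("move", NEWLY_SAFE, destructive))  # 1.0 (classified as safe)
```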

Remaining notes (not bugs, design decisions)

  • Threshold is currently hardcoded at 0.7 (issue says "configurable later")
  • Component weights are internal constants -- no user-facing tuning yet
  • Recency is time-dependent (uses DateTime.UtcNow), which makes it non-deterministic for exact value assertions in tests; tests use range checks instead

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements a confidence breakdown system for automation proposals, adding a new API endpoint and a service to calculate scores based on pattern matching, reach, reversibility, and recency. The feedback identifies an unused parameter in the service method, a discrepancy between the reach score formula and its documentation, and a performance optimization to reduce allocations during score weighting.

@Chris0Jeky
Owner Author

Adversarial Review -- PR #1036

Reviewed all 10 changed files. Confirming the 3 Gemini bot findings and fixing all of them:

1. MEDIUM: Unused userId parameter (ConfidenceBreakdownService.cs:68)

Confirmed. The userId parameter in GetBreakdownAsync is accepted but never used. Authorization is already handled at the controller level via AuthorizeProposalAsync before the service is called. Removed userId from IConfidenceBreakdownService, ConfidenceBreakdownService, the controller call site, and all test call sites.

2. MEDIUM: Reach formula mismatch (ConfidenceBreakdownService.cs:148)

Confirmed -- the formula has a bug. The documented examples (2 targets ~ 0.67, 4 targets ~ 0.5, 8 targets ~ 0.4) match the formula 2.0 / (2.0 + log2(n)), but the code used 1.0 / (1.0 + log2(n)) which produces 0.50 / 0.33 / 0.25 instead. Fixed to 2.0 / (2.0 + Math.Log2(distinctTargets)).
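
The difference between the two formulas is easy to check numerically. The Python sketch below mirrors the arithmetic only; the function names and the `max(1, n)` guard are illustrative assumptions, not the C# code.

```python
import math


def reach_before_fix(distinct_targets: int) -> float:
    """Original code path: 1 / (1 + log2(n))."""
    return 1.0 / (1.0 + math.log2(max(1, distinct_targets)))


def reach_after_fix(distinct_targets: int) -> float:
    """Corrected formula: 2 / (2 + log2(n)), matching the documented examples."""
    return 2.0 / (2.0 + math.log2(max(1, distinct_targets)))


for n in (1, 2, 4, 8):
    print(n, round(reach_before_fix(n), 2), round(reach_after_fix(n), 2))
# 1 1.0 1.0
# 2 0.5 0.67
# 4 0.33 0.5
# 8 0.25 0.4
```

Both versions give 1.0 for a single target, so the mismatch only shows up once a proposal touches two or more entities.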

3. MEDIUM: Dictionary allocation on every call (ConfidenceBreakdownService.cs:252)

Confirmed. The weights dictionary was re-allocated on every call to ComputeOverall. Promoted to a private static readonly Dictionary<string, double> ComponentWeights field.

Additional checks

  • NaN/Infinity handling: Domain value objects (ConfidenceComponent, ConfidenceBreakdown) both validate against NaN and Infinity in constructors. The service clamps all computed values via Math.Clamp. Covered by tests.
  • Value range validation (0..1): Both domain types reject out-of-range values. Service uses Math.Clamp on all component scores and the overall score. Covered by tests.
  • Floating-point edge cases: ComputeRecency guards against degenerate windows (totalWindow <= 0), negative elapsed time (clock skew), and uses Math.Max before division. ComputeOverall guards against totalWeight <= 0. No division-by-zero paths.
  • Integration with ConfidenceScore: The service operates independently via IUnitOfWork.AutomationProposals and produces a standalone DTO. No coupling issues with any existing ConfidenceScore property.

All 63 ConfidenceBreakdown tests pass (30 domain, 33 application). Build succeeds with 0 errors.

Authorization is handled at the controller level via AuthorizeProposalAsync
before the service is called, so the userId parameter was dead code.

…tatic field

- Fix ComputeReach formula from 1/(1+log2) to 2/(2+log2) to match
  documented examples (2 targets ~ 0.67, 4 ~ 0.5, 8 ~ 0.4)
- Remove unused userId parameter from GetBreakdownAsync implementation
- Promote per-call weights dictionary to static readonly ComponentWeights
  field to avoid allocation on every ComputeOverall call

Matches the updated IConfidenceBreakdownService signature that no
longer accepts a userId parameter.

Remove Guid.NewGuid() userId arguments from all GetBreakdownAsync
test call sites to match the updated interface signature.

@Chris0Jeky
Owner Author

Fresh Adversarial Review (Round 3)

IMPORTANT

1. Double database fetch for proposal (Confidence: 85)
Controller calls AuthorizeProposalAsync (loads proposal), then _confidenceBreakdownService.GetBreakdownAsync loads it again. Consistent with existing codebase pattern but still two round-trips per request.

2. Static mutable Dictionary<string, double> for ComponentWeights (Confidence: 80)
A Dictionary<string, double> stays mutable even when the field is readonly, because readonly only prevents reassigning the reference; future code could accidentally call ComponentWeights.Add(...). Trivially preventable.

Fix: Use FrozenDictionary<string, double> (.NET 8+) or ReadOnlyDictionary<string, double>.
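
As a loose analogue in Python rather than the C# change itself, the same idea of exposing a constant table through a read-only view can be sketched with `types.MappingProxyType`. The `_WEIGHTS` and `try_to_mutate` names are hypothetical.

```python
from types import MappingProxyType

_WEIGHTS = {
    "Pattern match": 0.30,
    "Reach": 0.20,
    "Reversibility": 0.35,
    "Recency": 0.15,
}

# Read-only view: lookups work, but item assignment raises TypeError.
COMPONENT_WEIGHTS = MappingProxyType(_WEIGHTS)


def try_to_mutate() -> bool:
    """Return True if the view rejected a write."""
    try:
        COMPONENT_WEIGHTS["Reach"] = 0.99  # type: ignore[index]
    except TypeError:
        return True
    return False
```

Like ReadOnlyDictionary, the proxy is a view over the underlying dictionary, so the protection holds only as long as nothing else keeps and mutates the backing object.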

Verified Non-Issues

  • Floating-point handling thorough: NaN/Infinity guards, Math.Clamp at service + domain layers
  • Reach formula 2.0 / (2.0 + log2(n)) matches documented examples (previous bug fixed)
  • Weight normalization correct (sum = 1.0, ComputeOverall handles partial sets)
  • Security: AuthorizeProposalAsync with board-scoped authorization
  • Clean architecture boundaries respected
  • 63 tests with good edge case coverage

Wrap the static ComponentWeights field as IReadOnlyDictionary backed by
ReadOnlyDictionary to prevent accidental mutation of weight values at
runtime.

Resolve conflicts combining confidence-breakdown endpoint (branch) with
side-effects, similar-past, and streak features (main). Take main's
addColumn fix in smoke.spec.ts.

Comment thread frontend/taskdeck-web/scripts/demo-director.mjs Fixed
@Chris0Jeky Chris0Jeky merged commit c98459d into main Apr 27, 2026
31 checks passed
@github-project-automation github-project-automation Bot moved this from Pending to Done in Taskdeck Execution Apr 27, 2026
@Chris0Jeky Chris0Jeky deleted the paper/1021-confidence-breakdown branch April 27, 2026 01:33
Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

paper-review-backend-gap-confidence-breakdown

2 participants