Use Case
I'm building an AI agent system that uses Hindsight's recall API for memory retrieval. My application needs to make intelligent decisions about which retrieved results to act on — not just consume everything returned, but selectively use results based on their confidence/relevance level. Today I have to enable `trace: true` and parse debug output just to get any score signal, which is verbose and not suitable for production use.
Problem Statement
Currently, the recall API does not return any numeric score alongside results. The documentation acknowledges that scores exist internally but intentionally omits them from the response, citing that "raw retrieval scores are not meaningful on an absolute scale." While I understand the reasoning around relative ordering, this design creates three concrete problems:
1. **Transparency:** Users cannot see why certain results ranked higher than others. The ordering alone gives no insight into the confidence gap between results.
2. **Debugging:** Diagnosing poor retrieval quality requires enabling `trace: true`, which dumps verbose internal debug data. This is not practical in production pipelines and leaks internal implementation details.
3. **Custom filtering:** Advanced users cannot implement threshold-based filtering logic. For example, in an agent system, I may want to discard results below a certain relevance confidence rather than blindly passing all `top_k` results to the LLM context.
Scores *are* already computed internally (cross-encoder score, temporal proximity boost, recency boost, final combined score) and *are* already exposed via `trace: true`. The gap is simply that there's no lightweight way to get the final combined score without opting into full trace mode.
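To make the gap concrete, here is a minimal sketch of today's workaround versus the proposal. The endpoint path, the response envelope, and especially the trace key path are assumptions for illustration only, not Hindsight's documented wire format:

```python
import requests

# Hypothetical endpoint and payload shapes, for illustration only.
RECALL_URL = "http://localhost:8080/recall"

# Today: the only way to get any score signal is full trace mode,
# then digging the combined score out of verbose debug output.
# (The trace key path below is a placeholder, not documented structure.)
resp = requests.post(RECALL_URL, json={"query": "...", "top_k": 5, "trace": True})
for result in resp.json()["results"]:
    combined = result.get("trace", {}).get("combined_score")

# Proposed: one lightweight flag returns the final score directly.
resp = requests.post(RECALL_URL, json={"query": "...", "top_k": 5, "include_scores": True})
for result in resp.json()["results"]:
    combined = result["score"]  # normalized 0.0-1.0 per this proposal
```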
How This Feature Would Help
With this feature, I would be able to:
- Filter low-confidence results in my agent pipeline before passing context to an LLM, reducing hallucination risk from irrelevant memories
- Understand retrieval quality at a glance without needing to enable trace mode or parse verbose debug payloads
- Implement dynamic `top_k` logic — e.g., take up to 10 results but stop early if the score drops below a threshold (see the sketch after this list)
- Monitor retrieval health in production by logging score distributions over time, enabling proactive detection of embedding drift or recall degradation
- Build user-facing explanations that show why a particular memory was surfaced in an AI application
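As a concrete sketch of the filtering and dynamic `top_k` patterns above: everything here is hypothetical (`select_context`, `SCORE_THRESHOLD`, and `MAX_RESULTS` are names I made up), and the `score` field is the proposed addition, not an existing one:

```python
SCORE_THRESHOLD = 0.6   # application-specific tuning value
MAX_RESULTS = 10

def select_context(results):
    """Pick memories to pass to the LLM from a recall response.

    Assumes `results` is already sorted best-first and that each
    item carries the proposed `score` field (normalized 0.0-1.0).
    Takes up to MAX_RESULTS items but stops early once the score
    drops below the threshold, i.e., the dynamic top_k pattern.
    """
    selected = []
    for result in results[:MAX_RESULTS]:
        if result["score"] < SCORE_THRESHOLD:
            break  # results are ordered, so everything after is weaker
        selected.append(result)
    return selected
```

The same `score` values could also be logged to a metrics system for the score-distribution monitoring described above.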
Proposed API change (non-breaking):
Add an optional `include_scores` parameter (default: `false`) to the recall request. When enabled, each result object includes a `score` field with the final normalized combined score (0.0–1.0):
```json
{
  "results": [
    {
      "id": "...",
      "content": "...",
      "score": 0.87
    }
  ]
}
```
This is strictly additive — it does not change existing behavior and preserves the current default experience. Advanced users who want the full breakdown can still use `trace: true`.
Proposed Solution
It would be great if Hindsight could expose scores in the recall response via an opt-in flag (e.g., `include_scores: true`). The score returned should be the final normalized combined score already computed by the ranking pipeline — no new computation needed.
Additionally, an optional `score_threshold` request parameter would allow server-side filtering, so users don't have to over-fetch and filter client-side:
```json
{
  "query": "...",
  "top_k": 10,
  "include_scores": true,
  "score_threshold": 0.6
}
```
This keeps the API ergonomic for simple use cases (no scores by default) while unlocking powerful patterns for advanced users building production agent systems.
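Putting the two proposed parameters together, a client call might look like the following. The endpoint URL and response envelope are assumptions for illustration; only `include_scores` and `score_threshold` are the fields actually being proposed:

```python
import requests

# Hypothetical endpoint; adjust to however the recall API is exposed.
resp = requests.post(
    "http://localhost:8080/recall",
    json={
        "query": "notes about the quarterly launch",
        "top_k": 10,
        "include_scores": True,
        "score_threshold": 0.6,  # server-side: drop weaker results
    },
)
resp.raise_for_status()

for result in resp.json()["results"]:
    # Every surviving result carries the final combined score, so
    # downstream code can log it or act on it without trace mode.
    print(result["id"], result["score"])
```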
Alternatives Considered
No response
Priority
Nice to have
Additional Context
No response
Checklist