[feature] ~ Expose retrieval scores in recall API response for transparency and threshold-based filtering #1624

@eight-atulya

Description

Use Case

I'm building an AI agent system that uses Hindsight's recall API for memory retrieval. My application needs to make intelligent decisions about which retrieved results to act on — not just consume everything returned, but selectively use results based on their confidence/relevance level. Today I have to enable trace: true and parse debug output just to get any score signal, which is verbose and not suitable for production use.

Problem Statement

Currently, the recall API does not return any numeric score alongside results. The documentation acknowledges that scores exist internally but intentionally omits them from the response, citing that "raw retrieval scores are not meaningful on an absolute scale." While I understand the reasoning around relative ordering, this design creates three concrete problems:

  1. Transparency: Users cannot see why certain results ranked higher than others. The ordering alone gives no insight into the confidence gap between results.

  2. Debugging: Diagnosing poor retrieval quality requires enabling trace: true, which dumps verbose internal debug data. This is not practical in production pipelines and leaks internal implementation details.

  3. Custom filtering: Advanced users cannot implement threshold-based filtering logic. For example, in an agent system, I may want to discard results below a certain relevance confidence rather than blindly passing all top_k results to the LLM context.

Scores ARE already computed internally (cross-encoder score, temporal proximity boost, recency boost, final combined score) and ARE already exposed via trace: true. The gap is simply that there's no lightweight way to get the final combined score without opting into full trace mode.

How This Feature Would Help

With this feature, I would be able to:

  • Filter low-confidence results in my agent pipeline before passing context to an LLM, reducing hallucination risk from irrelevant memories
  • Understand retrieval quality at a glance without needing to enable trace mode or parse verbose debug payloads
  • Implement dynamic top_k logic — e.g., take up to 10 results but stop early if score drops below a threshold
  • Monitor retrieval health in production by logging score distributions over time, enabling proactive detection of embedding drift or recall degradation
  • Build user-facing explanations that show why a particular memory was surfaced in an AI application
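The filtering and dynamic top_k patterns above can be sketched as plain client-side helpers. This is purely illustrative: it assumes each result dict carries the proposed score field, and the function names are hypothetical, not part of Hindsight's API.

```python
def filter_by_score(results, threshold):
    """Keep only results whose combined score meets the threshold."""
    return [r for r in results if r["score"] >= threshold]

def take_until_drop(results, threshold, max_k=10):
    """Dynamic top_k: take up to max_k results, stopping early once the
    score falls below the threshold. Assumes results arrive sorted by
    score, descending, as the recall API already guarantees ordering."""
    selected = []
    for r in results[:max_k]:
        if r["score"] < threshold:
            break
        selected.append(r)
    return selected
```

With a score field in the response, either helper is a one-liner away from "pass only confident memories to the LLM context" instead of blindly forwarding all top_k results.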

Proposed API change (non-breaking):

Add an optional include_scores parameter (default: false) to the recall request. When enabled, each result object includes a score field with the final normalized combined score (0.0–1.0):

{
  "results": [
    {
      "id": "...",
      "content": "...",
      "score": 0.87
    }
  ]
}

This is strictly additive — it does not change existing behavior and preserves the current default experience. Advanced users who want the full breakdown can still use trace: true.

Proposed Solution

It would be great if Hindsight could expose scores in the recall response via an opt-in flag (e.g., include_scores: true). The score returned should be the final normalized combined score already computed by the ranking pipeline — no new computation needed.

Additionally, an optional score_threshold request parameter would allow server-side filtering, so users don't have to over-fetch and filter client-side:

{
  "query": "...",
  "top_k": 10,
  "include_scores": true,
  "score_threshold": 0.6
}

This keeps the API ergonomic for simple use cases (no scores by default) while unlocking powerful patterns for advanced users building production agent systems.
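To make the intended semantics concrete, here is a minimal sketch of how the server-side behavior could compose: apply the optional score_threshold to the already-ranked results, truncate to top_k, and strip the score field unless include_scores is set. All names here are assumptions drawn from this proposal, not existing Hindsight internals.

```python
def recall_with_threshold(ranked_results, top_k, score_threshold=None,
                          include_scores=False):
    """Illustrative semantics for the proposed parameters.
    ranked_results is assumed sorted by the final normalized combined
    score (descending), as computed by the existing ranking pipeline."""
    if score_threshold is not None:
        ranked_results = [r for r in ranked_results
                          if r["score"] >= score_threshold]
    ranked_results = ranked_results[:top_k]
    if not include_scores:
        # Preserve today's default response shape: no score field.
        ranked_results = [{k: v for k, v in r.items() if k != "score"}
                          for r in ranked_results]
    return {"results": ranked_results}
```

Because thresholding happens before truncation and the score field is opt-in, the default path (no new parameters) returns exactly what the API returns today.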

Alternatives Considered

No response

Priority

Nice to have

Additional Context

No response

Checklist

  • I would be willing to contribute this feature

Labels

enhancement (New feature or request)