Trace latency: bound trace-id lookups by time #285

@thorrester

Parent: #281

Goal

Make trace-id lookups avoid broad object-store scans by bounding the query to the smallest credible time window.

Trace-id lookup is an important user path and should not pay for scanning unrelated partitions when the system has enough metadata to infer where the trace can exist.

Scope

  • Add or use a bounded lookup path for trace-id requests.
  • Prefer recent time windows and known trace bounds before falling back to broader scans.
  • Keep correctness: if a trace exists outside the fast-path window and no bound is available, the system must still be able to find it through the fallback path.
  • Capture metrics for fast-path hits, fallback hits, misses, and lookup latency.
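A minimal sketch of how the scope items above could fit together, not the actual implementation. `TraceLookup`, `scan`, `known_bounds`, the one-hour `recent` default, and the metric names are all hypothetical and chosen for illustration. The sketch tries a bounded window first (recorded bounds if present, otherwise a recent window), falls back to an unbounded scan on a fast-path miss, and records one counter plus a latency sample per lookup.

```python
import time
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple

# Hypothetical types: none of these names come from the codebase.
Window = Tuple[datetime, datetime]
ScanFn = Callable[[str, Optional[Window]], Optional[dict]]


@dataclass
class TraceLookup:
    scan: ScanFn                                      # scan(trace_id, window); window=None means unbounded
    known_bounds: dict = field(default_factory=dict)  # trace_id -> (start, end), if recorded
    recent: timedelta = timedelta(hours=1)            # assumed default fast-path window
    metrics: Counter = field(default_factory=Counter)
    latencies: list = field(default_factory=list)     # seconds per lookup

    def _window(self, trace_id: str, now: datetime) -> Window:
        # Prefer recorded bounds for this trace; otherwise use the recent window.
        return self.known_bounds.get(trace_id, (now - self.recent, now))

    def find(self, trace_id: str, now: Optional[datetime] = None) -> Optional[dict]:
        now = now or datetime.now(timezone.utc)
        started = time.perf_counter()
        try:
            hit = self.scan(trace_id, self._window(trace_id, now))
            if hit is not None:
                self.metrics["fast_path_hit"] += 1
                return hit
            # Fallback keeps correctness for traces outside the bounded window.
            hit = self.scan(trace_id, None)
            self.metrics["fallback_hit" if hit is not None else "miss"] += 1
            return hit
        finally:
            self.latencies.append(time.perf_counter() - started)
```

This sketch always falls back after a fast-path miss, including when recorded bounds exist. That is the conservative choice. Whether recorded bounds can be trusted enough to skip the fallback is a separate decision.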

High-level design

The fast path should turn “find this trace somewhere in the lake” into “find this trace in a narrow set of partitions.” That reduces Delta file candidates, footer reads, bloom checks, and row-group reads.

Do not rely on bloom filters alone as the bounding mechanism. Bloom filters help after the candidate files are known; they do not replace a bounded time or partition search.
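To illustrate the ordering described above, here is a small sketch under stated assumptions: daily partitions, a hypothetical `DataFile` record, and a plain set standing in for a per-file bloom filter (a real bloom filter can also return false positives). The time bound selects partitions first, and the bloom-style check only runs on files that survived that step.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class DataFile:
    path: str
    partition_day: date   # assumed daily partitioning; the real layout may differ
    maybe_ids: frozenset  # stand-in for a bloom filter over trace ids


def partition_days(start: datetime, end: datetime) -> List[date]:
    """Every daily partition that overlaps [start, end]."""
    days = (end.date() - start.date()).days
    return [start.date() + timedelta(days=i) for i in range(days + 1)]


def candidate_files(files: Iterable[DataFile], trace_id: str,
                    start: datetime, end: datetime) -> List[DataFile]:
    wanted = set(partition_days(start, end))
    # Step 1: the time bound prunes partitions, so most files are never touched.
    in_window = [f for f in files if f.partition_day in wanted]
    # Step 2: the bloom-style check only narrows the files that survived step 1.
    return [f for f in in_window if trace_id in f.maybe_ids]
```

Without step 1, the membership check in step 2 would have to run against every file in the table, which is the broad scan this issue is trying to avoid.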

Acceptance criteria

  • Trace-id queries use a bounded lookup when enough information is available.
  • Fallback behavior preserves correctness for older or unknown traces.
  • Metrics distinguish fast-path, fallback, and miss behavior.
  • Benchmarks show object-store request counts for trace-id lookup before and after this change.
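One possible shape for the request-count benchmark, sketched against an in-memory stand-in rather than a real object store. `CountingStore`, `list_keys`, `get`, and `lookup` are hypothetical names. The idea is to wrap the store, count requests by kind, and run the same lookup with a broad prefix and with a bounded one.

```python
from collections import Counter
from typing import Dict, List


class CountingStore:
    """Wraps an in-memory stand-in for an object store and counts requests by kind."""

    def __init__(self, objects: Dict[str, bytes]):
        self._objects = objects
        self.requests = Counter()

    def list_keys(self, prefix: str) -> List[str]:
        self.requests["list"] += 1
        return sorted(k for k in self._objects if k.startswith(prefix))

    def get(self, key: str) -> bytes:
        self.requests["get"] += 1
        return self._objects[key]


def lookup(store: CountingStore, trace_id: str, prefixes: List[str]) -> bool:
    """Read objects under the given prefixes until one contains the trace id."""
    needle = trace_id.encode()
    for prefix in prefixes:
        for key in store.list_keys(prefix):
            if needle in store.get(key):
                return True
    return False


# Three daily partitions; only day=3 holds the trace.
objects = {f"day={d}/part-0": (b"trace-x" if d == 3 else b"other") for d in range(1, 4)}

broad = CountingStore(objects)
lookup(broad, "trace-x", ["day="])       # before: scan every partition

bounded = CountingStore(objects)
lookup(bounded, "trace-x", ["day=3/"])   # after: only the bounded partition

print("broad:", dict(broad.requests), "bounded:", dict(bounded.requests))
```

A real benchmark would count requests at the object-store client instead, but the before and after comparison would have the same structure.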
