Unsatisfactory behavior of snapshot selector #2689

@PeterF778

There are at least three issues with the mechanism for selecting traces for collecting "snapshots":

  • In distributed traces, the selection made for the head service is inconsistent with the selection made for the other services. In a test where service A calls service B with a selection probability of 10%, out of 4781 traces service A was selected 480 times and service B was selected 448 times. However, both services were selected in only 39 traces.
  • Traces that originate from a span of a kind other than SERVER or CONSUMER (for example, traces resulting from POJO instrumentation) are never selected, although their downstream calls may still be selected.
  • Selection for downstream services uses the same algorithm as the TraceIdRatioBased sampler, which can skew metrics if that sampler is actually used for sampling.
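
To put the numbers from the first point in perspective, here is a quick back-of-the-envelope check in Python. It uses only the figures reported above; the variable names are mine, and the "independent decisions" model is an assumption for comparison, not a statement about how the selector is implemented.

```python
# Figures reported in the first bullet above.
total_traces = 4781
selected_a = 480
selected_b = 448
observed_both = 39

# Assumption for comparison: A and B each decide independently.
# Expected overlap is then N * P(A) * P(B).
expected_if_independent = (
    total_traces * (selected_a / total_traces) * (selected_b / total_traces)
)

# If B simply followed A's per-trace decision, the overlap could be as
# large as the smaller of the two counts.
upper_bound_if_consistent = min(selected_a, selected_b)

print(f"expected overlap if independent: {expected_if_independent:.1f}")
print(f"upper bound if consistent:       {upper_bound_if_consistent}")
print(f"observed overlap:                {observed_both}")
```

The observed overlap of 39 is close to what two unrelated 10% decisions would give (about 45), and far from the several hundred one would expect if both services agreed on which traces to select.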

I believe the selection mechanism needs to be redesigned.
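
As a rough illustration of the third point, the sketch below assumes that both the sampler and the snapshot selector reduce to comparing the same trace-ID-derived value against a probability-scaled bound. `threshold_decision`, the ratios, and the random trace IDs are all hypothetical stand-ins, not the actual agent or SDK code, and this is only one plausible reading of how the skew could arise.

```python
import random


def threshold_decision(trace_id_low64: int, probability: float) -> bool:
    """Hypothetical ratio-style decision: keep the trace if its
    ID-derived value falls below a bound scaled by the probability."""
    return trace_id_low64 < probability * 2**64


rng = random.Random(42)  # fixed seed so the run is repeatable
trace_ids = [rng.getrandbits(64) for _ in range(100_000)]

sampler_ratio = 0.25         # assumed sampling ratio
snapshot_probability = 0.10  # assumed snapshot selection probability

# Traces kept by the sampler, and the subset also picked for snapshots.
sampled = [t for t in trace_ids if threshold_decision(t, sampler_ratio)]
selected_among_sampled = [
    t for t in sampled if threshold_decision(t, snapshot_probability)
]

share = len(selected_among_sampled) / len(sampled)
print(f"snapshot share among sampled traces: {share:.2f}")
print(f"configured snapshot probability:     {snapshot_probability:.2f}")
```

Because both decisions use the same value, the snapshot-selected traces form a subset of the sampled ones, so among sampled traces the snapshot share is about `snapshot_probability / sampler_ratio` rather than `snapshot_probability`. Deriving the snapshot decision from a value independent of the one the sampler uses would avoid this coupling.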
