Commit 6dd456e

Added documentation page for traces processor (#1614)
* Added traces to pipeline main page and started on new documentation page for traces.
* Finished traces documentation, adding two images for the doc page.
* Fixed typo in tail sampling paragraph.

Signed-off-by: Eric D. Schabell <eric@schabell.org>
1 parent 1744d81 commit 6dd456e

4 files changed

Lines changed: 379 additions & 0 deletions

File tree

imgs/traces_head_sampling.png

215 KB

imgs/traces_tail_sampling.png

326 KB

pipeline/processors/README.md

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,8 @@ Fluent Bit offers the following processors:
19 19
- [OpenTelemetry Envelope](opentelemetry-envelope.md): Transform logs into an
20 20
OpenTelemetry-compatible format.
21 21
- [SQL](sql.md): Use SQL queries to extract log content.
22+
- [Traces](traces.md): Trace sampling designed with a pluggable architecture,
23+
allowing easy extension to support multiple sampling strategies and backends.
22 24
- [Filters](filters.md): Any filter can be used as a processor.
23 25

24 26
## Features

pipeline/processors/traces.md

Lines changed: 377 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,377 @@
1+
# Traces
2+
3+
The _Traces_ sampling processor is designed with a pluggable architecture, allowing easy extension to support multiple sampling strategies and backends. It lets you apply head or tail sampling to incoming trace telemetry data.
4+
5+
Available samplers:
6+
7+
- `probabilistic` (head sampling)
- `tail` (tail sampling)
9+
10+
Conditions:
11+
12+
- `latency`
- `span_count`
- `status_code`
- `string_attribute`
- `numeric_attribute`
- `boolean_attribute`
- `trace_state`
19+
20+
## Configuration Parameters
21+
22+
The processor does not provide any extra configuration parameters; it can be used directly in your _processors_ YAML directive.
23+
24+
## Traces types
25+
26+
The processor is configured with both a `name` and a `type`, with the following possible settings:
27+
28+
| Key    |     Possible values     |
| :----- | :---------------------: |
| `name` |       `sampling`        |
| `type` | `probabilistic`, `tail` |
32+
33+
## Head sampling
34+
35+
In this example, head sampling is used to process a smaller percentage of the overall ingested traces and spans. The pipeline ingests data on the OpenTelemetry-defined port using the OpenTelemetry Protocol (OTLP), as shown below. The processor section configures head sampling, and the sampling percentage sets the share of ingested traces and spans that is forwarded to the defined output plugins.
36+
37+
![](/imgs/traces_head_sampling.png)
38+
39+
| Sampling settings     | Description |
| :-------------------- | :---------- |
| `sampling_percentage` | Sets the probability of sampling a trace, as a percentage between 0 and 100. For example, `40` samples 40% of traces randomly. |
42+
43+
**fluent-bit.yaml**
44+
45+
```yaml
service:
  flush: 1
  log_level: info
  hot_reload: on

pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          # Head sampling of traces (percentage)
          - name: sampling
            type: probabilistic
            sampling_settings:
              sampling_percentage: 40

  outputs:
    - name: stdout
      match: "*"
```
68+
69+
With this head sampling configuration, a random 40% of the ingested traces is sent to the standard output.
70+
71+
## Tail sampling
72+
73+
Tail sampling provides more selective, fine-grained control over the collection of traces and spans without collecting everything. As the example below shows, the process combines a waiting period before the sampling decision with configuration-defined conditions that determine which spans are sampled.
74+
75+
![](/imgs/traces_tail_sampling.png)
76+
77+
The following sampling settings are available with their default values:
78+
79+
| Sampling settings | Description | Default value |
| :---------------- | :---------- | :-----------: |
| `decision_wait`   | Specifies how long to buffer spans before making a sampling decision, allowing full trace evaluation. | 30s |
| `max_traces`      | Specifies the maximum number of traces that can be held in memory. When the limit is reached, the oldest trace is deleted. | |
83+
84+
The tail-based sampler supports various conditionals to sample traces if their spans meet a specific condition.
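As a sketch of how the two sampling settings fit together, the fragment below sets both `decision_wait` and `max_traces` on a tail sampler. The `max_traces` value of 5000 and the choice of a `status_code` condition are illustrative assumptions, not recommended defaults.

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: tail
            sampling_settings:
              # Buffer spans for 10 seconds before deciding (the default is 30s)
              decision_wait: 10s
              # Assumed value: upper bound on traces held in memory while waiting
              max_traces: 5000
            conditions:
              - type: status_code
                status_codes: [ERROR]

  outputs:
    - name: stdout
      match: "*"
```

A longer `decision_wait` gives late spans more time to arrive, at the cost of holding more traces in memory, which is where `max_traces` acts as a bound.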
85+
86+
### Condition: latency
87+
88+
This condition samples traces based on span duration. It uses `threshold_ms_low` to capture short traces and `threshold_ms_high` for long traces.
89+
90+
| Condition settings  | Description | Default value |
| :------------------ | :---------- | :-----------: |
| `threshold_ms_low`  | Specifies the lower latency threshold. Traces with a duration <= this value will be sampled. | 0 |
| `threshold_ms_high` | Specifies the upper latency threshold. Traces with a duration >= this value will be sampled. | 0 |
94+
95+
**fluent-bit.yaml**
96+
97+
```yaml
service:
  flush: 1
  log_level: info
  hot_reload: on

pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          # Tail sampling of traces (latency)
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 5s
            conditions:
              - type: latency
                threshold_ms_low: 200
                threshold_ms_high: 3000

  outputs:
    - name: stdout
      match: "*"
```
124+
125+
This tail-based sampling configuration waits 5 seconds before making a decision. It samples traces based on latency, capturing short traces of 200ms or less and long traces of 3000ms or more. Traces between 200ms and 3000ms are not sampled unless another condition applies.
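If only slow traces are of interest, one option is to set `threshold_ms_high` on its own. This sketch assumes that leaving `threshold_ms_low` at its default of 0 means no short traces match in practice; that behavior is inferred from the table above and is not confirmed here.

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 5s
            conditions:
              - type: latency
                # Only the upper threshold is set; threshold_ms_low keeps its
                # default of 0 (assumed to match no real traces)
                threshold_ms_high: 3000

  outputs:
    - name: stdout
      match: "*"
```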
126+
127+
### Condition: span_count
128+
129+
This condition samples traces that have specific span counts defined in a configurable range. It uses `min_spans` and `max_spans` to specify the number of spans a trace can have to be sampled.
130+
131+
| Condition settings | Description | Default value |
| :----------------- | :---------- | :-----------: |
| `min_spans`        | Specifies the minimum number of spans a trace must have to be sampled. | |
| `max_spans`        | Specifies the maximum number of spans a trace can have to be sampled. | |
135+
136+
**fluent-bit.yaml**
137+
138+
```yaml
service:
  flush: 1
  log_level: info
  hot_reload: on

pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          # Tail sampling of traces (span_count)
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 5s
            conditions:
              - type: span_count
                min_spans: 3
                max_spans: 5

  outputs:
    - name: stdout
      match: "*"
```
165+
166+
This tail-based sampling configuration waits 5 seconds before making a decision. It samples traces that have a minimum of 3 spans and a maximum of 5 spans. Traces with fewer than 3 or more than 5 spans are not sampled unless another condition applies.
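The phrase "unless another condition applies" suggests that several conditions can be listed on one sampler. The following sketch combines a `span_count` condition with a `status_code` condition. It assumes that conditions in the same list are evaluated with OR logic, which is how the `string_attribute` example later on this page is described; treat it as an illustration and not as confirmed behavior.

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 5s
            conditions:
              # Sample traces that have between 3 and 5 spans ...
              - type: span_count
                min_spans: 3
                max_spans: 5
              # ... or (assumed OR logic) any trace containing an ERROR span
              - type: status_code
                status_codes: [ERROR]

  outputs:
    - name: stdout
      match: "*"
```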
167+
168+
### Condition: status_code
169+
170+
This condition samples traces based on span status codes (`OK`, `ERROR`, `UNSET`).
171+
172+
| Condition settings | Description | Default value |
| :----------------- | :---------- | :-----------: |
| `status_codes`     | Defines an array of span status codes (`OK`, `ERROR`, `UNSET`) to filter traces. Traces are sampled if any span matches a listed status code. For example, `status_codes: [ERROR, UNSET]` captures traces with errors or unset statuses. | |
175+
176+
**fluent-bit.yaml**
177+
178+
```yaml
service:
  flush: 1
  log_level: info
  hot_reload: on

pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          # Tail sampling of traces (status_code)
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 5s
            conditions:
              - type: status_code
                status_codes: [ERROR]

  outputs:
    - name: stdout
      match: "*"
```
204+
205+
With this tail-based sampling configuration, only traces that contain a span with a status code of `ERROR` are sampled and sent to the standard output.
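The table above mentions `status_codes: [ERROR, UNSET]` as a way to capture traces with errors or unset statuses. A minimal sketch of that variant, reusing the same input and output as the example above, might look like this:

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 5s
            conditions:
              - type: status_code
                # A trace is sampled if any of its spans has one of these codes
                status_codes: [ERROR, UNSET]

  outputs:
    - name: stdout
      match: "*"
```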
206+
207+
### Condition: string_attribute
208+
209+
This condition allows traces to be sampled based on specific span or resource attributes. You can define key-value filters (for example, `http.method=POST`) to selectively capture relevant traces.
210+
211+
| Condition settings | Description | Default value |
| :----------------- | :---------- | :-----------: |
| `key`              | Specifies the span or resource attribute to match (e.g., "service.name"). | |
| `values`           | Defines an array of accepted values for the attribute. A trace is sampled if any span contains a matching key-value pair: `["payment-processing"]` | |
| `match_type`       | Defines how attributes are compared: `strict` ensures exact value matching, while `exists` checks if the attribute is present regardless of its value (note that string type is enforced) | `strict` |
216+
217+
**fluent-bit.yaml**
218+
219+
```yaml
service:
  flush: 1
  log_level: info
  hot_reload: on

pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          # Tail sampling of traces (string_attribute)
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 2s
            conditions:
              - type: string_attribute
                match_type: strict
                key: "http.method"
                values: ["GET"]
              - type: string_attribute
                match_type: exists
                key: "service.name"

  outputs:
    - name: stdout
      match: "*"
```
250+
251+
This tail-based sampling configuration waits 2 seconds before making a decision. It samples traces based on string matching of key-value pairs. Traces are sampled if the key `http.method` is set to `GET` or if spans or resources have a key `service.name`.
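Because `values` is an array, a single condition can accept more than one value for the same key. The sketch below assumes that a match on any listed value is sufficient, as the `values` description in the table states; the specific HTTP methods are illustrative.

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 2s
            conditions:
              - type: string_attribute
                match_type: strict
                key: "http.method"
                # Illustrative list: any one of these values counts as a match
                values: ["POST", "PUT", "DELETE"]

  outputs:
    - name: stdout
      match: "*"
```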
252+
253+
### Condition: numeric_attribute
254+
255+
This condition samples traces based on numeric attribute values of a defined key where users can configure minimum and maximum thresholds.
256+
257+
| Condition settings | Description | Default value |
| :----------------- | :---------- | :-----------: |
| `key`              | Specifies the span or resource attribute to match (e.g., "http.status_code"). | |
| `min_value`        | The minimum inclusive value for the numeric attribute. Traces with values >= the `min_value` are sampled. | |
| `max_value`        | The maximum inclusive value for the numeric attribute. Traces with values <= the `max_value` are sampled. | |
| `match_type`       | Defines how attribute values are evaluated: `strict` matches exact values, while `exists` checks if the attribute is present regardless of its value. | `strict` |
263+
264+
**fluent-bit.yaml**
265+
266+
```yaml
service:
  flush: 1
  log_level: info
  hot_reload: on

pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          # Tail sampling of traces (numeric_attribute)
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 5s
            conditions:
              - type: numeric_attribute
                key: "http.status_code"
                min_value: 400
                max_value: 504

  outputs:
    - name: stdout
      match: "*"
```
294+
295+
With this tail-based sampling configuration, only traces whose spans have the key `http.status_code` with a numeric value between 400 and 504, inclusive, are sampled.
296+
297+
### Condition: boolean_attribute
298+
299+
This condition samples traces based on a boolean attribute value of a defined key. This allows for selection of traces based on flags such as error indicators or debug modes.
300+
301+
| Condition settings | Description | Default value |
| :----------------- | :---------- | :-----------: |
| `key`              | Specifies the span or resource attribute to match (e.g., "user.logged"). | |
| `value`            | Expected boolean value: `true` or `false` | |
305+
306+
**fluent-bit.yaml**
307+
308+
```yaml
service:
  flush: 1
  log_level: info
  hot_reload: on

pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          # Tail sampling of traces (boolean_attribute)
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 2s
            conditions:
              - type: boolean_attribute
                key: "user.logged"
                value: false

  outputs:
    - name: stdout
      match: "*"
```
335+
336+
This tail-based sampling configuration waits 2 seconds before making a decision. It samples traces based on the boolean attribute `user.logged`: traces are sampled if the key `user.logged` is set to `false`.
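The same condition can select on a flag being `true`, for instance to keep traces emitted in a debug mode. In the sketch below, the attribute name `app.debug` is hypothetical and stands in for whatever boolean attribute your services actually set.

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 2s
            conditions:
              - type: boolean_attribute
                # Hypothetical attribute name, used for illustration only
                key: "app.debug"
                value: true

  outputs:
    - name: stdout
      match: "*"
```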
337+
338+
### Condition: trace_state
339+
340+
This condition samples traces based on metadata stored in the W3C `trace_state` field.
341+
342+
| Condition settings | Description | Default value |
| :----------------- | :---------- | :-----------: |
| `values`           | Defines a list of key-value pairs to match against the `trace_state`. A trace is sampled if any of the specified values exist in the `trace_state` field. Matching follows OR logic, meaning at least one value must be present for sampling to occur. | |
345+
346+
**fluent-bit.yaml**
347+
348+
```yaml
service:
  flush: 1
  log_level: info
  hot_reload: on

pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          # Tail sampling of traces (trace_state)
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 2s
            conditions:
              - type: trace_state
                values: [debug=false, priority=high]

  outputs:
    - name: stdout
      match: "*"
```
374+
375+
This tail-based sampling configuration waits 2 seconds before making a decision. It samples traces whose `trace_state` field contains `debug=false` or `priority=high`. Because matching follows OR logic, one matching value is enough for a trace to be sampled.
376+
377+
For more details about further processing, read the [Content Modifier](../processors/content-modifier.md) processor documentation.
