Commit 4e5c520

platform-wide log level control
Signed-off-by: adarsh0728 <gooneriitk@gmail.com>
1 parent f7769a6 commit 4e5c520

6 files changed

Lines changed: 174 additions & 61 deletions

docs/development/debugging.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 
 ## Controlling Log Level
 
-For a full reference on log-level controls — including which pods are affected, YAML snippets for every component, and `RUST_LOG` for Rust data-plane pods — see [Log Levels](../user-guide/reference/configuration/log-levels.md).
+For a full reference on log-level controls — including which pods are affected, YAML snippets for every component, and advanced `RUST_LOG` filtering for data-plane pods — see [Log Levels](../user-guide/reference/configuration/log-levels.md).
 
 ## Debug Logs
 

docs/user-guide/reference/configuration/environment-variables.md

Lines changed: 3 additions & 3 deletions
@@ -4,9 +4,9 @@
 
 Numaflow exposes three env vars for controlling log verbosity across its pods:
 
-- `NUMAFLOW_LOG_LEVEL` — sets the log level for all **Go** components (daemon, controller, webhook, UX server, ISB service jobs). Accepts any [zapcore level](https://pkg.go.dev/go.uber.org/zap/zapcore#Level) (`debug`, `info`, `warn`, `error`, etc.). Overrides the level implied by `NUMAFLOW_DEBUG`. Invalid values are silently ignored.
-- `RUST_LOG` — sets the log level for all **Rust** data-plane pods (vertex `numa` container, MonoVertex `numa` container, serving pods). Accepts standard [`tracing-subscriber` EnvFilter syntax](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html) (e.g. `warn`, `numaflow_core=debug,info`).
-- `NUMAFLOW_DEBUG` — development shortcut honored by both runtimes; sets level to `debug` and switches log output from JSON to human-readable text. **Note:** the format change may break log shippers expecting JSON — prefer `NUMAFLOW_LOG_LEVEL` or `RUST_LOG` when only the level needs changing.
+- `NUMAFLOW_LOG_LEVEL` — sets the log level (`debug`, `info`, `warn`, `error`) for Numaflow-owned components. Overrides the level implied by `NUMAFLOW_DEBUG`.
+- `RUST_LOG` — advanced override for data-plane pods (vertex `numa` container, MonoVertex `numa` container, serving pods). Accepts standard [`tracing-subscriber` EnvFilter syntax](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html) (e.g. `warn`, `numaflow_core=debug,info`) and takes precedence over `NUMAFLOW_LOG_LEVEL`.
+- `NUMAFLOW_DEBUG` — development shortcut; sets level to `debug` and may switch log output from JSON to human-readable text. **Note:** the format change may break log shippers expecting JSON — prefer `NUMAFLOW_LOG_LEVEL` when only the level needs changing.
 
 See [Log Levels](log-levels.md) for a full pod inventory, per-component YAML examples, and common recipes.
 

docs/user-guide/reference/configuration/log-levels.md

Lines changed: 44 additions & 45 deletions
@@ -1,36 +1,36 @@
 # Log Levels
 
-Numaflow pods use two separate logging systems depending on whether they run Go or Rust code. This page explains how to control the log level for each.
+Numaflow-owned pods use `NUMAFLOW_LOG_LEVEL` as the standard log-level control. Data-plane pods also support `RUST_LOG` for advanced filtering.
 
 ## Quick reference
 
-| Pod / container | Runtime | Log-level env var | Default level |
-|---|---|---|---|
-| Pipeline daemon | Go | `NUMAFLOW_LOG_LEVEL` | `info` |
-| MonoVertex daemon | Go | `NUMAFLOW_LOG_LEVEL` | `info` |
-| ISB svc create / delete job | Go | `NUMAFLOW_LOG_LEVEL` | `info` |
-| ISB svc validate (init container) | Go | `NUMAFLOW_LOG_LEVEL` | `info` |
-| Controller (`numaflow-controller`) | Go | `NUMAFLOW_LOG_LEVEL` | `info` |
-| Webhook (`numaflow-webhook`) | Go | `NUMAFLOW_LOG_LEVEL` | `info` |
-| UX server (`numaflow-server`) | Go | `NUMAFLOW_LOG_LEVEL` | `info` |
-| Pipeline vertex `numa` container | Rust | `RUST_LOG` | `info` |
-| MonoVertex `numa` container | Rust | `RUST_LOG` | `info` |
-| Serving pod | Rust | `RUST_LOG` | `info` |
-| InterStepBufferService (JetStream / Redis) | upstream image | n/a | n/a |
-
-`NUMAFLOW_DEBUG=true` is honored by **both** runtimes but has different effects (see [below](#numaflow_debug-interaction)).
+| Pod / container | Standard log-level env var | Default level |
+|---|---|---|
+| Pipeline daemon | `NUMAFLOW_LOG_LEVEL` | `info` |
+| MonoVertex daemon | `NUMAFLOW_LOG_LEVEL` | `info` |
+| ISB svc create / delete job | `NUMAFLOW_LOG_LEVEL` | `info` |
+| ISB svc validate (init container) | `NUMAFLOW_LOG_LEVEL` | `info` |
+| Controller (`numaflow-controller`) | `NUMAFLOW_LOG_LEVEL` | `info` |
+| Webhook (`numaflow-webhook`) | `NUMAFLOW_LOG_LEVEL` | `info` |
+| UX server (`numaflow-server`) | `NUMAFLOW_LOG_LEVEL` | `info` |
+| Pipeline vertex `numa` container | `NUMAFLOW_LOG_LEVEL` | `info` |
+| MonoVertex `numa` container | `NUMAFLOW_LOG_LEVEL` | `info` |
+| Serving pod | `NUMAFLOW_LOG_LEVEL` | `info` |
+| InterStepBufferService (JetStream / Redis) | n/a | n/a |
+
+`NUMAFLOW_DEBUG=true` is also supported as a development shortcut (see [below](#numaflow_debug-interaction)).
 
 ---
 
-## Go components — `NUMAFLOW_LOG_LEVEL`
+## Standard log levels — `NUMAFLOW_LOG_LEVEL`
 
-All Go binaries (daemon, controller, webhook, UX server, ISB service jobs) use the shared `NewLogger()` helper, which reads `NUMAFLOW_LOG_LEVEL` at startup.
+Numaflow-owned components read `NUMAFLOW_LOG_LEVEL` at startup.
 
-**Accepted values:** any level recognized by [go.uber.org/zap/zapcore](https://pkg.go.dev/go.uber.org/zap/zapcore#Level): `debug`, `info`, `warn`, `error`, `dpanic`, `panic`, `fatal`. In practice `debug`, `info`, `warn`, and `error` are the useful operational values.
+**Accepted values:** `debug`, `info`, `warn`, `error`.
 
 **Default:** `info`
 
-**Precedence:** `NUMAFLOW_LOG_LEVEL` overrides the level implied by `NUMAFLOW_DEBUG`. Invalid values are silently ignored and the default level is used instead.
+**Precedence:** `NUMAFLOW_LOG_LEVEL` overrides the level implied by `NUMAFLOW_DEBUG`. Invalid values fall back to the level selected by `NUMAFLOW_DEBUG` or the default. For data-plane pods, `RUST_LOG` takes precedence over `NUMAFLOW_LOG_LEVEL` when set.
 
 ### Pipeline daemon pod
 

@@ -88,11 +88,11 @@ spec:
 
 ---
 
-## Rust components — `RUST_LOG`
+## Pipeline, MonoVertex, and Serving pods
 
-Pipeline vertex pods, MonoVertex pods, and Serving pods run the Numaflow Rust data-plane binary. These use the [`tracing-subscriber`](https://docs.rs/tracing-subscriber) `EnvFilter`, which reads `RUST_LOG` at startup.
+Pipeline vertex pods, MonoVertex pods, and Serving pods use `NUMAFLOW_LOG_LEVEL` for common log-level cases:
 
-**Accepted values:** standard `EnvFilter` syntax — simple level names (`debug`, `info`, `warn`, `error`) or per-crate directives (`numaflow_core=debug,h2=warn,info`).
+**Accepted values:** `debug`, `info`, `warn`, `error`.
 
 **Default:** `info`
 

@@ -106,17 +106,10 @@ spec:
     - name: my-vertex
       containerTemplate:
         env:
-          - name: RUST_LOG
+          - name: NUMAFLOW_LOG_LEVEL
             value: warn
 ```
 
-To enable debug logs for a specific crate only:
-
-```yaml
-- name: RUST_LOG
-  value: "numaflow_core=debug,info"
-```
-
 ### MonoVertex pod
 
 ```yaml
@@ -125,7 +118,7 @@ kind: MonoVertex
 spec:
   containerTemplate:
     env:
-      - name: RUST_LOG
+      - name: NUMAFLOW_LOG_LEVEL
         value: warn
 ```
 

@@ -138,25 +131,31 @@ spec:
   serving:
     containerTemplate:
       env:
-        - name: RUST_LOG
+        - name: NUMAFLOW_LOG_LEVEL
           value: warn
 ```
 
+### Advanced data-plane filtering — `RUST_LOG`
+
+Data-plane pods also support standard [`tracing-subscriber` EnvFilter syntax](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html) via `RUST_LOG`. Use this only when you need fine-grained filtering. When `RUST_LOG` is set, it takes precedence over `NUMAFLOW_LOG_LEVEL`.
+
+For example, to enable debug logs for a specific target only:
+
+```yaml
+# Pipeline.spec.vertices[].containerTemplate.env
+- name: RUST_LOG
+  value: "numaflow_core=debug,info"
+```
+
 ---
 
 ## `NUMAFLOW_DEBUG` interaction
 
-`NUMAFLOW_DEBUG=true` is a development shortcut. Its effects differ by runtime:
-
-| Effect | Go | Rust |
-|---|---|---|
-| Log level | Lowered to `debug` | Lowered to `debug` (plus `h2::codec=info`) |
-| Log format | Switches from JSON to console (human-readable) | Switches from JSON to human-readable text |
-| Stacktraces | Added at `warn`+ (vs `error`+ by default) | n/a |
+`NUMAFLOW_DEBUG=true` is a development shortcut. It lowers the default log level to `debug` and may switch log output from structured JSON to human-readable text.
 
-**Important:** switching from JSON to text format on the Rust side (`NUMAFLOW_DEBUG=true`) may break log shippers or aggregators that expect structured JSON. Prefer `RUST_LOG=debug` to lower the level without changing the output format.
+**Important:** switching from JSON to text format may break log shippers or aggregators that expect structured JSON. Prefer `NUMAFLOW_LOG_LEVEL=debug` to lower the level without changing the output format.
 
-`NUMAFLOW_LOG_LEVEL` overrides the level for Go components regardless of `NUMAFLOW_DEBUG`. There is no equivalent override on the Rust side — use `RUST_LOG` instead.
+`NUMAFLOW_LOG_LEVEL` overrides the level implied by `NUMAFLOW_DEBUG` without changing the format selected by `NUMAFLOW_DEBUG`. For data-plane pods, `RUST_LOG` takes precedence over `NUMAFLOW_LOG_LEVEL` when set.
 
 ---
 

@@ -169,18 +168,18 @@ spec:
   value: warn
 ```
 
-**Enable debug for a single Rust crate without flooding all logs:**
+**Enable debug for a single data-plane target without flooding all logs:**
 ```yaml
 # Pipeline.spec.vertices[].containerTemplate.env
 - name: RUST_LOG
   value: "numaflow_core=debug,info"
 ```
 
-**Enable full debug on a vertex pod (both Go sidecar and Rust data-plane):**
+**Enable full debug on a vertex pod:**
 ```yaml
 # Pipeline.spec.vertices[].containerTemplate.env
 - name: NUMAFLOW_DEBUG
   value: "true"
 # Note: this also switches log output from JSON to text.
-# To keep JSON format while lowering the Rust level, use RUST_LOG=debug instead.
+# To keep JSON format while lowering the level, use NUMAFLOW_LOG_LEVEL=debug instead.
 ```

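The precedence documented in `log-levels.md` for data-plane pods (`RUST_LOG`, then `NUMAFLOW_LOG_LEVEL`, then `NUMAFLOW_DEBUG`, then `info`) can be modeled as one pure function. The Go sketch below is illustrative only: `effectiveDirective` and its parameters are hypothetical names that do not appear in the commit, and it approximates the documented outcome rather than the actual `tracing-subscriber` wiring, where `RUST_LOG` is parsed by `EnvFilter` itself.

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveDirective is a hypothetical helper (not part of the commit).
// It returns, roughly, the filter directive a data-plane pod ends up with
// for the given NUMAFLOW_DEBUG state and env var values.
func effectiveDirective(debug bool, numaflowLevel, rustLog string) string {
	// A non-blank RUST_LOG takes precedence over everything else.
	if strings.TrimSpace(rustLog) != "" {
		return rustLog
	}
	// NUMAFLOW_LOG_LEVEL accepts four values, trimmed and case-insensitive.
	switch lvl := strings.ToLower(strings.TrimSpace(numaflowLevel)); lvl {
	case "debug", "info", "warn", "error":
		return lvl
	}
	// Unset, blank, or invalid: fall back to the NUMAFLOW_DEBUG preset.
	if debug {
		return "debug,h2::codec=info"
	}
	return "info"
}

func main() {
	fmt.Println(effectiveDirective(false, "", ""))
	fmt.Println(effectiveDirective(true, "WARN", ""))
	fmt.Println(effectiveDirective(true, "verbose", ""))
	fmt.Println(effectiveDirective(true, "warn", "numaflow_core=debug,info"))
}
```

Keeping the decision in a pure function, as the commit does with `default_log_directive` on the Rust side, lets each precedence case be unit-tested without touching process environment variables.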
pkg/shared/logging/log.go

Lines changed: 30 additions & 6 deletions
@@ -18,30 +18,39 @@ package logging
 
 import (
 	"context"
+	"fmt"
 	"os"
+	"strings"
 
 	zap "go.uber.org/zap"
 	"go.uber.org/zap/zapcore"
 )
 
+const (
+	envDebug    = "NUMAFLOW_DEBUG"
+	envLogLevel = "NUMAFLOW_LOG_LEVEL"
+)
+
 // NewLogger returns a new zap.SugaredLogger.
 // Log level can be overridden at runtime via the NUMAFLOW_LOG_LEVEL env var
-// (accepts any zapcore level: debug, info, warn, error, dpanic, panic, fatal).
+// (accepted values: debug, info, warn, error).
 // NUMAFLOW_DEBUG=true selects the development preset (console encoder, debug level).
-// NUMAFLOW_LOG_LEVEL overrides the level chosen by NUMAFLOW_DEBUG; invalid values are silently ignored.
+// NUMAFLOW_LOG_LEVEL overrides the level chosen by NUMAFLOW_DEBUG.
 func NewLogger() *zap.SugaredLogger {
 	var config zap.Config
-	debugMode, ok := os.LookupEnv("NUMAFLOW_DEBUG")
+	debugMode, ok := os.LookupEnv(envDebug)
 	if ok && debugMode == "true" {
 		config = zap.NewDevelopmentConfig()
 	} else {
 		config = zap.NewProductionConfig()
 	}
 	// NUMAFLOW_LOG_LEVEL overrides the level set by the preset above.
-	// Invalid values are silently ignored so a typo in a manifest does not crash the pod.
-	if lvlStr, ok := os.LookupEnv("NUMAFLOW_LOG_LEVEL"); ok {
-		if lvl, err := zapcore.ParseLevel(lvlStr); err == nil {
+	// Invalid values fall back to the preset level so a typo does not crash the pod.
+	if lvlStr, ok := os.LookupEnv(envLogLevel); ok && strings.TrimSpace(lvlStr) != "" {
+		if lvl, ok := parseLogLevel(lvlStr); ok {
 			config.Level = zap.NewAtomicLevelAt(lvl)
+		} else {
+			_, _ = fmt.Fprintf(os.Stderr, "invalid %s=%q, using default log level\n", envLogLevel, lvlStr)
 		}
 	}
 	config.EncoderConfig.EncodeTime = zapcore.RFC3339NanoTimeEncoder
@@ -53,6 +62,21 @@ func NewLogger() *zap.SugaredLogger {
 	return logger.Named("numaflow").Sugar()
 }
 
+func parseLogLevel(level string) (zapcore.Level, bool) {
+	switch strings.ToLower(strings.TrimSpace(level)) {
+	case "debug":
+		return zapcore.DebugLevel, true
+	case "info":
+		return zapcore.InfoLevel, true
+	case "warn":
+		return zapcore.WarnLevel, true
+	case "error":
+		return zapcore.ErrorLevel, true
+	default:
+		return zapcore.InfoLevel, false
+	}
+}
+
 type loggerKey struct{}
 
 // WithLogger returns a copy of parent context in which the

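One consequence of the `NewLogger` change is that `NUMAFLOW_LOG_LEVEL` replaces only the level chosen by the preset, while the output format still follows `NUMAFLOW_DEBUG`. The sketch below models that outcome using only the standard library. `loggerSettings` and `resolveSettings` are hypothetical names, not part of the commit, and the `console`/`json` encodings are assumed from zap's development and production presets.

```go
package main

import (
	"fmt"
	"strings"
)

// loggerSettings is a hypothetical summary of the logger configuration.
type loggerSettings struct {
	Level    string // "debug", "info", "warn", or "error"
	Encoding string // "console" (development preset) or "json" (production preset)
}

// resolveSettings models how the two env var values combine. It is a
// sketch of the behavior, not the commit's implementation.
func resolveSettings(debugEnv, levelEnv string) loggerSettings {
	// The preset fixes both the starting level and the output format.
	s := loggerSettings{Level: "info", Encoding: "json"}
	if debugEnv == "true" { // exact match, as in NewLogger
		s = loggerSettings{Level: "debug", Encoding: "console"}
	}
	// A valid NUMAFLOW_LOG_LEVEL replaces the level only; blank or
	// invalid values leave the preset level untouched.
	switch lvl := strings.ToLower(strings.TrimSpace(levelEnv)); lvl {
	case "debug", "info", "warn", "error":
		s.Level = lvl
	}
	return s
}

func main() {
	fmt.Printf("%+v\n", resolveSettings("true", "warn"))
	fmt.Printf("%+v\n", resolveSettings("", "fatal"))
}
```

In this model, `NUMAFLOW_DEBUG=true` with `NUMAFLOW_LOG_LEVEL=warn` yields warn-level logs in console format, which is why the docs recommend `NUMAFLOW_LOG_LEVEL=debug` alone when JSON output must be preserved.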
pkg/shared/logging/log_test.go

Lines changed: 11 additions & 0 deletions
@@ -39,6 +39,7 @@ func TestNewLogger_LogLevel(t *testing.T) {
 		{"debug", zapcore.DebugLevel},
 		{"info", zapcore.InfoLevel},
 		{"warn", zapcore.WarnLevel},
+		{"WARN", zapcore.WarnLevel},
 		{"error", zapcore.ErrorLevel},
 	}
 	for _, tt := range tests {
@@ -87,3 +88,13 @@ func TestNewLogger_LogLevelOverridesDebugPreset(t *testing.T) {
 		t.Error("expected warn to be enabled")
 	}
 }
+
+func TestParseLogLevelRejectsRuntimeSpecificLevels(t *testing.T) {
+	for _, level := range []string{"dpanic", "panic", "fatal"} {
+		t.Run(level, func(t *testing.T) {
+			if _, ok := parseLogLevel(level); ok {
+				t.Fatalf("expected %q to be rejected", level)
+			}
+		})
+	}
+}

rust/numaflow/src/setup_tracing.rs

Lines changed: 85 additions & 6 deletions
@@ -7,6 +7,11 @@ use tracing_subscriber::{Layer, filter::EnvFilter, fmt};
 use std::backtrace::{Backtrace, BacktraceStatus};
 use std::panic::PanicHookInfo;
 
+const DEFAULT_LOG_LEVEL: &str = "info";
+const DEBUG_LOG_LEVEL: &str = "debug,h2::codec=info";
+const NUMAFLOW_LOG_LEVEL_ENV: &str = "NUMAFLOW_LOG_LEVEL";
+const RUST_LOG_ENV: &str = "RUST_LOG";
+
 /// Panic hook to send panic info to `tracing` instead of stderr.
 /// Without this, a panic will be logged to stderr as:
 /// ```
@@ -196,19 +201,55 @@ impl Drop for TracerProviderGuard {
     }
 }
 
+fn parse_numaflow_log_level(level: &str) -> Option<&'static str> {
+    match level.trim().to_ascii_lowercase().as_str() {
+        "debug" => Some("debug"),
+        "info" => Some("info"),
+        "warn" => Some("warn"),
+        "error" => Some("error"),
+        _ => None,
+    }
+}
+
+fn default_log_directive(
+    debug_mode: bool,
+    numaflow_log_level: Option<&str>,
+    rust_log_set: bool,
+) -> &'static str {
+    if rust_log_set {
+        return DEFAULT_LOG_LEVEL;
+    }
+
+    if let Some(level) = numaflow_log_level
+        && !level.trim().is_empty()
+    {
+        if let Some(parsed_level) = parse_numaflow_log_level(level) {
+            return parsed_level;
+        }
+        eprintln!(
+            "[setup_tracing] Invalid {NUMAFLOW_LOG_LEVEL_ENV}='{level}', using default log level"
+        );
+    }
+
+    if debug_mode {
+        DEBUG_LOG_LEVEL
+    } else {
+        DEFAULT_LOG_LEVEL
+    }
+}
+
 /// Initialize the tracing subscriber with optional OTLP export.
 /// Returns a `TracerProviderGuard` that will flush buffered spans on drop.
 /// Callers must bind it (e.g., `let _guard = register();`) rather than
 /// discard it, or the provider will shut down immediately.
 pub fn register() -> TracerProviderGuard {
     let debug_mode = std::env::var("NUMAFLOW_DEBUG").is_ok_and(|v| v.to_lowercase() == "true");
-    let default_log_level = if debug_mode {
-        "debug,h2::codec=info" // "h2::codec" is too noisy
-    } else {
-        "info"
-    };
+    let rust_log_set = std::env::var(RUST_LOG_ENV).is_ok_and(|v| !v.trim().is_empty());
+    let numaflow_log_level = std::env::var(NUMAFLOW_LOG_LEVEL_ENV).ok();
+    let default_log_level =
+        default_log_directive(debug_mode, numaflow_log_level.as_deref(), rust_log_set);
 
-    // Build filtering from default directives and allow `RUST_LOG` environment variable to override.
+    // Build filtering from Numaflow defaults and allow `RUST_LOG` to override with EnvFilter syntax.
     let filter = EnvFilter::builder()
         .with_default_directive(default_log_level.parse().unwrap_or(Level::INFO.into()))
         .from_env_lossy();
@@ -303,6 +344,44 @@ mod tests {
         ))
     }
 
+    #[test]
+    fn default_log_directive_uses_info_by_default() {
+        assert_eq!(default_log_directive(false, None, false), "info");
+    }
+
+    #[test]
+    fn default_log_directive_uses_debug_mode_default() {
+        assert_eq!(
+            default_log_directive(true, None, false),
+            "debug,h2::codec=info"
+        );
+    }
+
+    #[test]
+    fn default_log_directive_uses_numaflow_log_level() {
+        assert_eq!(default_log_directive(false, Some("warn"), false), "warn");
+        assert_eq!(default_log_directive(false, Some("ERROR"), false), "error");
+    }
+
+    #[test]
+    fn default_log_directive_numaflow_log_level_overrides_debug() {
+        assert_eq!(default_log_directive(true, Some("warn"), false), "warn");
+    }
+
+    #[test]
+    fn default_log_directive_invalid_numaflow_log_level_falls_back() {
+        assert_eq!(default_log_directive(false, Some("verbose"), false), "info");
+        assert_eq!(
+            default_log_directive(true, Some("verbose"), false),
+            "debug,h2::codec=info"
+        );
+    }
+
+    #[test]
+    fn default_log_directive_rust_log_takes_precedence() {
+        assert_eq!(default_log_directive(true, Some("warn"), true), "info");
+    }
+
     #[test]
     fn sampler_always_on() {
         assert!(matches!(
