Skip to content

Commit fe5b806

Browse files
authored
BP-69: Adopt slog for structured logging (#4751)
Proposal to migrate BookKeeper from SLF4J to the slog structured-logging library (mirroring Pulsar PIP-467), and to add a public client API method withLoggerContext(Map<String, Object>) on CreateBuilder/OpenBuilder/ DeleteBuilder so callers can attach their own logging context attributes to a ledger Handle. Retires the existing bookkeeper-slogger module (5 internal call sites, no external adoption). Tracking issue: #4750
1 parent 7b362b1 commit fe5b806

2 files changed

Lines changed: 280 additions & 1 deletion

File tree

Lines changed: 278 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,278 @@
1+
# BP-69: Adopt slog for structured logging
2+
3+
- Issue: [#4750](https://github.com/apache/bookkeeper/issues/4750)
4+
- State: Under Discussion
5+
- Release: N/A
6+
7+
### Motivation
8+
9+
BookKeeper today uses SLF4J across the codebase. Logs are unstructured lines with
10+
positional `{}` placeholders, and the same identity attributes (ledger id, bookie
11+
id, entry id, journal id, etc.) are repeated inline in every message string. This
12+
has three concrete consequences:
13+
14+
1. **Poor queryability in log backends.** Modern log pipelines (Loki, Elastic,
15+
Splunk, OpenSearch, GCP Logging, CloudWatch) are indexed on structured
16+
key/value attributes. Grepping `"ledgerId=12345"` out of free-form messages is
17+
slow, lossy, and breaks whenever a log message is reworded.
18+
2. **Boilerplate and inconsistency.** Every log statement restates the same
19+
identity context ("ledger 12345, entry 67, bookie 10.0.0.1:3181 ..."), with
20+
no consistent attribute naming. Two log sites that mean the same thing get
21+
formatted differently.
22+
3. **`isDebugEnabled()` boilerplate.** To avoid `String.format`/toString cost on
23+
hot paths with debug disabled, the code is sprinkled with explicit level
24+
guards (hundreds of them), which obscure the actual logic.
25+
26+
A similar analysis was done for Apache Pulsar in
27+
[PIP-467: Adopt slog for structured logging across Pulsar][pip-467]. PIP-467
28+
reached the same conclusion, selected the `slog` library, was accepted, and the
29+
migration of the entire Pulsar codebase is complete. Because Pulsar is
30+
BookKeeper's largest user, a shared structured logging substrate lets
31+
cross-project debugging chains stay structured and correlated end-to-end (more
32+
on this in the "Cross-boundary logger context" section below, which is the main
33+
API change this BP introduces for BookKeeper).
34+
35+
BookKeeper itself has a prior attempt at structured logging, the
36+
`bookkeeper-slogger` module (`Slogger` API with `kv(key, value)` and `ctx()`).
37+
It was introduced alongside `DbLedgerStorage`/`DirectEntryLogger` but was never
38+
adopted broadly: only **5 main-source files** in `bookkeeper-server` use it today
39+
(`DbLedgerStorage`, `DirectEntryLogger`, `DirectCompactionEntryLog`,
40+
`DirectWriter`, `EntryLogIdsImpl`). The API is clunky (no debug/trace levels,
41+
verbose `.kv().kv().info(...)` chains, no zero-overhead disabled levels), and
42+
the SLF4J bridge is the only handler implementation. It is effectively dead
43+
code. We propose to retire it.
44+
45+
### Public Interfaces
46+
47+
**New public interfaces** (`@Public`/`@Unstable`):
48+
49+
1. A `Map<String, Object>` parameter on the client-side builder API so the
50+
caller (Pulsar, a user application, etc.) can attach its own logging
51+
context attributes to a ledger `Handle`:
52+
- `CreateBuilder withLoggerContext(Map<String, Object> attrs)`
53+
- `OpenBuilder withLoggerContext(Map<String, Object> attrs)`
54+
- `DeleteBuilder withLoggerContext(Map<String, Object> attrs)`
55+
56+
When set, the resulting `WriteHandle`/`ReadHandle` binds those attributes
57+
onto its internal logger. All BookKeeper client log lines emitted on behalf
58+
of that handle automatically carry the caller's attributes — e.g. Pulsar's
59+
`managedLedger=<topic>` and `cursor=<subscription>` — without BookKeeper
60+
knowing anything about Pulsar concepts. Note that the public API does
61+
**not** expose slog types: callers pass a plain `Map`, so they are not
62+
required to take a slog dependency to use the feature.
63+
64+
2. A new compile-scope dependency: `io.github.merlimat.slog:slog` (currently
65+
`0.9.7`).
66+
67+
**Changed public interfaces**: none. SLF4J remains on the classpath during and
68+
after the migration as a transitive dependency, because slog ships an SLF4J
69+
handler (`Slf4jHandler`) that sends structured records to any existing SLF4J
70+
backend (Logback, Log4j2). Applications that currently configure BookKeeper
71+
logging via `logback.xml` / `log4j2.xml` continue to work unchanged.
72+
73+
**Removed public interfaces**:
74+
75+
- The `bookkeeper-slogger` module (`org.apache.bookkeeper.slogger.Slogger`,
76+
`AbstractSlogger`, `Sloggable`, `ConsoleSlogger`, `NullSlogger`, and the
77+
`bookkeeper-slogger-slf4j` bridge). It is `@Unstable` and has no adoption
78+
outside the 5 internal files listed above.
79+
80+
**Not changed**:
81+
82+
- The BookKeeper wire protocol, binary formats (journal, entry log, ledger
83+
index), metadata formats, metrics, and CLI are untouched.
84+
- MDC behavior (`MdcUtils`, MDC propagation in `OrderedExecutor`) is preserved;
85+
slog's SLF4J handler forwards structured attributes as MDC to the underlying
86+
SLF4J backend so that existing log appenders keep rendering them.
87+
88+
### Proposed Changes
89+
90+
#### 1. Add `slog` as a core dependency
91+
92+
- Add `io.github.merlimat.slog:slog:0.9.7` to root `pom.xml`
93+
`<dependencyManagement>` and to the global compile classpath, alongside the
94+
existing SLF4J API.
95+
- Add `lombok.config` at the repo root so `@CustomLog` generates a slog
96+
`Logger`:
97+
```
98+
lombok.log.custom.declaration = io.github.merlimat.slog.Logger \
99+
io.github.merlimat.slog.Logger.get(TYPE)
100+
```
101+
- Continue to ship the SLF4J handler at runtime so existing log configs (Logback
102+
/ Log4j2 XML) keep working.
103+
104+
#### 2. Migrate SLF4J call sites to slog, one module at a time
105+
106+
Phased conversion. Each phase is a single commit; the codebase must compile at
107+
the end of each phase.
108+
109+
| Phase | Scope |
110+
|-------|-------------------------------------------------------------|
111+
| 1 | `bookkeeper-common` |
112+
| 2 | `stats`, `bookkeeper-common-allocator` |
113+
| 3 | `bookkeeper-server` (core, largest phase) |
114+
| 4 | `bookkeeper-http`, `tools` |
115+
| 5 | `stream/distributedlog` |
116+
| 6 | `metadata-drivers`, `benchmark`, peripheral tests |
117+
| 7 | Cleanup: remove `bookkeeper-slogger`, remove `@Slf4j` usage |
118+
119+
Conversion rules:
120+
121+
- `@Slf4j``@CustomLog`; `LoggerFactory.getLogger(X.class)`
122+
`Logger.get(X.class)`; naming: static = `LOG`, instance-bound = `log`.
123+
- `log.info("text {} {}", a, b)` is **not** mechanically rewritten to `logf(...)`
124+
— instead convert parameters to structured attributes:
125+
`log.info().attr("a", a).attr("b", b).log("text")`.
126+
- Classes with identity (ledger id, bookie id, stream name, ...) build an
127+
instance-bound logger in the constructor and drop those attributes from every
128+
call site:
129+
```java
130+
this.log = Logger.get(PerChannelBookieClient.class)
131+
.with().attr("bookieId", bookieId).build();
132+
```
133+
- `if (log.isDebugEnabled())` guards are removed; slog returns a no-op `Event`
134+
for disabled levels so the `.attr()` chain is zero-overhead.
135+
- When a `.attr()` value is itself expensive to compute (e.g.
136+
`Iterables.size(iter)`, `Joiner.on(...).join(...)`,
137+
`Arrays.toString(...)`), use slog's lambda form so the lambda is only invoked
138+
when the level is enabled:
139+
```java
140+
log.debug(e -> e.attr("count", Iterables.size(newLocations))
141+
.log("Update locations"));
142+
```
143+
144+
#### 3. Cross-boundary logger context through the client API
145+
146+
This is the main API change. Today a caller (Pulsar, a user app, a CLI tool)
147+
that wants to correlate its own log lines with BookKeeper's log lines has no
148+
way to do so — BookKeeper's client code logs with its own loggers, and the
149+
caller's context (topic name, cursor name, request id, trace id) never makes
150+
it into those lines.
151+
152+
We add an optional `withLoggerContext(Map<String, Object>)` method on the
153+
ledger builder API. The caller provides a plain map of attribute names to
154+
values; slog does not appear in the public API surface:
155+
156+
```java
157+
// Pulsar ManagedLedgerImpl
158+
Map<String, Object> logCtx = Map.of(
159+
"managedLedger", name,
160+
"namespace", namespace);
161+
162+
CompletableFuture<WriteHandle> f = bookKeeper.newCreateLedgerOp()
163+
.withEnsembleSize(3)
164+
.withWriteQuorumSize(2)
165+
.withAckQuorumSize(2)
166+
.withLoggerContext(logCtx) // <-- new
167+
.execute();
168+
```
169+
170+
Inside BookKeeper, the `LedgerHandle` constructor binds those attributes onto
171+
its internal logger:
172+
173+
```java
174+
var builder = Logger.get(LedgerHandle.class).with();
175+
if (callerAttrs != null) {
176+
callerAttrs.forEach(builder::attr);
177+
}
178+
this.log = builder.attr("ledgerId", ledgerId).build();
179+
```
180+
181+
The net effect: a failed write on a bookie emits a log line like
182+
183+
```
184+
level=ERROR ledgerId=12345 entryId=67 bookieId=bk-3:3181 \
185+
managedLedger=persistent://public/default/my-topic \
186+
"Write to bookie failed"
187+
```
188+
189+
without any BookKeeper code referencing the `managedLedger` concept. The
190+
propagation is one-way (caller context flows in; BookKeeper-only attributes do
191+
not leak out), and `withLoggerContext` is optional — if unset, BookKeeper uses
192+
its own `Logger.get(...)` as before and behavior is unchanged.
193+
194+
Scope for this BP: `CreateBuilder`, `OpenBuilder`, `DeleteBuilder`. The same
195+
mechanism can be extended to `newListLedgersOp` / admin APIs in follow-up
196+
changes.
197+
198+
#### 4. Retire `bookkeeper-slogger`
199+
200+
In the cleanup phase:
201+
202+
- Convert the 5 remaining `Slogger` call sites in
203+
`bookkeeper-server` to the new slog API (they are already structured-event
204+
loggers; the translation is mechanical).
205+
- Delete the `bookkeeper-slogger/` module (API + SLF4J bridge + tests).
206+
- Remove it from the root `pom.xml` reactor.
207+
208+
### Compatibility, Deprecation, and Migration Plan
209+
210+
- **End-user impact for the log output format**: with the default SLF4J
211+
handler, the rendered log lines continue to come from the user's existing
212+
Logback/Log4j2 pattern. Structured attributes are forwarded as MDC, so a
213+
layout like `%X{ledgerId} %msg` picks them up automatically. No log config
214+
change is required to upgrade.
215+
- **End-user impact for the client API**: additive only. `withLoggerContext(...)`
216+
is a new method on `CreateBuilder` / `OpenBuilder` / `DeleteBuilder`; existing
217+
callers that don't use it see no behavioral change. Its parameter is a
218+
standard JDK `Map<String, Object>`, so callers are not required to take a
219+
slog dependency to use the feature.
220+
- **Dependency impact**: applications get `io.github.merlimat.slog:slog` as a
221+
new transitive dependency. slog has no transitive dependencies of its own
222+
beyond an optional SLF4J handler (runtime scope).
223+
- **`bookkeeper-slogger` removal**: `@Unstable`, 5 internal call sites, zero
224+
external adoption observed. We will deprecate in the release that begins the
225+
migration and remove after Phase 7 lands. Users who happen to depend on it
226+
can migrate to the published slog API, which is a superset.
227+
- **SLF4J deprecation**: **not** part of this BP. SLF4J stays for the
228+
foreseeable future as the rendering backend behind slog.
229+
- **Rollout**: each phase is independently mergeable and independently
230+
revertable. There is no flag day.
231+
232+
### Test Plan
233+
234+
- Each phase preserves the existing unit/integration test suite unchanged;
235+
"the tests still pass" is the baseline success criterion per phase.
236+
- For the new `withLoggerContext(Map<String, Object>)` API, add unit tests
237+
that:
238+
1. Verify the log events emitted by a `LedgerHandle` carry the attributes
239+
supplied via the map (use a capturing slog handler in tests).
240+
2. Verify the default (no `withLoggerContext` call) behavior is identical
241+
to pre-BP, including logger name.
242+
- For log-format compatibility, a small integration check in
243+
`bookkeeper-server` that wires the slog SLF4J handler to Logback and asserts
244+
that MDC-styled patterns render structured attributes.
245+
- No wire/format/metric tests change — none of those are touched.
246+
247+
### Rejected Alternatives
248+
249+
1. **Stay on SLF4J, adopt a convention.** Rejected: SLF4J's `{}` format has no
250+
attribute names, no composable context (beyond MDC, which is thread-local
251+
and brittle across async/executor boundaries), and no zero-overhead disabled
252+
levels. Cross-project correlation with Pulsar would not be possible in a
253+
structured way.
254+
2. **Expand `bookkeeper-slogger` to cover the whole codebase.** Rejected: the
255+
API surface is small and missing critical features (`debug`/`trace`,
256+
lambda-deferred attribute evaluation, per-instance context via
257+
`with().build()`). It has no external users. Hardening it would mean
258+
reinventing slog while duplicating work with Pulsar's choice. Cheaper to
259+
retire it.
260+
3. **Use Log4j2's `ThreadContext` / structured API directly.** Rejected: binds
261+
BookKeeper to a specific backend (we currently support Logback, Log4j2, and
262+
others via SLF4J). slog keeps the rendering backend pluggable.
263+
4. **OpenTelemetry Logs Bridge API.** Rejected for this BP: OTel logs are still
264+
stabilizing, and binding the in-process logging API to an OTel SDK is a much
265+
larger commitment (and dependency footprint) than what is needed to solve
266+
the stated problem. Nothing in this BP precludes later adding an OTel
267+
handler to slog.
268+
5. **Expose a slog `Logger` parameter on the builder (e.g.
269+
`withLogger(io.github.merlimat.slog.Logger)`) instead of a
270+
`Map<String, Object>`.** Rejected: it would force every BookKeeper client
271+
that wants to propagate logging context to take a direct compile dependency
272+
on slog and to construct a slog `Logger` before calling the client API. The
273+
map-of-attributes form keeps slog out of the public API surface while giving
274+
the caller everything they actually need (a bag of key/value pairs to bind
275+
to the handle's logger). Nothing prevents a caller that already uses slog
276+
from deriving the map from an existing logger's attributes.
277+
278+
[pip-467]: https://github.com/apache/pulsar/blob/master/pip/pip-467.md

site3/website/src/pages/community/bookkeeper-proposals.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -83,7 +83,7 @@ using Google Doc.
8383

8484
This section lists all the _bookkeeper proposals_ made to BookKeeper.
8585

86-
*Next Proposal Number: 43*
86+
*Next Proposal Number: 70*
8787

8888
### Inprogress
8989

@@ -110,6 +110,7 @@ Proposal | State
110110
[BP-41: Separate BookieId from Separate BookieId from Bookie Network Address](../bps/BP-41-bookieid) | Accepted
111111
[BP-42: New Client API - list ledgers](../bps/BP-42-new-api-list-ledgers) | Accepted
112112
[BP-43: Migration to gradle](../bps/BP-43-gradle-migration) | Draft
113+
[BP-69: Adopt slog for structured logging](../bps/BP-69-slog-structured-logging) | Under Discussion
113114

114115

115116
### Adopted

0 commit comments

Comments
 (0)