This file is the human-readable control-plane guide for the DBLog paper implementation in this repository.
The control plane is a local embedded HTTP server inside the long-running DBLog process. It exists to expose DBLog-owned runtime state and a modest set of operator actions without requiring direct database inspection or log scraping.
Current shipped scope:
- runtime snapshot, health, detailed status, persisted schemas, and schema issues,
- curated metrics,
- event summary, recent events, and per-table event drill-down,
- request listing and request detail,
- request submission,
- source flow-control / backpressure visibility for the live runtime.
The control plane intentionally does not expose pause, resume, or cancel
endpoints. To pause ingest, stop the DBLog process; to resume, start it again.
The embedded state store is crash-safe (see §8), so a restart resumes from the
last completed chunk without re-reading chunks that already completed.
Not part of the current shipped slice:
- remote or multi-host control,
- internet-facing authn/authz,
- durable event-history storage for the UI.
Current design constraints:
- the operator-facing control plane is local-only,
- host-run examples should bind directly to loopback (`127.0.0.1` by default),
- the packaged Docker example binds `0.0.0.0` inside the container so Docker port publishing works, but publishes `127.0.0.1:8085` on the host,
- remote or internet-facing exposure is out of scope,
- it is disabled by default until explicitly enabled,
- request handling now uses a bounded executor plus a bounded request-body size limit rather than an unbounded cached thread pool or unbounded body reads,
- it is intentionally a single versioned API rooted at `/api/v1`.
Namespaces:
- `/api/v1/runtime`
- `/api/v1/requests`
- `/api/v1/metrics`
- `/api/v1/events`
Current durable request states:
- `QUEUED`
- `ACTIVE`
- `COMPLETED`
- `FAILED`
Current operator actions:
- submit
Important semantics:
- new submissions start as `QUEUED`,
- the runtime transitions work to `ACTIVE`,
- `COMPLETED` and `FAILED` are terminal,
- chunk-level progress is persisted at batch boundaries, so a process restart resumes `ACTIVE` work from the last completed chunk without re-reading.
The state store keeps every request durably until a newer same-scope, same-table
COMPLETED request supersedes it. This bounds the on-disk request log without
losing useful operator history:
- when a `TABLE` or `PRIMARY_KEYS` request reaches `COMPLETED`, every other terminal-state (`COMPLETED` or `FAILED`) request that targets the same `(databaseName, schemaName, tableName)` and was submitted earlier is pruned from the state store atomically, including any dependent rows in the primary-key, missing-key, and chunk-progress tables;
- when an `ALL_TABLES` request reaches `COMPLETED`, every prior terminal-state `ALL_TABLES` request is pruned;
- pruning never crosses scope: a `TABLE` completion does not delete prior `ALL_TABLES` history, and vice versa;
- pruning is gated on `COMPLETED` only: a `FAILED` request never deletes history. Failures stay visible until a same-scope, same-table successful completion replaces them;
- the just-completed request itself is preserved, and `ACTIVE` requests are never touched even when their sequence number is older.
Operator implication: GET /api/v1/requests is not a durable audit trail of
every request ever submitted. It is the live + most-recent terminal state
keyed by scope and table. Export historical entries before the next successful
completion if you need long-term records.
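
The pruning rules above can be restated as a small in-memory model. This is a sketch for reasoning about retention, not DBLog's state-store code: the `Request` class and `prune_on_completion` function are hypothetical names, "submitted earlier" is assumed to mean a lower `request_id`, and `TABLE` and `PRIMARY_KEYS` are treated as one table-targeted group (a stricter exact-scope reading of the rules is also possible).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

TERMINAL = {"COMPLETED", "FAILED"}
TABLE_SCOPES = {"TABLE", "PRIMARY_KEYS"}  # assumption: both count as table-targeted


@dataclass(frozen=True)
class Request:
    request_id: int   # assumed to increase with submission order
    scope: str        # ALL_TABLES, TABLE, or PRIMARY_KEYS
    state: str        # QUEUED, ACTIVE, COMPLETED, or FAILED
    table: Optional[Tuple[str, str, str]] = None  # (databaseName, schemaName, tableName)


def prune_on_completion(store, completed):
    """Return the requests that survive once `completed` reaches a terminal state."""
    if completed.state != "COMPLETED":
        return list(store)  # a FAILED request never deletes history

    def superseded(r):
        if r.request_id >= completed.request_id:
            return False  # keeps the completed request itself and anything newer
        if r.state not in TERMINAL:
            return False  # QUEUED and ACTIVE requests are never touched
        if completed.scope == "ALL_TABLES":
            return r.scope == "ALL_TABLES"
        return r.scope in TABLE_SCOPES and r.table == completed.table

    return [r for r in store if not superseded(r)]
```

With this model, a completed `PRIMARY_KEYS` request on one table removes an older `FAILED` request on the same table while leaving `ALL_TABLES` history, other tables, and `ACTIVE` work in place.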
HTTP request submission requires:
- request submission to be available from the current runtime,
- that runtime to have started with an explicit sink configuration; DBLog does not auto-install an implicit discard sink,
- an H2 driver on the runtime path,
- one resolved state path from:
  - `dblog.runtime.state-path`
  - `dblog.scenario.state-path`
Use the runtime endpoints for:
- compact runtime summary,
- health probes,
- more detailed status,
- persisted schemas and schema issues,
- source flow-control visibility such as bounded queue depth, paused source fetching, and pressure diagnosis.
Relevant runtime properties for the local HTTP server include:
- `dblog.control-plane.executor-max-threads`
- `dblog.control-plane.executor-queue-capacity`
- `dblog.control-plane.max-request-body-bytes`
Use the request endpoints for:
- filtered request listings,
- request detail,
- new request submission.
Current HTTP shape rules:
- DBLog assigns the durable numeric `requestId` when a submission is accepted,
- `ALL_TABLES` submissions must not include `table` or `primaryKeyLiterals`,
- `TABLE` submissions must include `table` and must not include `primaryKeyLiterals`,
- `PRIMARY_KEYS` submissions must include `table` plus at least one `primaryKeyLiterals` value,
- malformed request shapes are rejected with `400 bad_request` instead of being durably queued.
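
A client can apply the same shape rules before sending a submission. The sketch below is a hypothetical pre-flight check, not the server's validator: the `scope` field name is an assumption (this guide names the scope values but not the JSON key that carries them), and `validate_submission` is an illustrative helper.

```python
def validate_submission(body):
    """Return a list of shape errors; an empty list means the shape looks acceptable."""
    errors = []
    scope = body.get("scope")  # assumption: the scope travels in a `scope` field
    has_table = "table" in body
    has_literals = "primaryKeyLiterals" in body

    if scope == "ALL_TABLES":
        if has_table:
            errors.append("ALL_TABLES must not include table")
        if has_literals:
            errors.append("ALL_TABLES must not include primaryKeyLiterals")
    elif scope == "TABLE":
        if not has_table:
            errors.append("TABLE must include table")
        if has_literals:
            errors.append("TABLE must not include primaryKeyLiterals")
    elif scope == "PRIMARY_KEYS":
        if not has_table:
            errors.append("PRIMARY_KEYS must include table")
        if not body.get("primaryKeyLiterals"):
            errors.append("PRIMARY_KEYS needs at least one primaryKeyLiterals value")
    else:
        errors.append(f"unknown scope: {scope!r}")
    return errors
```

A non-empty result corresponds to the cases the server would answer with `400 bad_request`; passing this check does not guarantee acceptance, because schema-aware canonicalization happens on the server.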
For PRIMARY_KEYS submissions, the current control plane accepts
primaryKeyLiterals as strings. DBLog canonicalizes those strings against the
selected table schema before targeted-repair reads bind typed JDBC primary-key
parameters.
A valid PRIMARY_KEYS request can complete even when one or more requested
source rows are absent. In that case DBLog records those keys in
missingPrimaryKeyLiterals on the completed request.
Current literal shapes:
- single-column primary keys: plain string literals such as `42`,
- composite primary keys: canonical `{column=value,...}` strings or positional tuple strings such as `(tenant-a,42)`.
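
To make the literal shapes concrete, here is a simplified parser for the three forms. It is an illustration only: `parse_pk_literal` is a hypothetical helper, it assumes values contain no commas, braces, or `=` signs, and it keeps every value as a string, whereas DBLog's real canonicalization is schema-aware and typed.

```python
def parse_pk_literal(literal, pk_columns):
    """Map a primary-key literal string to {column: value} for the given PK columns."""
    text = literal.strip()
    if len(pk_columns) == 1:
        # Single-column key: the whole literal is the value.
        return {pk_columns[0]: text}
    if text.startswith("{") and text.endswith("}"):
        # Canonical form: {column=value,...} in any column order.
        pairs = [part.split("=", 1) for part in text[1:-1].split(",")]
        if any(len(pair) != 2 for pair in pairs):
            raise ValueError(f"malformed canonical literal: {literal!r}")
        parsed = {name.strip(): value.strip() for name, value in pairs}
        if set(parsed) != set(pk_columns):
            raise ValueError(f"literal columns do not match primary key: {literal!r}")
        return {column: parsed[column] for column in pk_columns}
    if text.startswith("(") and text.endswith(")"):
        # Positional tuple form: values follow primary-key column order.
        values = [value.strip() for value in text[1:-1].split(",")]
        if len(values) != len(pk_columns):
            raise ValueError(f"expected {len(pk_columns)} values: {literal!r}")
        return dict(zip(pk_columns, values))
    raise ValueError(f"unrecognised composite literal: {literal!r}")
```

Both `(tenant-a,42)` and `{id=42,tenant=tenant-a}` resolve to the same mapping for a key declared as `(tenant, id)`, which is the point of canonicalizing before binding parameters.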
The table object mirrors the internal three-part TableId. For PostgreSQL,
databaseName is the physical database name and schemaName is the schema
name. For MySQL, databaseName is the logical DBLog source id and
schemaName is the physical source database name. tableName remains the
physical table name for every adapter.
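
The per-adapter mapping is easy to get backwards, so a pair of small builders can help when scripting submissions. These helpers are hypothetical; only the three key names (`databaseName`, `schemaName`, `tableName`) and their meanings come from this guide.

```python
def postgres_table(database, schema, table):
    """PostgreSQL: databaseName is the physical database, schemaName the schema."""
    return {"databaseName": database, "schemaName": schema, "tableName": table}


def mysql_table(source_id, database, table):
    """MySQL: databaseName is the logical DBLog source id, schemaName the physical database."""
    return {"databaseName": source_id, "schemaName": database, "tableName": table}
```

For example, `mysql_table("orders-source", "shop", "orders")` places the physical database `shop` in `schemaName`, not in `databaseName`; the source id and database names here are made up for illustration.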
If a PRIMARY_KEYS submission passes HTTP shape validation but later fails
schema-aware literal canonicalization at submission time, the control plane
rejects it with 400 bad_request and does not durably queue a request row.
If later typed JDBC binding fails after acceptance, the current runtime marks
that request FAILED. In both cases the bad repair request does not bring down
the DBLog runtime host.
Likewise, if a submitted TABLE or PRIMARY_KEYS request names a table that
is not present in the runtime's current captured-schema set, the current
runtime marks that request FAILED. The missing-table request does not bring
down the DBLog runtime host.
Use these endpoints for curated DBLog-owned observability:
- `/api/v1/metrics`
- `/api/v1/events/summary`
- `/api/v1/events/recent`
- `/api/v1/events/tables/{tableDisplayName}`
Current pressure interpretation:
- if the sink is retrying, the control plane should describe that as sink unavailability rather than generic slowdown,
- if MySQL source fetching is paused because the bounded source queue is full while the sink is still up, the control plane should describe that as sink slowdown or inbound pressure exceeding current apply throughput,
- PostgreSQL remains direct-poll rather than queue-prefetch based, so sink slowdown shows up through blocked sink-apply state rather than a source-queue stall.
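
The interpretation rules above amount to a short decision order, sketched here as a function. The parameter names (`sink_retrying`, `source_fetch_paused`, `sink_apply_blocked`) and the returned phrases are hypothetical; they are not field names from the status payload.

```python
def describe_pressure(adapter, sink_retrying, source_fetch_paused=False, sink_apply_blocked=False):
    """Pick an operator-facing description for the current pressure state."""
    if sink_retrying:
        # A retrying sink is reported as unavailability, never as generic slowdown.
        return "sink unavailable"
    if adapter == "mysql" and source_fetch_paused:
        # Bounded source queue is full while the sink is still up.
        return "sink slowdown or inbound pressure exceeding apply throughput"
    if adapter == "postgresql" and sink_apply_blocked:
        # PostgreSQL is direct-poll, so pressure shows as blocked sink apply.
        return "sink slowdown (blocked sink apply)"
    return "no pressure detected"
```

Checking the retry state first matters: a full MySQL source queue during a sink outage should still be described as sink unavailability.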
Operator-facing event views hide internal watermark/heartbeat activity.
Never enable the tap in production. The tap deliberately blocks the DBLog pump thread when the subscriber cannot keep up — that is the entire point of the feature. It exists so a separate TUI can visualise the watermark algorithm step by step on a teaching machine. Running it against a real workload will stall CDC under any subscriber slowdown.
When dblog.tap.enabled=true, the control plane mounts one extra route:
- `GET /api/v1/tap/stream`: chunked NDJSON. Exactly one subscriber at a time; a new connection displaces any in-flight one (TCP-like last-wins semantics). Pins one executor thread until the subscriber disconnects. The subscriber detects the end of the run via socket close. TCP flow control is what stalls the pump when the subscriber can't keep up; that is the step-mode mechanism.
The startup WARN means only that the educational tap is enabled. Actual tap-induced pump blocking is surfaced separately:
- server logs emit one `DBLog tap queue is full ...` WARN per tap queue lifetime when the queue first fills and the pump blocks,
- the tap stream emits `stream.standby` after `dblog.tap.standby-threshold-ms`, then `stream.resumed` after the HTTP writer catches up,
- tap `stream.heartbeat` events include `queue_depth` and `queue_capacity` for the tap queue.
The `sourceFlowControl.queueDepth` field on `/api/v1/runtime/status` reports
source-log flow-control state, not the health of the tap HTTP stream queue.
Every envelope carries run_id; subscribers use it to detect a new run
(process restart) without relying on a handshake event.
The route returns 503 tap_not_enabled when dblog.tap.enabled=false.
Relevant config:
| property | default | notes |
|---|---|---|
| `dblog.tap.enabled` | `false` | master toggle. Off by default. |
| `dblog.tap.queue-capacity` | `65536` | bounded queue between pump and HTTP writer |
| `dblog.tap.standby-threshold-ms` | `1000` | how long the producer must block before `stream.standby` |
| `dblog.tap.heartbeat-interval` | `2s` | idle heartbeat cadence (any Spring-parseable Duration) |
The per-kind JSON Schemas under docs/schema/events/ are the authoritative
wire contract. The Java event POJOs under
src/main/java/io/github/aandreakis/dblog/tap/generated/ are
generated from those schemas via ./gradlew generateTapSchemaClasses and
committed to the repo; the generator is manual-only and deterministic.
External consumers (Rust via typify, TypeScript via quicktype, etc.)
point their own codegen at docs/schema/tap-event.schema.json, which is
the oneOf union of the per-kind files.
Field inclusion follows Jackson's @JsonInclude(NON_NULL) convention:
optional fields are omitted from the wire when their value is not
applicable, not emitted as explicit null. Consumers should treat the
absence of an optional field as equivalent to "not applicable here". The
only exception is seq on stream.standby and stream.resumed, which is
emitted as explicit null because those events are published out of band
by the HTTP writer thread and therefore do not carry a pump-assigned
sequence number (see §5.4 Envelope below).
Every tap event starts with the same six envelope fields in this order:
| field | type | notes |
|---|---|---|
| `v` | integer | Tap schema version. Always `1` in the current runtime. |
| `seq` | integer or null | Monotonic per-run sequence. `null` for the two out-of-band kinds. |
| `ts` | string | ISO-8601 UTC with six fractional-second digits (microseconds). |
| `run_id` | string | DBLog runtime run id. Changes across a process restart. |
| `source_id` | string | `dblog.source.id` from runtime config. |
| `kind` | string | One of the 13 event kinds listed below. |
seq is monotonically increasing across all non-out-of-band events for a
given run_id; a reader can gap-detect by comparing successive seq
values. Out-of-band events (stream.standby, stream.resumed) do not
consume sequence numbers because they are written directly by the HTTP
writer thread without passing through the pump's per-batch buffer.
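
A subscriber can combine the `run_id` and `seq` rules into one pass over the NDJSON lines. The sketch below is a hypothetical consumer-side check, not part of DBLog. It assumes consecutive in-band events differ by exactly 1 (the guide only promises monotonic increase), and it treats the first event seen in a run as the baseline, since a reconnecting reader starts wherever the queue head is.

```python
import json

OUT_OF_BAND = {"stream.standby", "stream.resumed"}


def find_anomalies(lines):
    """Return ("gap", last_seq, seq) and ("new_run", run_id) findings for NDJSON tap lines."""
    findings = []
    run_id = None
    last_seq = None
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event["run_id"] != run_id:
            if run_id is not None:
                findings.append(("new_run", event["run_id"]))
            # A new run restarts sequence tracking from scratch.
            run_id, last_seq = event["run_id"], None
        seq = event.get("seq")
        if event["kind"] in OUT_OF_BAND or seq is None:
            continue  # out-of-band events carry no sequence number
        if last_seq is not None and seq != last_seq + 1:
            findings.append(("gap", last_seq, seq))
        last_seq = seq
    return findings
```

Skipping the out-of-band kinds before the comparison keeps a `stream.standby` line between two consecutive events from being misread as a gap.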
Thirteen event kinds, grouped by what they observe:
| kind | summary |
|---|---|
| `cdc` | One CDC row change observed on the source change stream. |
| `sink.event` | One event appended to a sink delegate, post reconciliation. Emitted once per sink per event. |
| `watermark.written` | DBLog wrote a LOW or HIGH watermark row on `dblog_meta.watermarks`. |
| `watermark.received` | The reconciler observed the LOW or HIGH watermark token come back on the source change log. |
| `chunk.selected` | A chunk SELECT finished inside the currently open watermark window. |
| `chunk.collision` | An in-window CDC event caused a PK removal from the chunk refresh buffer. |
| `chunk.completed` | HIGH watermark observed, refresh rows emitted, chunk acknowledged. |
| `checkpoint.advanced` | Buffered streaming checkpoint rolled forward. |
| `request.transition` | A dump request's durable state changed. |
| `error` | A fail-closed runtime boundary was hit. |
| `stream.heartbeat` | Liveness plus queue-depth sample. Emitted on batch commits when the heartbeat interval has elapsed. |
| `stream.standby` | Producer has blocked on a full queue for at least `dblog.tap.standby-threshold-ms`. Out of band. |
| `stream.resumed` | Producer is no longer blocked. Out of band. |
See the per-kind schema files for the full field catalog and required / optional split:
- `docs/schema/events/cdc-event.schema.json`
- `docs/schema/events/sink-event.schema.json`
- `docs/schema/events/watermark-written.schema.json`
- `docs/schema/events/watermark-received.schema.json`
- `docs/schema/events/chunk-selected.schema.json`
- `docs/schema/events/chunk-collision.schema.json`
- `docs/schema/events/chunk-completed.schema.json`
- `docs/schema/events/checkpoint-advanced.schema.json`
- `docs/schema/events/request-transition.schema.json`
- `docs/schema/events/error-event.schema.json`
- `docs/schema/events/stream-heartbeat.schema.json`
- `docs/schema/events/stream-standby.schema.json`
- `docs/schema/events/stream-resumed.schema.json`
The reason field on checkpoint.advanced is one of the four canonical
strings defined by the schema enum. The BufferedCheckpointDispatcher
calls into the tap with its own internal labels which the tap normalises
as follows:
| dispatcher label | tap reason |
|---|---|
| `null` | `count` |
| `batched-threshold` | `count` |
| `batched-time` | `time` |
| `stage-end`, `shutdown` | `boundary` |
The schema enum also permits bytes, reserved for future use. Any
dispatcher label the tap does not recognise is treated as a runtime
contract violation — the tap raises IllegalStateException rather than
emitting a wire value that the schema would reject. Adding a new
dispatcher label therefore requires extending both the switch in
ActiveTap#normaliseCheckpointReason and the reason enum in
checkpoint-advanced.schema.json in the same change, then regenerating
the POJOs.
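
For consumers who want to mirror the mapping in their own tooling, the table translates into a lookup that fails loudly on unknown labels. This is a Python restatement of the documented behaviour, not the Java code in `ActiveTap#normaliseCheckpointReason`; `RuntimeError` stands in for `IllegalStateException`.

```python
_CHECKPOINT_REASONS = {
    None: "count",
    "batched-threshold": "count",
    "batched-time": "time",
    "stage-end": "boundary",
    "shutdown": "boundary",
}


def normalise_checkpoint_reason(label):
    """Map a dispatcher label to a wire reason, refusing labels the table does not list."""
    try:
        return _CHECKPOINT_REASONS[label]
    except KeyError:
        # Fail closed rather than emit a value the schema enum would reject.
        raise RuntimeError(f"unrecognised dispatcher label: {label!r}") from None
```

Note that `bytes` never appears as an output here: the schema permits it, but no dispatcher label currently maps to it.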
At-most-once per subscriber. There is a narrow window between
queue.poll(...) and a successful out.write(...) + out.flush(...) in
which a line has been dequeued but not yet written to the socket; if the
client disconnects in that window the line is dropped. A reconnecting
reader starts from wherever the queue head is at the time of reconnection
rather than replaying. If an external consumer needs
exactly-once-per-attached-subscriber delivery in the future, the intended
fix is a peek-then-commit inversion — design decision deferred until a
concrete consumer-side requirement exists.
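
The drop window can be reproduced with a toy poll-then-write loop. This is a simplified single-threaded model with hypothetical names (`drain`, `write`), using a `deque` in place of the bounded tap queue; it only illustrates why a line dequeued before a failed write is lost.

```python
from collections import deque


def drain(queue, write):
    """Poll-then-write loop; returns (delivered, dropped) lists."""
    delivered, dropped = [], []
    while queue:
        line = queue.popleft()  # dequeued before the write is known to succeed
        try:
            write(line)
        except ConnectionError:
            dropped.append(line)  # neither written nor put back on the queue
            break
        delivered.append(line)
    return delivered, dropped
```

If the write of the second of three lines fails, the first is delivered, the second is dropped, and the third stays at the queue head for whichever subscriber connects next. A peek-then-commit variant would remove a line from the queue only after the write and flush succeed.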
The feature is source-available only as a teaching artefact and is not supported for production use.
The control plane is polling-first rather than push-first. Operator tooling (curl scripts, dashboards, monitoring agents) should poll these endpoints on a reasonable cadence rather than expecting server-pushed updates.
Recommended polling targets include:
- `/api/v1/runtime`
- `/api/v1/runtime/status`
- `/api/v1/metrics`
- `/api/v1/events/summary`
- `/api/v1/events/recent`
- `/api/v1/requests`
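
A minimal polling pass over these targets might look like the sketch below. It uses only the Python standard library and assumes the endpoints return JSON bodies; `poll_once` and `fetch_json` are hypothetical helper names, and the base URL in the usage note matches the packaged Docker example's published address.

```python
import json
import urllib.request

POLL_PATHS = [
    "/api/v1/runtime",
    "/api/v1/runtime/status",
    "/api/v1/metrics",
    "/api/v1/events/summary",
    "/api/v1/events/recent",
    "/api/v1/requests",
]


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def poll_once(base_url, fetch=fetch_json):
    """Fetch every polling target once; failures are recorded per path, not raised."""
    results = {}
    for path in POLL_PATHS:
        try:
            results[path] = fetch(base_url.rstrip("/") + path)
        except OSError as exc:  # urllib's URLError and HTTPError subclass OSError
            results[path] = {"error": str(exc)}
    return results
```

Calling `poll_once("http://127.0.0.1:8085")` from a scheduler or a sleep loop keeps one unavailable endpoint from hiding the others; the `fetch` parameter exists so the loop can be exercised without a running DBLog process.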
Common HTTP error classes:
- `400 bad_request` for malformed JSON or invalid query values,
- `404 not_found` for unknown request ids or unmatched routes,
- `413 request_too_large` when the request body exceeds the configured `dblog.control-plane.max-request-body-bytes` limit,
- `405 method_not_allowed` for wrong HTTP methods,
- `503 service_unavailable` for missing state-store/configuration requirements or unavailable request-submission infrastructure,
- `500 internal_error` for unexpected internal failures.
Use this document and the current server/test code for exact endpoint paths, query parameters, request bodies, and response envelopes.
DBLog does not expose operator pause/resume/cancel endpoints. The equivalent operation is a process restart:
- to pause ingest, stop the DBLog process (SIGTERM or SIGKILL — both are safe),
- to resume, start the DBLog process again against the same state path.
The embedded state store persists enough progress at chunk boundaries for a fresh process to resume where the previous one stopped:
- stream checkpoints are flushed at batch boundaries, so streaming resumes from the last acknowledged source position rather than from the start,
- in-flight dumps resume at the last completed chunk; rows already emitted are not re-emitted from the source, and the reconciliation algorithm still holds because low/high watermark rows sit in the ordinary source log,
- requests that were `ACTIVE` at the moment of the kill remain `ACTIVE` in the durable state store and the restarted process picks them up on its next coordinator poll,
- `COMPLETED` and `FAILED` requests are terminal and unaffected by restart.
A restart is not a hard preemption of the in-flight batch that was executing
when the process died: any events that had already been written to the sink
and checkpointed remain durable, and the next process re-runs the watermark
window for the chunk that was in flight. Sinks must therefore be idempotent at
the (tableDisplayName, primaryKey, sourcePosition) grain, which is the same
contract already assumed by the at-least-once delivery model.
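
The idempotency grain can be illustrated with a toy in-memory sink that ignores repeats of the same `(tableDisplayName, primaryKey, sourcePosition)` triple. `IdempotentSink` is a hypothetical class, not one of DBLog's sink implementations, and a real sink would keep the dedupe key in durable storage and not in a Python set.

```python
class IdempotentSink:
    """Applies each (table, primary key, source position) triple at most once."""

    def __init__(self):
        self._seen = set()
        self.applied = []

    def append(self, table_display_name, primary_key, source_position, payload):
        """Apply the event unless the same triple was already applied; return True if applied."""
        key = (table_display_name, primary_key, source_position)
        if key in self._seen:
            return False  # replay after a restart: safe to ignore
        self._seen.add(key)
        self.applied.append(payload)
        return True
```

When a restarted process re-runs the watermark window for the in-flight chunk, events replayed with an unchanged triple are ignored, while a later change to the same row arrives with a new source position and is applied.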
To cancel a request, either stop DBLog and prune the row from the durable state store directly, or accept that the request will complete on its own (dumps are bounded by the source's current row count).