Skip to content

Latest commit

 

History

History
299 lines (250 loc) · 17.6 KB

File metadata and controls

299 lines (250 loc) · 17.6 KB

Perf Debugging Reference

This is the operator reference for load-test debugging. Use it when a run has unexpected throughput, latency, error, routing, or compaction behavior.

First Files To Open

Every orchestrated run writes artifacts under:

loadtest/results/<utc-timestamp>-<scenario-name>/

Start with these files:

File Use
summary.json Final collector rollup, artifact presence, reconciliation warnings, latest broker/process/host totals.
client-summary.json Hammer totals, latency percentiles, per-broker routing counts, and client trace stage summaries.
client-30s.jsonl Interval hammer counts, rates, latency summaries, and client trace stage summaries.
metrics-30s.jsonl Interval broker JSON metrics, process samples, host samples, aggregate totals, and interval deltas.
logs/http-hammer.log Hammer stdout/stderr and fatal client-side failures.
logs/<broker-id>.log Broker stdout/stderr and request trace write failures.
logs/http-hammer-request-trace.jsonl Per-request client trace records keyed by request_id.
logs/<broker-id>-request-trace.jsonl Per-request broker trace records keyed by request_id.
logs/<broker-id>-batch-flush-trace.jsonl Per-flush broker batch trace records with queue-wait summaries and nested writer stage totals.

When a latency gap appears, join the two request trace files by request_id. Client trace stages are in milliseconds. Broker trace stages and Prometheus histograms are in seconds.

Run Artifact Inventory

Artifact Producer Notes
resolved-config.json loadtest.orchestrate Effective scenario config for the run.
stack.json loadtest.orchestrate Generated stack resources and paths.
processes.json loadtest.orchestrate Process metadata, PIDs, running status, and log paths.
brokers.json loadtest.orchestrate Broker IDs and HTTP addresses.
ports.json loadtest.orchestrate Reserved local TCP ports.
environment.json loadtest.orchestrate Captured environment subset used by the run.
git.json loadtest.orchestrate Repo revision metadata.
routing-manifest.json loadtest.routing Deterministic topic-partition owner sets.
partition-manifest.json loadtest.precreate Partition pre-create attempts and per-entry status.
client-30s.jsonl loadtest/http_hammer.go One JSON object per hammer interval.
client-summary.json loadtest/http_hammer.go Final hammer summary.
remap-events.jsonl loadtest/http_hammer.go Routing remaps emitted by owner-set routing.
metrics-30s.jsonl loadtest.metrics_collector One JSON object per collector interval.
summary.json loadtest.metrics_collector Final collector summary and reconciliation view.
logs/http-hammer-request-trace.jsonl loadtest/http_hammer.go Per-request client spans and errors.
logs/<broker-id>-request-trace.jsonl server.leaderless_log_broker Per-request broker spans and response summaries.
logs/<broker-id>-batch-flush-trace.jsonl server.leaderless_log_writer Per-flush batch trace with waiter queue-wait stats, group shape, and nested writer stage totals.
logs/http-hammer.log loadtest.orchestrate Hammer process log.
logs/metrics-collector.log loadtest.orchestrate Collector process log.
logs/<broker-id>.log loadtest.orchestrate Broker process log.
logs/minio.log loadtest.orchestrate MinIO process log.
logs/oxia-standalone.log loadtest.orchestrate Standalone Oxia process log.
logs/oxia-coordinator.log loadtest.orchestrate Clustered Oxia coordinator log.
logs/oxia-dataserver-<n>.log loadtest.orchestrate Clustered Oxia data server logs.
logs/oxia-settle.log loadtest.orchestrate Oxia assignment settle progress.
logs/precreate.log loadtest.orchestrate Partition pre-create subprocess output.
logs/prometheus-target-sync-<reason>.log loadtest.orchestrate Host-loopback Prometheus target sync output, when available.

Request-Path Tracing

Hammer sends X-Llog-Request-Id on every produce request. The same ID appears in both request trace files.

Client request trace fields:

Field Meaning
request_id Correlation key shared with the broker trace.
broker_id Target broker selected by routing.
started_at_ms Client-side request start time.
status_code HTTP status, or 0 when the request failed before a response.
latency_ms Total client-observed request latency.
topic_partition_count Number of topic-partitions in the request group.
record_count Records attempted by this request.
payload_bytes Intended payload bytes.
stages Client-side span timings in milliseconds.
error_type, error Present when the request failed.

Client trace stages:

Stage Meaning
get_conn Time from httptrace.GetConn to GotConn.
tcp_connect TCP connect time for new connections. Absent for reused connections.
got_conn_to_wrote_headers Delay after connection acquisition before headers are written.
request_write Time from headers written to full request written.
request_write_to_first_byte Time from request write completion to first response byte.
response_body_read Time to read the response body after headers arrived.
round_trip Time inside the Go HTTP request.
client_total Total measured by the hammer worker around the request call.

Broker request trace fields:

Field Meaning
request_id Correlation key shared with the client trace.
broker_id Broker process that handled the request.
method, path, status, result HTTP summary.
started_at_ms Broker handler start time.
duration_seconds Broker-observed request duration.
stages Broker-side span timings in seconds.
request Content length, topic-partition count, record count, and payload bytes.
response Success and error counts when a produce response was built.
error_type, error Present for validation or handler errors.

Broker request stages:

Stage Meaning
read_body Read the HTTP request body.
json_decode Decode the JSON request body.
parse_batches Validate and convert JSON into produce batches.
initialize_partitions Run partition initialization for all batches.
append_shared Enqueue and wait for the shared WAL append path.
produce_total Total produce service time; this includes initialization and append time.
response_encode Encode the JSON response.
response_write Write HTTP status, headers, and response bytes.
request_total Full broker handler duration.

Batch append stages:

Stage Meaning
enqueue_lock_wait Time waiting to enter the batcher condition lock.
enqueue_lock_hold Time spent inside the batcher condition lock while enqueueing.
future_wait Time waiting for enqueued requests to complete after leaving the lock.

Partition initialization results:

Result Meaning
cached The topic-partition was already initialized in this broker process.
success The writer initialized the topic-partition during this request.
error Initialization raised an exception.

Metrics Endpoints

Broker and compactor services expose:

GET /metrics
GET /metrics/prometheus

/metrics is the full JSON process-local snapshot. It contains these main sections:

Section Contents
broker Broker identity, host, port, start time, and batch settings.
http Produce request counts, status counts, in-flight requests, accepted records/bytes, request duration totals, and request-stage histograms.
batching Batch enqueue counts, buffer depth, flush counts/durations, partition initialization counts/durations, queue wait, append-stage durations, backpressure, and shared WAL blob totals.
oxia Offset assignment, index writes, control key operations, recoveries, client errors, operation counts, and operation duration totals.
s3 Put counts, uploaded bytes, failures, operation counts, and operation duration totals.
compactor Service identity, discovery state, worker counts, scheduler/claim state, claim counters, and discovery/attempt durations.
compaction Compaction run counts, durations, lag, pending state, recovery counts, bytes, records, stage timings, candidate shape, and index key deletion counts.

/metrics/prometheus is the scrape surface for Prometheus and Grafana Explore.

Prometheus Metrics

Gauges:

Metric Labels Use
leaderless_log_inflight_requests none Current in-flight broker produce requests.
leaderless_log_batch_waiters_current none Current waiters buffered in the batcher.
leaderless_log_batch_buffer_payload_bytes none Current uncommitted batch payload bytes, including an in-flight sealed flush plus queued future batches.
leaderless_log_batch_queued_payload_bytes_current none Current queued payload bytes still waiting for a flush to start.
leaderless_log_batch_inflight_flush_bytes_current none Current payload bytes in the batch actively being flushed.
leaderless_log_batch_pending_batches_current none Current count of pending batches not yet handed to the underlying writer.
leaderless_log_batch_sealed_batches_current none Current count of sealed batches waiting for the flusher.
leaderless_log_batch_oldest_pending_batch_age_seconds none Age of the oldest batch still pending in the batcher.
leaderless_log_compactor_known_targets none Known compactor target count.
leaderless_log_compactor_discovered_targets none Discovered target count beyond static config.
leaderless_log_compactor_inflight_partitions none Topic-partitions currently held by compactor workers.
leaderless_log_compactor_workers none Configured compactor worker count.
leaderless_log_compactor_consecutive_errors none Current consecutive compactor attempt errors.
leaderless_log_compactions_inflight none Current compact_once runs.
leaderless_log_compaction_lag_offsets topic, partition Per-partition lag, only when partition metrics are enabled.
leaderless_log_compaction_avg_lag_offsets none Average compaction lag across known partitions.
leaderless_log_compaction_max_lag_offsets none Maximum compaction lag across known partitions.
leaderless_log_compaction_last_success_timestamp_seconds none Unix timestamp of last successful or recovered compaction.
leaderless_log_compaction_pending_current state Current pending compactions by state.
leaderless_log_compaction_oldest_pending_age_seconds none Age of the oldest pending compaction.

Counters:

Metric Labels Use
leaderless_log_produce_requests_total result Accepted vs rejected produce requests.
leaderless_log_batch_backpressure_rejections_total none Append attempts rejected because total uncommitted payload bytes would exceed the batch buffer cap.
leaderless_log_batch_flushes_total trigger, result Batch flush outcomes by trigger.
leaderless_log_oxia_operations_total operation, result Oxia operation outcomes.
leaderless_log_s3_operations_total operation, result S3 operation outcomes.
leaderless_log_compactor_attempts_total result Compactor attempt outcomes.
leaderless_log_compactor_discovery_scans_total none Compactor discovery scans.
leaderless_log_compactor_discovery_additions_total none Targets added by discovery.
leaderless_log_compactor_claim_attempts_total result Distributed claim outcomes.
leaderless_log_compaction_runs_total result Compaction run outcomes.
leaderless_log_compaction_recoveries_total kind Compaction recovery events.
leaderless_log_compacted_records_total none Records included in compacted blobs.
leaderless_log_compaction_input_bytes_total none Bytes read by compaction.
leaderless_log_compaction_output_bytes_total none Bytes written by compaction.
leaderless_log_compaction_index_keys_deleted_total none Index keys deleted by compaction cleanup.

Histograms and summaries:

Metric Labels Use
leaderless_log_request_stage_duration_seconds stage, result Broker-side produce request stage timing.
leaderless_log_batch_queue_wait_seconds none Time waiting in the batcher before flush starts.
leaderless_log_batch_seal_to_flush_start_seconds none Time between sealing a batch and starting its flush.
leaderless_log_batch_append_stage_duration_seconds stage Request-side batch append timing.
leaderless_log_partition_initialize_duration_seconds result Partition initialization timing by result.
leaderless_log_partition_initialize_lock_wait_seconds result Time waiting for the initialization underlying lock.
leaderless_log_batch_flush_duration_seconds trigger Time inside one batch flush.
leaderless_log_oxia_operation_duration_seconds operation, result Oxia operation latency.
leaderless_log_s3_operation_duration_seconds operation, result S3 operation latency.
leaderless_log_compactor_scheduler_pass_duration_seconds none Compactor scheduler pass duration.
leaderless_log_compactor_discovery_duration_seconds none Compactor discovery scan duration.
leaderless_log_compactor_attempt_duration_seconds result Compactor attempt duration.
leaderless_log_compactor_claim_hold_duration_seconds none Time a distributed compactor claim was held.
leaderless_log_compaction_duration_seconds result End-to-end compaction duration.
leaderless_log_compaction_stage_duration_seconds stage, result Per-stage compaction duration.
leaderless_log_compaction_candidate_entries none Index entries selected for a compaction candidate.
leaderless_log_compaction_candidate_records none Records selected for a compaction candidate.
leaderless_log_compaction_candidate_span_offsets none Offset span selected for a compaction candidate.
leaderless_log_shared_wal_blob_bytes none Summary of encoded shared WAL blob size.

Grafana

The source dashboard is:

grafana/leaderless-log-observability.json

On the host monitoring stack, the provisioned dashboard copy is:

/home/ops/services/monitoring/grafana/provisioning/dashboards/leaderless-log-observability.json

The dashboard includes a REQUEST PATH row with avg and p99 panels for the request-stage, batch append-stage, partition initialization, and partition initialization lock-wait histograms. Use Grafana Explore or Prometheus directly for ad hoc label filtering beyond those panels.

Useful PromQL while investigating the request-path gap:

histogram_quantile(
  0.99,
  sum by (instance, le, stage, result) (
    rate(leaderless_log_request_stage_duration_seconds_bucket{job=~"$job",instance=~"$instance"}[$rate_interval])
  )
)
sum by (instance, stage) (
  rate(leaderless_log_batch_append_stage_duration_seconds_sum{job=~"$job",instance=~"$instance"}[$rate_interval])
)
/
sum by (instance, stage) (
  rate(leaderless_log_batch_append_stage_duration_seconds_count{job=~"$job",instance=~"$instance"}[$rate_interval])
)
sum by (instance, result) (
  rate(leaderless_log_partition_initialize_lock_wait_seconds_sum{job=~"$job",instance=~"$instance"}[$rate_interval])
)
/
sum by (instance, result) (
  rate(leaderless_log_partition_initialize_lock_wait_seconds_count{job=~"$job",instance=~"$instance"}[$rate_interval])
)

The request trace JSONL files are not in Grafana today unless a separate log shipping path is added.

Common Debug Flow

  1. Open summary.json and check warnings, artifact_presence, reconciliation, latest_totals, and cumulative_totals.
  2. Open client-summary.json and compare throughput, latency, connect_error_count, timeout_count, remaps, unroutable partitions, and client_trace_stage_ms.
  3. Open metrics-30s.jsonl and compare broker totals and interval deltas against the client intervals.
  4. If broker HTTP latency is high, inspect leaderless_log_request_stage_duration_seconds by stage and result.
  5. If append_shared dominates, inspect leaderless_log_batch_append_stage_duration_seconds, leaderless_log_batch_queue_wait_seconds, leaderless_log_batch_seal_to_flush_start_seconds, leaderless_log_batch_flush_duration_seconds, leaderless_log_batch_pending_batches_current, leaderless_log_batch_sealed_batches_current, leaderless_log_batch_queued_payload_bytes_current, leaderless_log_batch_inflight_flush_bytes_current, leaderless_log_oxia_operation_duration_seconds, and leaderless_log_s3_operation_duration_seconds. Then open logs/<broker-id>-batch-flush-trace.jsonl to correlate slow flushes with waiter queue-wait, group count, payload bytes, and nested writer stage totals.
  6. If initialize_partitions dominates, inspect leaderless_log_partition_initialize_duration_seconds and leaderless_log_partition_initialize_lock_wait_seconds.
  7. If client latency is higher than broker request_total, join logs/http-hammer-request-trace.jsonl and logs/<broker-id>-request-trace.jsonl by request_id and compare client request_write_to_first_byte against broker request_total.
  8. If errors or remaps spike, inspect remap-events.jsonl, logs/http-hammer.log, logs/<broker-id>.log, and Oxia logs.