Loadtest

This directory is reserved for the local-process load-test harness described in ../plans/in-progress/plan.md.

Expected layout:

HOWRUN.md: loadtest-only operator commands
quick_observability.py: one-command local Prometheus and Grafana wrapper for quick broker reviews
orchestrate.py: local stack entrypoint for up, status, and down
ports.py: deterministic local TCP port reservation helpers
processes.py: reusable subprocess lifecycle helpers with per-process logs
routing.py: deterministic owner-set routing and routing-manifest.json generation
http_hammer.go: stdlib-only HTTP producer load generator
run-configs/: committed scenario configs
RUN_NOTES.md: human summaries of important runs, hypotheses, and next metrics
results/: generated per-run artifacts
precreate.py: idempotent topic-partition pre-create tool
metrics_collector.py: broker/process/host sampling into metrics-30s.jsonl
Python orchestration and metrics code
Go HTTP load generator code

Run artifact convention:

each run should write to loadtest/results/<utc-timestamp>-<scenario-name>/

Current deterministic topic layout rules:

topic names are derived from the scenario name as <scenario_name>-topic-<zero-padded-topic-index>
if partitions_per_topic is set, every topic gets that many partitions
if only total_partition_count is set, partitions are spread evenly and the remainder is assigned to the lowest topic indexes
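
The layout rules above can be sketched in a few lines. This is an illustrative sketch, not the code in precreate.py: the function name, the topic_count parameter, the zero-based indexes, and the three-digit padding width are assumptions, since the rules above do not state them.

```python
def topic_layout(scenario_name, topic_count, partitions_per_topic=None,
                 total_partition_count=None, index_width=3):
    """Return {topic_name: partition_count} following the layout rules.

    index_width (zero-padding width) and zero-based topic indexes are
    assumptions made for this sketch.
    """
    if topic_count <= 0:
        raise ValueError("topic_count must be positive")
    if partitions_per_topic is not None:
        # Every topic gets the same partition count.
        counts = [partitions_per_topic] * topic_count
    elif total_partition_count is not None:
        base, remainder = divmod(total_partition_count, topic_count)
        # The lowest `remainder` topic indexes get one extra partition each.
        counts = [base + (1 if i < remainder else 0) for i in range(topic_count)]
    else:
        raise ValueError("set partitions_per_topic or total_partition_count")
    return {
        f"{scenario_name}-topic-{i:0{index_width}d}": count
        for i, count in enumerate(counts)
    }
```

For example, spreading 8 partitions over 3 topics gives the two lowest-indexed topics 3 partitions each and the last topic 2.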

The pre-create step writes partition-manifest.json into the run artifact directory with:

attempted, created, and already-existing counts
per-topic partition counts
the full flat topic-partition list and per-entry status

Current storage guidance:

harness-managed MinIO is now intended for smoke tests and low-scale local runs, not for authoritative storage performance measurements
when LLOG_LOADTEST_MINIO_DATA_DIR_ROOT is unset, orchestrate.py will prefer /dev/shm/llog-tests for small scenarios whose expected produced bytes fit in current tmpfs free space with 1.5x headroom; otherwise it falls back to the run directory on disk
set LLOG_LOADTEST_MINIO_DATA_DIR_ROOT=/some/path to pin a specific local MinIO data root and bypass that auto-selection
use a real external S3/object-store target for serious performance runs; on this host, local MinIO backed by / can hit filesystem or writeback stalls that are not a reliable proxy for object-store behavior
set LLOG_LOADTEST_EXTERNAL_S3=1 plus LLOG_S3_BUCKET to make the harness skip managed MinIO and point brokers/compactors at the external object store; LLOG_S3_ENDPOINT_URL is optional for non-AWS S3-compatible services
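
The MinIO data-root selection described above could look roughly like the following. This is a sketch under stated assumptions: the function name and parameters are hypothetical, the free-space check uses shutil.disk_usage on /dev/shm, and any additional "small scenario" threshold that orchestrate.py may apply is not modeled.

```python
import os
import shutil
from pathlib import Path

TMPFS_ROOT = Path("/dev/shm/llog-tests")
HEADROOM = 1.5  # expected bytes must fit in tmpfs free space with 1.5x headroom


def choose_minio_data_root(run_dir, expected_bytes, env=None, tmpfs_free_bytes=None):
    """Pick the MinIO data root: explicit pin, then tmpfs, then the run directory."""
    env = os.environ if env is None else env
    pinned = env.get("LLOG_LOADTEST_MINIO_DATA_DIR_ROOT")
    if pinned:
        # An explicit root bypasses the auto-selection entirely.
        return Path(pinned)
    if tmpfs_free_bytes is None:
        try:
            tmpfs_free_bytes = shutil.disk_usage(TMPFS_ROOT.parent).free
        except OSError:
            # No /dev/shm on this host (for example macOS): treat as no tmpfs space.
            tmpfs_free_bytes = 0
    if expected_bytes * HEADROOM <= tmpfs_free_bytes:
        return TMPFS_ROOT
    # Fall back to the run directory on disk.
    return Path(run_dir)
```

Passing tmpfs_free_bytes explicitly keeps the decision testable without touching the real filesystem.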

The routing manifest writes routing-manifest.json into the run artifact directory with:

the scenario name and routing seed
the canonical broker id list for the run
the effective owner-set size
one deterministic owner set per topic-partition

Current orchestrator state:

orchestrate.py up can now:
- create the run directory
- generate broker ids and full stack port assignments
- write Oxia cluster/config files under the run artifact directory
- write routing-manifest.json
- start MinIO, clustered or standalone Oxia, an optional compactor service, broker processes, and optional read-only consumer processes
- wait for MinIO health, create the bucket, wait for Oxia assignments, and wait for broker and consumer /health
- run partition pre-create automatically when precreate_partitions is enabled
orchestrate.py run now:
- starts the full stack
- starts metrics_collector.py
- runs http_hammer.go
- writes a final collector snapshot and summary
- runs the generic host-loopback Prometheus sync immediately after broker startup when the sync script is present
- tears the stack down unless keep_stack_running is enabled
orchestrate.py precreate re-runs the idempotent partition bootstrap against an existing run directory
orchestrate.py status reads stack.json and processes.json and reports live process status
orchestrate.py down stops tracked process groups from processes.json
orchestrate.py run and orchestrate.py down re-run the same Prometheus sync after teardown so short-lived runs do not leave stale file_sd targets or bridge proxies behind
ports.py reserves local TCP ports so future up/run commands do not guess
processes.py provides the process-group-aware lifecycle primitives the real stack runner will use
the orchestrator writes stack.json, processes.json, brokers.json, ports.json, environment.json, git.json, and routing-manifest.json
brokers started by orchestrate.py also export static loadtest scenario metrics so Grafana can show the active run shape live
set compactor_enabled=true plus compactor_count / compactor_workers in a scenario to have orchestrate.py start one or more managed server.leaderless_log_compactor_server processes and record them in stack.json, ports.json, and processes.json
set consumer_enabled=true or consumer_count>0 in a scenario to have orchestrate.py start one or more managed server.leaderless_log_broker --role read processes and record them in stack.json, ports.json, and processes.json

Current quick observability state:

quick_run.sh now delegates to quick_observability.py
the quick flow starts Prometheus and Grafana from loadtest/observability/docker-compose.yml
Grafana is provisioned with the repo dashboard and a Prometheus datasource with uid prometheus
the quick wrapper writes the active broker ports into loadtest/observability/runtime/prometheus-targets.json so Prometheus scrapes the host brokers through host.docker.internal
on Linux hosts where the brokers bind loopback-only, the quick wrapper also starts temporary host-network socat bridge containers so Prometheus can reach those broker ports through the Docker host gateway
the quick flow leaves the broker stack and the observability containers running after the hammer exits so the dashboard can be inspected immediately
./quick_run.sh down stops the tracked run and tears the observability containers down

Current collector state:

metrics_collector.py can scrape every broker GET /metrics
it samples tracked local processes with ps
it writes metrics-30s.jsonl and summary.json
it tolerates missing broker/process/host samples by recording warnings instead of aborting the run
summary.json now includes artifact presence, request-path trace presence, batch-flush trace presence, request-path metric rollups, and client-vs-broker reconciliation warnings
broker S3 billing estimates are rolled into the sampled totals, including summed request-cost estimates plus max bucket-footprint and monthly-storage-cost gauges
Linux host CPU, memory, disk-write, and network totals are sampled directly from /proc
Darwin host CPU, memory, and network totals are sampled, but disk-write bytes/sec is still unavailable without extra tooling
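
The warn-instead-of-abort behavior described above can be sketched as follows. The record shape (ts, samples, warnings) and the helper names are assumptions for illustration and are not taken from metrics_collector.py.

```python
import json
import time


def collect_sample(samplers, now=None):
    """Run each sampler; a failing sampler yields None plus a warning, never an abort."""
    sample = {"ts": time.time() if now is None else now, "samples": {}, "warnings": []}
    for name, sampler in samplers.items():
        try:
            sample["samples"][name] = sampler()
        except Exception as exc:  # broad on purpose: any failure becomes a warning
            sample["samples"][name] = None
            sample["warnings"].append(f"{name}: {type(exc).__name__}: {exc}")
    return sample


def to_jsonl_line(sample):
    """Serialize one sample as a single JSON line, as a .jsonl file expects."""
    return json.dumps(sample, sort_keys=True) + "\n"
```

A sampler for an unavailable source, such as Darwin disk-write bytes/sec, would then show up as a null value with a warning rather than failing the run.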

Current S3 summary fields include:

broker_s3_estimated_request_cost_usd
bucket_stored_bytes
bucket_stored_gb
bucket_object_count
estimated_monthly_storage_cost_usd

Aggregation rule:

sum request-cost totals across brokers
take max for shared bucket-footprint and storage-cost gauges
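
A minimal sketch of the aggregation rule, using the summary field names listed above. Treating bucket_stored_bytes, bucket_stored_gb, and bucket_object_count as the "bucket-footprint" gauges is an assumption, as are the function name and the per-broker input shape.

```python
# Per-broker counters: each broker reports its own requests, so sum them.
SUM_FIELDS = ("broker_s3_estimated_request_cost_usd",)

# Shared gauges: every broker observes the same bucket, so take the max
# instead of multiplying the footprint by the broker count.
MAX_FIELDS = (
    "bucket_stored_bytes",
    "bucket_stored_gb",
    "bucket_object_count",
    "estimated_monthly_storage_cost_usd",
)


def aggregate_s3(per_broker):
    """Combine {broker_id: {field: value}} into one summary dict."""
    summary = {}
    for field, combine in [(f, sum) for f in SUM_FIELDS] + [(f, max) for f in MAX_FIELDS]:
        values = [m[field] for m in per_broker.values() if m.get(field) is not None]
        if values:  # omit fields that no broker reported
            summary[field] = combine(values)
    return summary
```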

Current hammer state:

http_hammer.go reads resolved-config.json, brokers.json, and routing-manifest.json from one run directory
it uses persistent HTTP connections through the Go stdlib client
it emits deterministic multi-topic-partition POST /produce traffic
it supports single_hot_partition, uniform_spread, and zipf_skew
it supports round_robin, least_inflight, and explicit unrestricted_fanout
it now paces logical request dispatch from target_mib_per_sec_per_broker instead of tying pacing to synchronous worker loops
http_clients_per_broker now acts as the per-broker in-flight request cap for the hammer
it writes client-30s.jsonl, client-summary.json, remap-events.jsonl, and logs/http-hammer-request-trace.jsonl
the client artifacts now expose configured request bytes, pacing lag, in-flight request counts, request-path trace stages, and achieved-vs-target throughput per broker
client-summary.json now reports the configured steady-state window by default and keeps a full_window section with the warmup-inclusive traffic totals for raw debugging
broker-side load-test runs now also emit logs/<broker-id>-batch-flush-trace.jsonl with one record per batch flush, including trigger, waiter queue-wait summary, group counts, payload bytes, and nested writer stage totals
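
To make the three traffic shapes named above concrete, here is a sketch of per-partition selection weights. The hammer itself is written in Go and its exact distributions are not documented here; placing the hot partition at index 0 and using a Zipf exponent of 1.0 are assumptions.

```python
def partition_weights(mode, partition_count, zipf_exponent=1.0):
    """Return selection probabilities, one per partition index, summing to 1."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    if mode == "single_hot_partition":
        # All traffic goes to one partition (index 0 in this sketch).
        return [1.0] + [0.0] * (partition_count - 1)
    if mode == "uniform_spread":
        return [1.0 / partition_count] * partition_count
    if mode == "zipf_skew":
        # Weight of rank r is proportional to 1 / r**s, then normalized.
        raw = [1.0 / (rank ** zipf_exponent) for rank in range(1, partition_count + 1)]
        total = sum(raw)
        return [w / total for w in raw]
    raise ValueError(f"unknown mode: {mode}")
```

These weights could be fed to random.choices with a seeded random.Random instance to keep the generated traffic deterministic across runs.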

Current consume hammer state:

consume_hammer/main.go reads resolved-config.json, stack.json, and routing-manifest.json from one run directory
when stack.json lists dedicated consumers, the consume hammer targets those read processes; otherwise it falls back to broker /consume endpoints
consume scenarios share LLOG_TAIL_CACHE_DIR=<run-dir>/tail-cache across writer brokers and read-only consumers so hot-tail reads can stay on the shared file-backed cache before falling back to Oxia/S3
scenarios with broker_locality_mode=paired_broker instead use per-broker cache groups under <run-dir>/tail-cache/<broker-id>; consumer-N is paired with broker-N, the producer hammer samples partitions from that broker's local owner-set pool, and the consume hammer reads from the same paired pool to mimic node-local cache locality
before timed traffic starts, the consume hammer bootstraps each active partition to the current tail and then keeps the next fetch offset only in local process memory
the consume hammer allows only one in-flight request per topic-partition, so a partition cannot be read concurrently from overlapping local offsets and measured consume throughput cannot outrun current production just by duplicating reads
consume-client-summary.json now also reports the steady-state window by default and preserves a full_window section for warmup-inclusive debugging
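
The one-in-flight-request-per-partition rule and the in-memory next fetch offset can be sketched as a small guard object. The consume hammer is a Go program, so this Python class, its name, and its methods are purely illustrative.

```python
import threading


class PartitionCursors:
    """Track the next fetch offset per partition and allow one in-flight read each."""

    def __init__(self, tail_offsets):
        self._lock = threading.Lock()
        # Offsets come from the bootstrap-to-tail step and live only in memory.
        self._next_offset = dict(tail_offsets)
        self._inflight = set()

    def try_acquire(self, topic_partition):
        """Return the offset to fetch, or None if a read is already in flight."""
        with self._lock:
            if topic_partition in self._inflight:
                return None
            self._inflight.add(topic_partition)
            return self._next_offset[topic_partition]

    def release(self, topic_partition, next_offset=None):
        """Finish a read; advance the offset only if the fetch returned data."""
        with self._lock:
            self._inflight.discard(topic_partition)
            if next_offset is not None:
                self._next_offset[topic_partition] = next_offset
```

Because a partition cannot be acquired twice, two workers can never fetch overlapping offsets, which is what keeps measured consume throughput from exceeding production through duplicate reads.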

The complete load-test metrics, logs, traces, and debugging flow are documented in ../PERF_DEBUGGING.md.

Concurrency sizing rule of thumb:

first compute request payload bytes as record_size_bytes * topic_partitions_per_request * records_per_topic_partition_per_request
then compute target requests/sec per broker as (target_mib_per_sec_per_broker * 1048576) / request_payload_bytes
then compute the approximate in-flight requests needed per broker as target_requests_per_sec_per_broker * p50_latency_seconds
if client-summary.json or client-30s.jsonl shows inflight_requests.per_broker.<broker>.peak pinned near http_clients_per_broker while pacing_lag_ms keeps growing, the hammer is still concurrency-capped and http_clients_per_broker is too low for that scenario
if achieved throughput is still low after the in-flight cap is comfortably above that estimate, the next bottleneck is more likely in the broker, Oxia, MinIO, routing churn, or another downstream path
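
The first three steps of the rule of thumb are plain arithmetic and can be sketched as a helper. The function name is hypothetical; the parameter names follow the scenario fields used above, and the estimate applies Little's law (in-flight = arrival rate * latency).

```python
MIB = 1048576


def size_inflight(record_size_bytes, topic_partitions_per_request,
                  records_per_topic_partition_per_request,
                  target_mib_per_sec_per_broker, latency_seconds):
    """Return (request_payload_bytes, requests_per_sec, inflight_estimate)."""
    payload = (record_size_bytes
               * topic_partitions_per_request
               * records_per_topic_partition_per_request)
    requests_per_sec = target_mib_per_sec_per_broker * MIB / payload
    # Little's law: requests in flight = request rate * time each one takes.
    inflight = requests_per_sec * latency_seconds
    return payload, requests_per_sec, inflight


payload, rps, inflight = size_inflight(1024, 4, 32, 32, 0.590)
print(payload, rps, round(inflight))
```

The unrounded estimate is returned on purpose; pick http_clients_per_broker comfortably above it rather than equal to it.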

Example for single-broker-32mib:

producer request payload = 1024 * 4 * 32 = 131072 bytes = 128 KiB
target producer rate = (32 MiB/s * 1048576) / 131072 = 256 requests/sec
at about 590 ms p50 latency, the hammer needs roughly 256 * 0.590 = 151 in-flight requests; sizing against p95 latency instead pushes the requirement above 200
the committed scenario uses 512 producer clients to leave headroom above that estimate
the consume hammer requests up to 1 MiB per selected partition; at the tail of a 32-partition, 32 MiB/s run, each partition fills at about 1 MiB/s, so the committed scenario uses consume_max_wait_ms=1000 to avoid forcing partial tail reads
consume scenarios also provision the file-backed tail cache for about 64 seconds of aggregate producer throughput so the separate read processes can stay on the hot tail without repeated Oxia/S3 fallback; override with LLOG_TAIL_CACHE_MAX_BYTES when testing on a disk-constrained host
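
The consume-side numbers in this example follow from the same kind of arithmetic. The sketch below assumes the tail cache is sized as exactly 64 seconds of aggregate producer throughput, which the text above only states approximately; the function name and parameters are hypothetical.

```python
MIB = 1048576


def consume_sizing(aggregate_mib_per_sec, partition_count,
                   fetch_max_bytes=MIB, cache_seconds=64):
    """Return (ms for one partition to fill one fetch, tail-cache bytes)."""
    per_partition_bytes_per_sec = aggregate_mib_per_sec * MIB / partition_count
    # Time for a single partition to accumulate one full fetch at the tail.
    fill_ms = fetch_max_bytes / per_partition_bytes_per_sec * 1000
    # Cache sized to hold cache_seconds of aggregate producer output.
    cache_bytes = int(cache_seconds * aggregate_mib_per_sec * MIB)
    return fill_ms, cache_bytes


fill_ms, cache_bytes = consume_sizing(32, 32)
print(fill_ms, cache_bytes // MIB)
```

For the 32-partition, 32 MiB/s run this gives a 1000 ms fill time, matching consume_max_wait_ms=1000, and a cache of 2048 MiB.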

Committed run configs:

committed scenarios now keep the approximate total topic-partition count at or below 32 * broker_count
clustered-2b-2c-10m.json
sanity-clustered.json
standalone-control.json
single-broker-32mib.json
public-s3-demo-5m.json
public-s3-demo-5m-scaleout.json
scaleout-3b-3c-3comp-100mib-locality.json
scaleout-4-brokers-128mib.json
density-16-brokers-512mib.json
fanout-negative-control.json

Current runtime assumptions:

Oxia startup needs either a cached binary at .cache/loadtest/bin/oxia, an oxia binary on PATH, LLOG_LOADTEST_OXIA_BIN=/abs/path/to/oxia, or a sibling oxia-server/ checkout plus go
MinIO startup needs either minio on PATH, docker on PATH, or LLOG_LOADTEST_MINIO_BIN=/abs/path/to/minio
low-scale local runs will default MinIO data into /dev/shm/llog-tests when the estimated payload footprint fits; larger runs stay on disk unless LLOG_LOADTEST_MINIO_DATA_DIR_ROOT explicitly overrides the location
broker and bucket setup use LLOG_LOADTEST_PYTHON when set and otherwise prefer .venv-loadtest/bin/python before falling back to the current interpreter
use oxia>=0.2.0; the current repo baseline is oxia==0.2.1 because older 0.1.x clients lacked the session keepalive/recreation behavior the compaction path relies on
external S3 runs should set LLOG_LOADTEST_EXTERNAL_S3=1; the harness will validate LLOG_S3_BUCKET access before starting brokers and will keep cleanup scoped to LLOG_ROOT_PREFIX
process metrics collection assumes ps is available on the host
host Prometheus bridge sync is best-effort and can be overridden or disabled with LLOG_LOADTEST_PROMETHEUS_SYNC_SCRIPT
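
The Python interpreter preference order described above (environment override, then the load-test virtualenv, then the current interpreter) can be sketched like this. The function name and the repo_root parameter are hypothetical.

```python
import os
import sys
from pathlib import Path


def resolve_python(repo_root, env=None):
    """Pick the interpreter used for broker and bucket setup."""
    env = os.environ if env is None else env
    override = env.get("LLOG_LOADTEST_PYTHON")
    if override:
        # Explicit override always wins.
        return override
    venv_python = Path(repo_root) / ".venv-loadtest" / "bin" / "python"
    if venv_python.exists():
        return str(venv_python)
    # Last resort: whatever interpreter is running this code.
    return sys.executable
```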

EC2-oriented workflow:

python3 -m loadtest.ec2 bootstrap creates .venv-loadtest, installs requirements.txt, and builds cached Go binaries into .cache/loadtest/bin/ when go is available
python3 -m loadtest.ec2 preflight --scenario ... --bucket ... validates the repo Python runtime, launcher resolution, S3 bucket access, and optional observability prerequisites
python3 -m loadtest.ec2 smoke --bucket ... runs a short external-S3 smoke scenario and cleans its prefix up by default unless --keep-artifacts is set
python3 -m loadtest.ec2 run --scenario ... --bucket ... runs the harness in external-S3 mode without manual export steps when the needed values are already present in .env, the shell environment, or the CLI flags
python3 -m loadtest.ec2 report latest prints a concise JSON summary from the latest run directory
python3 -m loadtest.ec2 bundle latest creates one tar.gz artifact bundle from the run directory while skipping transient Oxia and MinIO data trees
python3 -m loadtest.ec2 cleanup latest cleans only the recorded run prefix and now relies on the current boto3 credential chain instead of stored AWS secrets from the original run

Still pending:

real end-to-end validation on a machine with go, oxia, and minio or docker
public external-S3 demo checklist and exact commands are tracked in PUBLIC_S3_DEMO.md
