This directory is reserved for the local-process load-test harness described in ../plans/in-progress/plan.md.
Expected layout:
HOWRUN.md: loadtest-only operator commandsquick_observability.py: one-command local Prometheus and Grafana wrapper for quick broker reviewsorchestrate.py: local stack entrypoint forup,status, anddownports.py: deterministic local TCP port reservation helpersprocesses.py: reusable subprocess lifecycle helpers with per-process logsrouting.py: deterministic owner-set routing androuting-manifest.jsongenerationhttp_hammer.go: stdlib-only HTTP producer load generatorrun-configs/: committed scenario configsRUN_NOTES.md: human summaries of important runs, hypotheses, and next metricsresults/: generated per-run artifactsprecreate.py: idempotent topic-partition pre-create toolmetrics_collector.py: broker/process/host sampling intometrics-30s.jsonl- Python orchestration and metrics code
- Go HTTP load generator code
Run artifact convention:
- each run should write to
loadtest/results/<utc-timestamp>-<scenario-name>/
Current deterministic topic layout rules:
- topic names are derived from the scenario name as
<scenario_name>-topic-<zero-padded-topic-index> - if
partitions_per_topicis set, every topic gets that many partitions - if only
total_partition_countis set, partitions are spread evenly and the remainder is assigned to the lowest topic indexes
The pre-create step writes partition-manifest.json into the run artifact directory with:
- attempted, created, and already-existing counts
- per-topic partition counts
- the full flat topic-partition list and per-entry status
Current storage guidance:
- harness-managed MinIO is now intended for smoke tests and low-scale local runs, not for authoritative storage performance measurements
- when
LLOG_LOADTEST_MINIO_DATA_DIR_ROOTis unset,orchestrate.pywill prefer/dev/shm/llog-testsfor small scenarios whose expected produced bytes fit in current tmpfs free space with1.5xheadroom; otherwise it falls back to the run directory on disk - set
LLOG_LOADTEST_MINIO_DATA_DIR_ROOT=/some/pathto pin a specific local MinIO data root and bypass that auto-selection - use a real external S3/object-store target for serious performance runs; on this host, local MinIO backed by
/can hit filesystem or writeback stalls that are not a reliable proxy for object-store behavior - set
LLOG_LOADTEST_EXTERNAL_S3=1plusLLOG_S3_BUCKETto make the harness skip managed MinIO and point brokers/compactors at the external object store;LLOG_S3_ENDPOINT_URLis optional for non-AWS S3-compatible services
The routing manifest writes routing-manifest.json into the run artifact directory with:
- the scenario name and routing seed
- the canonical broker id list for the run
- the effective owner-set size
- one deterministic owner set per topic-partition
Current orchestrator state:
orchestrate.py upcan now:- create the run directory
- generate broker ids and full stack port assignments
- write Oxia cluster/config files under the run artifact directory
- write
routing-manifest.json - start MinIO, clustered or standalone Oxia, an optional compactor service, broker processes, and optional read-only consumer processes
- wait for MinIO health, create the bucket, wait for Oxia assignments, and wait for broker and consumer
/health - run partition pre-create automatically when
precreate_partitionsis enabled
orchestrate.py runnow:- starts the full stack
- starts
metrics_collector.py - runs
http_hammer.go - writes a final collector snapshot and summary
- runs the generic host-loopback Prometheus sync immediately after broker startup when the sync script is present
- tears the stack down unless
keep_stack_runningis enabled
orchestrate.py precreatere-runs the idempotent partition bootstrap against an existing run directoryorchestrate.py statusreadsstack.jsonandprocesses.jsonand reports live process statusorchestrate.py downstops tracked process groups fromprocesses.jsonorchestrate.py runandorchestrate.py downre-run the same Prometheus sync after teardown so short-lived runs do not leave stale file_sd targets or bridge proxies behindports.pyreserves local TCP ports so futureup/runcommands do not guessprocesses.pyprovides the process-group-aware lifecycle primitives the real stack runner will use- the orchestrator writes
stack.json,processes.json,brokers.json,ports.json,environment.json,git.json, androuting-manifest.json - brokers started by
orchestrate.pyalso export static loadtest scenario metrics so Grafana can show the active run shape live - set
compactor_enabled=truepluscompactor_count/compactor_workersin a scenario to haveorchestrate.pystart one or more managedserver.leaderless_log_compactor_serverprocesses and record them instack.json,ports.json, andprocesses.json - set
consumer_enabled=trueorconsumer_count>0in a scenario to haveorchestrate.pystart one or more managedserver.leaderless_log_broker --role readprocesses and record them instack.json,ports.json, andprocesses.json
Current quick observability state:
quick_run.shnow delegates toquick_observability.py- the quick flow starts Prometheus and Grafana from
loadtest/observability/docker-compose.yml - Grafana is provisioned with the repo dashboard and a Prometheus datasource with uid
prometheus - the quick wrapper writes the active broker ports into
loadtest/observability/runtime/prometheus-targets.jsonso Prometheus scrapes the host brokers throughhost.docker.internal - on Linux hosts where the brokers bind loopback-only, the quick wrapper also starts temporary host-network
socatbridge containers so Prometheus can reach those broker ports through the Docker host gateway - the quick flow leaves the broker stack and the observability containers running after the hammer exits so the dashboard can be inspected immediately
./quick_run.sh downstops the tracked run and tears the observability containers down
Current collector state:
metrics_collector.pycan scrape every brokerGET /metrics- it samples tracked local processes with
ps - it writes
metrics-30s.jsonlandsummary.json - it tolerates missing broker/process/host samples by recording warnings instead of aborting the run
summary.jsonnow includes artifact presence, request-path trace presence, batch-flush trace presence, request-path metric rollups, and client-vs-broker reconciliation warnings- broker S3 billing estimates are rolled into the sampled totals, including summed request-cost estimates plus max bucket-footprint and monthly-storage-cost gauges
- Linux host CPU, memory, disk-write, and network totals are sampled directly from
/proc - Darwin host CPU, memory, and network totals are sampled, but disk-write bytes/sec is still unavailable without extra tooling
Current S3 summary fields include:
broker_s3_estimated_request_cost_usdbucket_stored_bytesbucket_stored_gbbucket_object_countestimated_monthly_storage_cost_usd
Aggregation rule:
- sum request-cost totals across brokers
- take max for shared bucket-footprint and storage-cost gauges
Current hammer state:
http_hammer.goreadsresolved-config.json,brokers.json, androuting-manifest.jsonfrom one run directory- it uses persistent HTTP connections through the Go stdlib client
- it emits deterministic multi-topic-partition
POST /producetraffic - it supports
single_hot_partition,uniform_spread, andzipf_skew - it supports
round_robin,least_inflight, and explicitunrestricted_fanout - it now paces logical request dispatch from
target_mib_per_sec_per_brokerinstead of tying pacing to synchronous worker loops http_clients_per_brokernow acts as the per-broker in-flight request cap for the hammer- it writes
client-30s.jsonl,client-summary.json,remap-events.jsonl, andlogs/http-hammer-request-trace.jsonl - the client artifacts now expose configured request bytes, pacing lag, in-flight request counts, request-path trace stages, and achieved-vs-target throughput per broker
client-summary.jsonnow reports the configured steady-state window by default and keeps afull_windowsection with the warmup-inclusive traffic totals for raw debugging- broker-side load-test runs now also emit
logs/<broker-id>-batch-flush-trace.jsonlwith one record per batch flush, including trigger, waiter queue-wait summary, group counts, payload bytes, and nested writer stage totals
Current consume hammer state:
consume_hammer/main.goreadsresolved-config.json,stack.json, androuting-manifest.jsonfrom one run directory- when
stack.jsonlists dedicated consumers, the consume hammer targets thosereadprocesses; otherwise it falls back to broker/consumeendpoints - consume scenarios share
LLOG_TAIL_CACHE_DIR=<run-dir>/tail-cacheacross writer brokers and read-only consumers so hot-tail reads can stay on the shared file-backed cache before falling back to Oxia/S3 - scenarios with
broker_locality_mode=paired_brokerinstead use per-broker cache groups under<run-dir>/tail-cache/<broker-id>;consumer-Nis paired withbroker-N, the producer hammer samples partitions from that broker's local owner-set pool, and the consume hammer reads from the same paired pool to mimic node-local cache locality - before timed traffic starts, the consume hammer bootstraps each active partition to the current tail and then keeps the next fetch offset only in local process memory
- the consume hammer allows only one in-flight request per topic-partition, so a partition cannot be read concurrently from overlapping local offsets and measured consume throughput cannot outrun current production just by duplicating reads
consume-client-summary.jsonnow also reports the steady-state window by default and preserves afull_windowsection for warmup-inclusive debugging
The complete load-test metrics, logs, traces, and debugging flow are documented
in ../PERF_DEBUGGING.md.
Concurrency sizing rule of thumb:
- first compute request payload bytes as
record_size_bytes * topic_partitions_per_request * records_per_topic_partition_per_request - then compute target requests/sec per broker as
(target_mib_per_sec_per_broker * 1048576) / request_payload_bytes - then compute the approximate in-flight requests needed per broker as
target_requests_per_sec_per_broker * p50_latency_seconds - if
client-summary.jsonorclient-30s.jsonlshowsinflight_requests.per_broker.<broker>.peakpinned nearhttp_clients_per_brokerwhilepacing_lag_mskeeps growing, the hammer is still concurrency-capped andhttp_clients_per_brokeris too low for that scenario - if achieved throughput is still low after the in-flight cap is comfortably above that estimate, the next bottleneck is more likely in the broker, Oxia, MinIO, routing churn, or another downstream path
Example for single-broker-32mib:
- producer request payload =
1024 * 4 * 32 = 131072bytes =128 KiB - target producer rate =
(32 MiB/s * 1048576) / 131072 = 256requests/sec - at about
590 msp50 latency, the hammer needs roughly256 * 0.590 = 151in-flight requests, and p95 latency needs more than200 - the committed scenario uses
512producer clients to leave headroom above that estimate - the consume hammer requests up to
1 MiBper selected partition; at the tail of a 32-partition, 32 MiB/s run, each partition fills at about1 MiB/s, so the committed scenario usesconsume_max_wait_ms=1000to avoid forcing partial tail reads - consume scenarios also provision the file-backed tail cache for about 64 seconds of aggregate producer throughput so the separate read processes can stay on the hot tail without repeated Oxia/S3 fallback; override with
LLOG_TAIL_CACHE_MAX_BYTESwhen testing on a disk-constrained host
Committed run configs:
clustered-2b-2c-10m.json- committed scenarios now keep rough total topic-partitions at or below
32 * broker_count sanity-clustered.jsonstandalone-control.jsonsingle-broker-32mib.jsonpublic-s3-demo-5m.jsonpublic-s3-demo-5m-scaleout.jsonscaleout-3b-3c-3comp-100mib-locality.jsonscaleout-4-brokers-128mib.jsondensity-16-brokers-512mib.jsonfanout-negative-control.json
Current runtime assumptions:
- Oxia startup needs either a cached binary at
.cache/loadtest/bin/oxia, anoxiabinary onPATH,LLOG_LOADTEST_OXIA_BIN=/abs/path/to/oxia, or a siblingoxia-server/checkout plusgo - MinIO startup needs either
minioonPATH,dockeronPATH, orLLOG_LOADTEST_MINIO_BIN=/abs/path/to/minio - low-scale local runs will default MinIO data into
/dev/shm/llog-testswhen the estimated payload footprint fits; larger runs stay on disk unlessLLOG_LOADTEST_MINIO_DATA_DIR_ROOTexplicitly overrides the location - broker and bucket setup use
LLOG_LOADTEST_PYTHONwhen set and otherwise prefer.venv-loadtest/bin/pythonbefore falling back to the current interpreter - use
oxia>=0.2.0; the current repo baseline isoxia==0.2.1because older0.1.xclients lacked the session keepalive/recreation behavior the compaction path relies on - external S3 runs should set
LLOG_LOADTEST_EXTERNAL_S3=1; the harness will validateLLOG_S3_BUCKETaccess before starting brokers and will keep cleanup scoped toLLOG_ROOT_PREFIX - process metrics collection assumes
psis available on the host - host Prometheus bridge sync is best-effort and can be overridden or disabled with
LLOG_LOADTEST_PROMETHEUS_SYNC_SCRIPT
EC2-oriented workflow:
python3 -m loadtest.ec2 bootstrapcreates.venv-loadtest, installsrequirements.txt, and builds cached Go binaries into.cache/loadtest/bin/whengois availablepython3 -m loadtest.ec2 preflight --scenario ... --bucket ...validates the repo Python runtime, launcher resolution, S3 bucket access, and optional observability prerequisitespython3 -m loadtest.ec2 smoke --bucket ...runs a short external-S3 smoke scenario and cleans its prefix up by default unless--keep-artifactsis setpython3 -m loadtest.ec2 run --scenario ... --bucket ...runs the harness in external-S3 mode without manualexportsteps when the needed values are already present in.env, the shell environment, or the CLI flagspython3 -m loadtest.ec2 report latestprints a concise JSON summary from the latest run directorypython3 -m loadtest.ec2 bundle latestcreates onetar.gzartifact bundle from the run directory while skipping transient Oxia and MinIO data treespython3 -m loadtest.ec2 cleanup latestcleans only the recorded run prefix and now relies on the current boto3 credential chain instead of stored AWS secrets from the original run
Still pending:
- real end-to-end validation on a machine with
go,oxia, andminioordocker - public external-S3 demo checklist and exact commands are tracked in
PUBLIC_S3_DEMO.md