Skip to content

Latest commit

 

History

History
364 lines (302 loc) · 10.8 KB

File metadata and controls

364 lines (302 loc) · 10.8 KB

HTTP API

This document describes the HTTP surface that is actually implemented in server/leaderless_log_broker.py.

The broker can run in three roles selected by --role or LLOG_BROKER_ROLE:

  • write: enables POST /produce
  • read: enables POST /consume
  • both: enables both endpoints and is the default

Endpoints

GET /health

Returns:

{
  "status": "ok",
  "broker_id": "broker-1",
  "host": "127.0.0.1",
  "port": 8080,
  "started_at_ms": 1760000000000
}

Status:

  • 200 OK when the broker process is up
  • 404 Not Found for any other GET path

GET /metrics

Returns:

  • JSON with content type application/json
  • top-level objects:
    • broker
    • http
    • batching
    • oxia
    • s3

Minimum broker metadata in the snapshot:

  • broker.broker_id
  • broker.host
  • broker.port
  • broker.started_at_ms
  • broker.roles
  • broker.batch_settings.max_bytes
  • broker.batch_settings.max_delay_ms
  • broker.batch_settings.max_buffer_bytes

Batching semantics:

  • broker.batch_settings.max_bytes is the sealed batch target. Once a batch reaches it, the batch is handed to the flusher and later appends go into a new batch.
  • broker.batch_settings.max_buffer_bytes is the total uncommitted payload memory cap across the in-flight flushing batch and any queued future batches. Backpressure is returned only when that cap would be exceeded.

Current HTTP counters include:

  • http.inbound_requests
  • http.accepted_produce_requests
  • http.rejected_produce_requests
  • http.malformed_request_count
  • http.response_status_counts
  • http.backpressure_rejected_count
  • http.records_accepted
  • http.request_payload_bytes_accepted
  • http.distinct_topics_seen
  • http.distinct_topic_partitions_seen
  • http.consume_requests_total
  • http.consume_request_errors_total
  • http.consume_records_returned_total
  • http.consume_bytes_returned_total
  • http.consume_partition_results_total
  • http.consume_blob_cache_hits_total
  • http.consume_blob_cache_misses_total
  • http.consume_blob_cache_full_object_gets_total
  • http.consume_blob_cache_range_gets_total
  • http.consume_long_poll_waiters_current
  • http.consume_long_poll_hits_total
  • http.consume_long_poll_timeouts_total
  • http.consume_backpressure_rejections_total

Current batching, Oxia, and S3 counters include:

  • enqueue and flush totals
  • partition initialization attempts, cache hits and misses, successes, and failures
  • offsets assigned and committed
  • index-entry writes
  • meta/control reads and CAS stats
  • pending append recovery and materialization
  • Oxia client errors by exception type
  • S3 PUT count, bytes, latency totals, and failures
  • s3.billing request-cost and bucket-footprint estimates

Current s3.billing fields include:

  • pricing_model
  • storage_usd_per_gb_month
  • put_usd_per_1000
  • get_usd_per_1000
  • bucket_usage_refresh_interval_seconds
  • bucket_usage_last_refreshed_at_ms
  • bucket_usage_scan_success
  • bucket_usage_scan_error
  • bucket_object_count
  • bucket_stored_bytes
  • bucket_stored_gb
  • estimated_request_cost_usd
  • estimated_monthly_storage_cost_usd

Notes:

  • the request-cost estimate is derived from observed put, list, get, and range_get counts
  • the bucket-size fields come from cached list_objects_v2 scans under the configured LLOG_ROOT_PREFIX
  • pricing is a hardcoded S3 Standard us-east-1 estimate, not an authoritative AWS bill

Status:

  • 200 OK for /metrics
  • 404 Not Found for any other GET path

GET /metrics/prometheus

Returns:

  • Prometheus text exposition with content type text/plain; version=0.0.4; charset=utf-8

Current exported families include:

  • leaderless_log_inflight_requests
  • leaderless_log_batch_waiters_current
  • leaderless_log_batch_buffer_payload_bytes
  • leaderless_log_batch_append_payload_bytes_enqueued_total
  • leaderless_log_s3_put_requests_total
  • leaderless_log_s3_put_uploaded_bytes_total
  • leaderless_log_s3_put_failures_total
  • leaderless_log_batch_queued_payload_bytes_current
  • leaderless_log_batch_inflight_flush_bytes_current
  • leaderless_log_batch_pending_batches_current
  • leaderless_log_batch_sealed_batches_current
  • leaderless_log_batch_oldest_pending_batch_age_seconds
  • leaderless_log_produce_requests_total
  • leaderless_log_consume_long_poll_waiters_current
  • leaderless_log_consume_requests_total
  • leaderless_log_consume_records_returned_total
  • leaderless_log_consume_bytes_returned_total
  • leaderless_log_consume_partition_results_total
  • leaderless_log_consume_blob_cache_hits_total
  • leaderless_log_consume_blob_cache_misses_total
  • leaderless_log_consume_blob_cache_full_object_gets_total
  • leaderless_log_consume_blob_cache_range_gets_total
  • leaderless_log_consume_long_poll_hits_total
  • leaderless_log_consume_long_poll_timeouts_total
  • leaderless_log_consume_backpressure_rejections_total
  • leaderless_log_consume_request_duration_seconds
  • leaderless_log_consume_blob_cache_request_peak_bytes
  • leaderless_log_batch_backpressure_rejections_total
  • leaderless_log_batch_flushes_total
  • leaderless_log_batch_queue_wait_seconds
  • leaderless_log_batch_seal_to_flush_start_seconds
  • leaderless_log_batch_flush_duration_seconds
  • leaderless_log_oxia_operations_total
  • leaderless_log_oxia_operation_duration_seconds
  • leaderless_log_s3_operations_total
  • leaderless_log_s3_operation_duration_seconds
  • leaderless_log_s3_estimated_request_cost_usd_total
  • leaderless_log_s3_bucket_stored_bytes
  • leaderless_log_s3_bucket_stored_gb
  • leaderless_log_s3_bucket_objects
  • leaderless_log_s3_estimated_monthly_storage_cost_usd
  • leaderless_log_s3_bucket_usage_refresh_timestamp_seconds
  • leaderless_log_s3_bucket_usage_refresh_failures_total
  • leaderless_log_shared_wal_blob_bytes

Status:

  • 200 OK for /metrics/prometheus
  • 404 Not Found for any other GET path

POST /produce

Request body:

{
  "topic_partitions": [
    {
      "topic": "orders",
      "partition": 0,
      "records": [
        "alpha",
        {"base64": "AAE="}
      ]
    }
  ]
}

Rules:

  • request body must be valid UTF-8 JSON
  • top-level payload must be an object
  • topic_partitions must be a non-empty array
  • each item must contain:
    • topic: non-empty string
    • partition: non-negative integer
    • records: non-empty array
  • each record must be either:
    • a JSON string, encoded to bytes with UTF-8
    • an object with exactly one base64 field

Success response:

{
  "results": [
    {
      "topic": "orders",
      "partition": 0,
      "ok": true,
      "start_offset": 1,
      "end_offset": 2,
      "count": 2,
      "index_key": "llog/orders/partitions/0/index/00000000000000000002",
      "wal_uri": "s3://leaderless-log-wal/llog/wal-shared/<uuid>"
    }
  ],
  "success_count": 1,
  "error_count": 0
}

Failure item shape:

{
  "topic": "orders",
  "partition": 0,
  "ok": false,
  "error_type": "BackPressureRejected",
  "error": "buffer full"
}

Status semantics:

  • 200 OK when every topic-partition batch succeeds
  • 503 Service Unavailable when every result failed and every failure is BackPressureRejected
  • 409 Conflict for any other response that contains at least one failed batch
  • 400 Bad Request for missing Content-Length, empty bodies, malformed JSON, or invalid request shape
  • 404 Not Found for any other POST path

Operational notes:

  • partitions are initialized lazily on write and cached in broker memory after the first successful initialization
  • one request can contain multiple topic-partition batches
  • there is no cross-partition transaction
  • mixed success is allowed and returned per batch
  • the endpoint is available only when the broker role includes write

POST /consume

Request body:

{
  "topic_partitions": [
    {
      "topic": "orders",
      "partition": 0,
      "fetch_offset": 1,
      "partition_max_bytes": 1048576
    }
  ],
  "max_wait_ms": 250,
  "min_bytes": 1,
  "max_bytes": 52428800
}

Rules:

  • request body must be valid UTF-8 JSON
  • top-level payload must be an object
  • topic_partitions must be a non-empty array
  • each item must contain:
    • topic: non-empty string
    • partition: non-negative integer
    • fetch_offset: integer >= 1
  • optional fields:
    • partition_max_bytes: positive integer
    • max_wait_ms: non-negative integer, clamped by the broker cap
    • min_bytes: non-negative integer
    • max_bytes: positive integer

Success response:

{
  "results": [
    {
      "topic": "orders",
      "partition": 0,
      "ok": true,
      "high_watermark": 3,
      "start_offset": 1,
      "end_offset": 2,
      "next_fetch_offset": 3,
      "record_count": 2,
      "records": [
        {"offset": 1, "payload": "alpha"},
        {"offset": 2, "base64": "AAE="}
      ]
    }
  ],
  "success_count": 1,
  "error_count": 0
}

Failure item shape:

{
  "topic": "orders",
  "partition": 0,
  "ok": false,
  "error_type": "BlobNotFound",
  "error": "missing"
}

Status semantics:

  • 200 OK when the request is parsed and the consume service returns results
  • 503 Service Unavailable when the consume service rejects the request for backpressure
  • 400 Bad Request for missing Content-Length, empty bodies, malformed JSON, or invalid request shape
  • 404 Not Found when the broker role does not include read, or for any other POST path

Operational notes:

  • one request can fetch multiple topic-partitions
  • per-partition failures do not fail the whole request
  • records are returned as UTF-8 strings when decodable, otherwise as {"base64":"..."}
  • long polling is controlled by max_wait_ms and min_bytes
  • POST /consume checks the local tail cache before it falls back to Oxia/S3
  • a tail-cache response can also be "known empty at cached tail" with a high_watermark but no records, so long polls can sleep locally instead of probing Oxia/S3 again immediately
  • only cache misses, gaps, or evictions fall back to LeaderlessLogReader
  • when multiple requested partitions resolve to the same shared WAL object, the broker fetches that blob once per request and demuxes it in memory
  • recent local writes are eligible for a tail-cache hit before the reader falls back to Oxia/S3; the cache is in-process for same-process write+read brokers, or file-backed and cross-process when LLOG_TAIL_CACHE_DIR points writer and reader processes at the same local directory
  • LLOG_TAIL_CACHE_MAX_BYTES=0 disables the tail cache; the default cap is 536870912
  • the endpoint is available only when the broker role includes read

Definition Status

The HTTP API is moderately defined:

  • request validation is explicit in code
  • status code behavior is explicit in code
  • unit tests cover request parsing and status selection
  • there is no OpenAPI spec, versioned schema, auth layer, or broader admin/read API yet