
Add rate-limited and aggregate logging to reduce log volume at high load #859

Open
dsingal0 wants to merge 1 commit into huggingface:main from dsingal0:feature/rate-limited-aggregate-logging


dsingal0 commented Apr 8, 2026

Summary

At high load, logging every request can overwhelm log backends such as Loki. This PR introduces two new CLI arguments that control how much is logged:

New CLI Arguments

  • --log-sample-rate: Controls maximum request success logs per second (default: 10, set to 0 to disable per-request logging)
  • --log-aggregate-interval: Interval in seconds for logging aggregate statistics (default: 0 = disabled)

Features

Rate-Limited Logger

  • Limits per-request success logs to a maximum number per second
  • Prevents log flooding during traffic spikes
  • Backward compatible (defaults to 10 logs/sec)
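The limiting behavior described above could be sketched roughly as follows. This is a minimal illustration, assuming a fixed one-second window; the `RateLimiter` type and its `allow` method are hypothetical names and are not taken from the PR's `RateLimitedLogger`. The caller passes the current time in so the logic can be tested without sleeping.

```rust
use std::time::{Duration, Instant};

/// Hypothetical fixed-window limiter (illustration only, not the PR's code).
pub struct RateLimiter {
    max_per_sec: u32,
    window_start: Instant,
    emitted: u32,
}

impl RateLimiter {
    pub fn new(max_per_sec: u32, now: Instant) -> Self {
        Self { max_per_sec, window_start: now, emitted: 0 }
    }

    /// Returns true if a per-request log line may be emitted at `now`.
    pub fn allow(&mut self, now: Instant) -> bool {
        // A rate of 0 disables per-request logging entirely.
        if self.max_per_sec == 0 {
            return false;
        }
        // Start a fresh window once a full second has elapsed.
        if now.saturating_duration_since(self.window_start) >= Duration::from_secs(1) {
            self.window_start = now;
            self.emitted = 0;
        }
        if self.emitted < self.max_per_sec {
            self.emitted += 1;
            true
        } else {
            false
        }
    }
}
```

A fixed window is the simplest choice but can let through up to twice the limit across a window boundary; a token bucket would smooth that out if it matters.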

Aggregate Logger

  • Tracks request statistics and logs aggregated summaries at regular intervals
  • Provides periodic summaries like:
    Request aggregate: 5234 requests (87.2/s), 12 errors | 10456789 chars (174280/s) | 2345678 tokens (39094/s)
    
  • Includes request count, error count, characters processed, and tokens processed per interval
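One way to picture the aggregate counters is the sketch below. It assumes atomic counters that are reset on every flush and that errored requests also count toward the request total; the `AggregateStats` type and its `record` and `flush` methods are hypothetical and may differ from the PR's `AggregateLogger`. The output format mirrors the summary line shown above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical per-interval counters (illustration only, not the PR's code).
#[derive(Default)]
pub struct AggregateStats {
    requests: AtomicU64,
    errors: AtomicU64,
    chars: AtomicU64,
    tokens: AtomicU64,
}

impl AggregateStats {
    /// Record one finished request. Assumption: errors are a subset of requests.
    pub fn record(&self, chars: u64, tokens: u64, is_error: bool) {
        self.requests.fetch_add(1, Ordering::Relaxed);
        if is_error {
            self.errors.fetch_add(1, Ordering::Relaxed);
        }
        self.chars.fetch_add(chars, Ordering::Relaxed);
        self.tokens.fetch_add(tokens, Ordering::Relaxed);
    }

    /// Build the summary line for the elapsed interval and reset the counters.
    pub fn flush(&self, interval: Duration) -> String {
        // Guard against division by zero for a zero-length interval.
        let secs = interval.as_secs_f64().max(f64::EPSILON);
        let requests = self.requests.swap(0, Ordering::Relaxed);
        let errors = self.errors.swap(0, Ordering::Relaxed);
        let chars = self.chars.swap(0, Ordering::Relaxed);
        let tokens = self.tokens.swap(0, Ordering::Relaxed);
        format!(
            "Request aggregate: {} requests ({:.1}/s), {} errors | {} chars ({:.0}/s) | {} tokens ({:.0}/s)",
            requests,
            requests as f64 / secs,
            errors,
            chars,
            chars as f64 / secs,
            tokens,
            tokens as f64 / secs,
        )
    }
}
```

In a server, a background task would call `flush` once per configured interval and pass the resulting line to the logger.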

Usage Examples

# Disable per-request logging, use aggregate logging every 60 seconds
text-embeddings-router --model-id model --log-sample-rate 0 --log-aggregate-interval 60

# Hybrid: max 2 individual logs/sec + aggregate stats every 30 seconds
text-embeddings-router --model-id model --log-sample-rate 2 --log-aggregate-interval 30

Both options can also be configured via environment variables:

  • LOG_SAMPLE_RATE (default: 10)
  • LOG_AGGREGATE_INTERVAL (default: 0)
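How a setting might be resolved from these sources can be sketched with the standard library alone. The precedence used here (CLI value, then environment variable, then default) and the `u64` type are assumptions, and `resolve_setting` and `log_sample_rate` are hypothetical helpers; the PR itself may rely on its argument parser for this.

```rust
use std::env;

/// Hypothetical helper: pick the CLI value if given, else the environment
/// value, else the default. An unparsable environment value is an error.
pub fn resolve_setting(
    cli: Option<u64>,
    env_raw: Option<&str>,
    default: u64,
) -> Result<u64, String> {
    if let Some(value) = cli {
        return Ok(value);
    }
    match env_raw {
        Some(raw) => raw
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("invalid value {raw:?}: {e}")),
        None => Ok(default),
    }
}

/// Resolve the per-second log limit using the documented default of 10.
pub fn log_sample_rate(cli: Option<u64>) -> Result<u64, String> {
    let raw = env::var("LOG_SAMPLE_RATE").ok();
    resolve_setting(cli, raw.as_deref(), 10)
}
```

The same helper would apply to `LOG_AGGREGATE_INTERVAL` with a default of 0.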

Files Changed

  • router/src/logging.rs - Added RateLimitedLogger and AggregateLogger implementations
  • router/src/main.rs - Added CLI arguments
  • router/src/lib.rs - Updated run() signature to pass loggers
  • router/src/http/server.rs - Updated all 6 endpoint handlers to use new loggers
  • README.md - Added comprehensive documentation for the new features

Benefits

✅ Reduces logging overhead at high load
✅ Provides better observability through aggregate statistics
✅ Backward compatible - defaults similar to current behavior
✅ Works seamlessly with Loki and other log aggregation solutions
✅ Configurable via CLI args or environment variables

At high load, logging every request can be overwhelming for logging
solutions like Loki. This commit introduces two new CLI arguments:

- --log-sample-rate: Controls maximum request success logs per second
  (default: 10, set to 0 to disable per-request logging)
- --log-aggregate-interval: Interval in seconds for logging aggregate
  statistics (default: 0 = disabled)

The aggregate logger provides periodic summaries including request
count, error count, characters processed, and tokens processed per
interval, giving better observability while dramatically reducing
log volume.

Example aggregate log:
'Request aggregate: 5234 requests (87.2/s), 12 errors | 10456789 chars
(174280/s) | 2345678 tokens (39094/s)'

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>