[RFC/Architectural] Implementation of distributed circuit breaker and global rate-limit governor #3553

@niy-ati

The Issue:

If 20 workers are running the same analyzer and hit a rate limit, they all fail and retry independently. There is no circuit breaker to pause that specific analyzer globally while the provider is down or throttling.

IntelOwl currently lacks a distributed, state-aware mechanism to manage external API health and rate limits across multiple Celery workers. Each analyzer task operates in isolation. During an upstream service outage or a 429 Too Many Requests event, the system keeps flooding the provider with doomed requests, leading to worker starvation, API key reputation degradation, and repeated retries.

Technical Root Cause

The logic in api_app/analyzers_manager/ (specifically within the Celery task execution loop) handles retries locally without checking a global registry.

Proposed Solution

I propose an adaptive Redis-backed Governor that implements:

Global Rate Limiter: A shared token bucket in Redis to enforce API limits across all workers.
Circuit Breaker Pattern: Automatically transition an analyzer to an OPEN state after N consecutive failures, with a HALF-OPEN state for periodic health probing.
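
The two mechanisms above can be sketched in plain Python to make the intended behaviour concrete. This is only an in-process illustration under stated assumptions: the class names `TokenBucket` and `CircuitBreaker`, their parameters, and the injectable `clock` are hypothetical and not part of IntelOwl. In the proposed design the same state (token count, failure counter, breaker state, timestamps) would live in Redis and be updated atomically so that all workers see it.

```python
import time


class TokenBucket:
    """Hypothetical sketch: at most `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self):
        # Refill according to elapsed time, then spend one token if available.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class CircuitBreaker:
    """Hypothetical sketch of the CLOSED -> OPEN -> HALF-OPEN -> CLOSED cycle."""

    CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF-OPEN"

    def __init__(self, failure_threshold, cooldown, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # the "N consecutive failures"
        self.cooldown = cooldown                    # seconds to stay OPEN before probing
        self.clock = clock
        self.state = self.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == self.OPEN:
            if self.clock() - self.opened_at < self.cooldown:
                return False
            # Cooldown elapsed: let exactly one health probe through.
            self.state = self.HALF_OPEN
            return True
        if self.state == self.HALF_OPEN:
            # A probe is already in flight; a real implementation would also
            # need a timeout here in case the probe never reports back.
            return False
        return True

    def record_success(self):
        self.failures = 0
        self.state = self.CLOSED

    def record_failure(self):
        self.failures += 1
        # A failed probe, or N consecutive failures, (re)opens the breaker.
        if self.state == self.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = self.OPEN
            self.opened_at = self.clock()
```

Under this sketch, a worker would call `breaker.allow_request()` and `bucket.try_acquire()` before contacting the provider, skip or defer the task if either returns `False`, and report the outcome with `record_success()` or `record_failure()`. With per-process objects like these, each worker would still act independently; the point of the proposal is that the state is shared.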

I found this while testing the FullHunt Analyzer and the threat intelligence chatbot: upstream API rate limiting caused immediate worker pool saturation. Without a global, state-aware coordination layer, redundant tasks each trigger their own retries, which exposes a critical gap in IntelOwl's distributed orchestration.

If implemented, it can mitigate:

Worker Starvation: A single failing external API can clog the entire task queue with retries, preventing local analyzers (like Yara or PE-scan) from running.
API Reputation Damage: Repeatedly hitting an endpoint that returns 429 or 5xx can lead to the organization's API keys being blacklisted.
SOC Blind Spots: Attackers can time their activity to coincide with API volatility so that automated enrichment fails silently.
