[RFC/Architectural] Implementation of distributed circuit breaker and global rate-limit governor #3553

### The Issue:
If 20 workers are running the same analyzer and hit a rate limit, they will all fail and retry independently. There is no circuit breaker to pause that specific analyzer globally while the provider is down.

IntelOwl currently lacks a distributed, state-aware mechanism to manage external API health and rate limits across multiple Celery workers. Each analyzer task operates in isolation. During an upstream service outage or a `429 Too Many Requests` event, the system continues to flood the provider with doomed requests, leading to worker starvation, API key reputation degradation, and retry storms.

### Technical Root Cause
The logic in `api_app/analyzers_manager/` (specifically within the Celery task execution loop) handles retries locally without checking a global registry.
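To make the cost of isolated retries concrete, here is a toy count of upstream calls during an outage. It is not IntelOwl code: the function names, the retry budget of 3, and the failure threshold of 5 are illustrative assumptions; only the figure of 20 workers comes from the description above.

```python
def calls_with_isolated_retries(workers: int, max_retries: int) -> int:
    """Upstream calls when every worker retries on its own while the provider is down."""
    calls = 0
    for _worker in range(workers):
        for _attempt in range(1 + max_retries):  # first try plus local retries
            calls += 1  # every attempt reaches the provider and fails
    return calls


def calls_with_shared_breaker(workers: int, max_retries: int, failure_threshold: int) -> int:
    """Same outage, but all workers consult one shared consecutive-failure counter."""
    calls = 0
    consecutive_failures = 0  # hypothetical global state shared by all workers
    for _worker in range(workers):
        for _attempt in range(1 + max_retries):
            if consecutive_failures >= failure_threshold:
                break  # breaker is open: skip the call instead of hitting the provider
            calls += 1
            consecutive_failures += 1
    return calls


isolated = calls_with_isolated_retries(workers=20, max_retries=3)
shared = calls_with_shared_breaker(workers=20, max_retries=3, failure_threshold=5)
```

Under these assumed numbers, the isolated case sends 80 requests to a provider that is already down, while the shared counter stops after 5.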

### Proposed Solution

I propose an adaptive Redis-backed Governor that implements: 

- **Global Rate Limiter**: A shared token bucket in Redis to enforce API limits across all workers.
- **Circuit Breaker Pattern**: Automatically transition an analyzer to an OPEN state after _N_ consecutive failures, with a HALF-OPEN state for periodic health probing.
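The two components could look roughly like the sketch below. It is a minimal, single-process illustration of the logic only: state is held in plain Python attributes so the example runs without a Redis server. In the proposed design that state would live in Redis, and each read-modify-write would need to be atomic (for example via a server-side Lua script). The class names, parameters, and the injectable `clock` are assumptions made for illustration, not existing IntelOwl APIs.

```python
import time
from typing import Callable

CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF_OPEN"


class TokenBucket:
    """Token bucket; the fields below stand in for a per-analyzer Redis hash."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity
        self.updated_at = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class CircuitBreaker:
    """CLOSED -> OPEN after N consecutive failures; HALF_OPEN admits one probe."""

    def __init__(self, failure_threshold: int, cooldown_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_sec = cooldown_sec
        self.clock = clock
        self.state = CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == OPEN:
            if self.clock() - self.opened_at >= self.cooldown_sec:
                self.state = HALF_OPEN  # cooldown elapsed: let one probe through
                return True
            return False
        if self.state == HALF_OPEN:
            return False  # a probe is already in flight; wait for its result
        return True

    def record_success(self) -> None:
        self.state = CLOSED
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        # A failed probe, or N consecutive failures, (re)opens the breaker.
        if self.state == HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = OPEN
            self.opened_at = self.clock()
```

A task would call `allow_request()` and `try_acquire()` before contacting the provider, then report the outcome with `record_success()` or `record_failure()`. One simplification to note: this sketch has no timeout for a probe that never reports back, which a real implementation would likely need.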



I found this while testing the FullHunt Analyzer and threat intelligence chatbot: upstream API rate limiting caused immediate worker pool saturation. Without a global, state-aware coordination layer, redundant tasks each trigger their own retries, which exposes a critical gap in IntelOwl’s distributed orchestration.


### If implemented it can reduce:
- **Worker Starvation:** A single failing external API can clog the entire task queue with retries, preventing local analyzers (like Yara or PE-scan) from running.
- **API Reputation Damage:** Repeatedly hitting an endpoint that returns `429` or `5xx` can lead to the organization's API keys being blacklisted.
- **SOC Blind Spot:** Attackers can time their activities to coincide with API volatility so that automated enrichment fails silently.

