Performance optimization for multi-country market forecasting with rate limiting - seeking guidance #68

@methunraj

We're using gdeltdoc for a market forecasting application that analyzes news sentiment across 50+ countries with 15-20 search topics. We need to respect the 5-second rate limit to avoid IP bans, but this creates significant performance challenges.

## Current Implementation

  • Processing 50 countries × 15 topics (in batches of 3)
  • Each country requires 5 batches (15÷3)
  • With 5-second rate limiting: 5 batches × 5 seconds = 25 seconds per country
  • Total time: 50 countries × 25 seconds = 1,250 seconds (21 minutes minimum)
  • With random 5-10 second delays for safety: ~35-40 minutes
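The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch of the sleep time; the variable names are illustrative, and it ignores request latency. With a uniform 5-10 second delay the sleeps alone average 7.5 seconds per request, about 31 minutes in total, so the rest of the ~35-40 minute figure is presumably time spent in the requests themselves.

```python
# Back-of-the-envelope check of the figures above (sleep time only).
countries = 50
topics = 15
batch_size = 3
min_delay = 5.0   # seconds between requests (the rate limit)
max_delay = 10.0  # upper bound of the random safety delay

batches_per_country = -(-topics // batch_size)    # ceiling division
total_requests = countries * batches_per_country  # one request per batch

fixed_total = total_requests * min_delay
# Mean of a uniform(5, 10) delay is (5 + 10) / 2 = 7.5 seconds.
expected_random_total = total_requests * (min_delay + max_delay) / 2

print(f"requests: {total_requests}")
print(f"fixed 5 s delay: {fixed_total:.0f} s")
print(f"random 5-10 s delay (expected): {expected_random_total:.0f} s")
```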

## Code Structure

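import random
import time

# Current approach - sequential due to rate limiting
for country in countries:
    for topic_batch in topic_batches:  # 3 topics per batch
        time.sleep(random.uniform(5, 10))  # Rate limit: 5-10 s before every request
        # filters is built for this country and topic batch (construction omitted)
        results = gdelt.article_search(filters)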

## Question

Is there a recommended approach to optimize performance while respecting rate limits? Specifically:

1. Can we make parallel requests for different countries? (e.g., 4 different country queries simultaneously, each respecting 5-second spacing)
2. Is there a higher rate limit available for academic/commercial use cases?
3. Are there any best practices for bulk country-topic analysis that we should follow?
4. Would caching be acceptable for 24-hour periods to avoid repeated queries?
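To make question 4 concrete, here is a minimal sketch of the kind of 24-hour caching we have in mind. `TTLCache`, `get_or_fetch`, and `fake_search` are hypothetical names for illustration and are not part of gdeltdoc; whether caching like this is acceptable is exactly what we are asking.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours


class TTLCache:
    """Tiny in-memory cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no API call
        value = fetch()      # missing or expired: query once and store
        self._store[key] = (now, value)
        return value


# Stand-in for the real search call so the sketch runs on its own.
calls = []


def fake_search() -> list:
    calls.append(1)
    return ["article"]


cache = TTLCache()
key = ("US", ("topic_a", "topic_b", "topic_c"))  # one country + one topic batch
first = cache.get_or_fetch(key, fake_search)
second = cache.get_or_fetch(key, fake_search)  # served from the cache
print(len(calls))  # 1
```

In our pipeline the `fetch` argument would wrap the real request, for example `lambda: gdelt.article_search(filters)`, so a repeated country-topic batch within 24 hours would not hit the API again.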

## Use Case

A market forecasting system that analyzes the impact of news sentiment on infrastructure/technology markets across global regions. We need to balance comprehensive coverage with reasonable processing time.

Any guidance on optimizing this workflow while being a responsible API user would be greatly appreciated!

## Environment

  • Python 3.x
  • gdeltdoc latest version
  • ThreadPoolExecutor for parallel processing (currently serialized due to global rate limiter)
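For context, the ThreadPoolExecutor setup with a global rate limiter looks roughly like the sketch below. `GlobalRateLimiter`, `fetch`, and `run` are hypothetical names, and the real `gdelt.article_search(filters)` call is replaced by a placeholder. Because every worker waits on the same limiter, the requests end up evenly spaced no matter how many threads are in the pool, which is why the parallelism currently buys us nothing.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class GlobalRateLimiter:
    """Allows one call per `min_interval` seconds across all threads."""

    def __init__(self, min_interval: float) -> None:
        self._min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self) -> None:
        # Reserve the next free slot under the lock, then sleep outside it
        # so other threads can queue up for their own slots.
        with self._lock:
            now = time.monotonic()
            start = max(now, self._next_allowed)
            self._next_allowed = start + self._min_interval
        delay = start - now
        if delay > 0:
            time.sleep(delay)


def fetch(limiter: GlobalRateLimiter, country: str, topic_batch: tuple) -> tuple:
    limiter.wait()  # every worker queues on the same limiter
    # Placeholder: the real code would call gdelt.article_search(filters) here.
    return (country, topic_batch)


def run(jobs, limiter: GlobalRateLimiter, max_workers: int = 4) -> list:
    """Run (country, topic_batch) jobs on a thread pool sharing one limiter."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda job: fetch(limiter, *job), jobs))
```

With `GlobalRateLimiter(min_interval=5.0)` and four workers, 250 jobs still take at least 249 × 5 seconds, the same as the sequential loop.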
