Implement asynchronous/background job execution for deliver_contents #183

@payamnj

Summary

Currently, DeliverContentsJob processes deliveries sequentially in a while-loop: one email is sent, then the next. This becomes the bottleneck when a large batch of deliveries is due at the same time, since each SMTP call blocks before the next one starts.

Decided approach: ThreadPoolExecutor

After reviewing the codebase, ThreadPoolExecutor is the right fit for a library:

  • The job is I/O-bound (SMTP + DB reads) — threads are the correct primitive, not processes
  • Zero additional dependencies required, which matters for a reusable library
  • No Celery, RQ, or Django-Q required on the adopter's side
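To illustrate why threads suit this workload, here is a minimal sketch that contrasts a sequential loop with a ThreadPoolExecutor. The send_email function is a hypothetical stand-in that sleeps to simulate a blocking SMTP call; it is not the library's delivery code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_email(delivery_id: int) -> int:
    """Hypothetical stand-in for a blocking SMTP call."""
    time.sleep(0.05)  # simulated network wait; other threads can run meanwhile
    return delivery_id


def deliver_sequentially(delivery_ids):
    # Current behaviour: each send blocks before the next one starts.
    return [send_email(d) for d in delivery_ids]


def deliver_concurrently(delivery_ids, workers=4):
    # Up to `workers` sends are in flight at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, regardless of completion order.
        return list(pool.map(send_email, delivery_ids))
```

With eight simulated deliveries and four workers, the concurrent version waits for roughly two rounds of sends instead of eight, while returning the same results.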

What already works in our favour

DatabaseDeliveryQueue.get_next_batch() already uses SELECT FOR UPDATE SKIP LOCKED when claiming delivery schedules, so concurrent threads cannot double-deliver the same email — the DB-level locking is already solid.

One thing to fix before adding threads

next_task() shares a single self._task_iterator across calls. Multiple threads calling it concurrently would race on next(). Fix options:

  • Add a threading.Lock around next() in next_task()
  • Or have each worker call get_next_batch() independently rather than sharing the iterator

The second option is cleaner — each thread pulls its own batch, and skip_locked=True ensures no overlap.
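For comparison, the first option (a lock around next()) could look roughly like the sketch below. LockedIterator and drain are hypothetical names used for illustration only; the real next_task() and self._task_iterator live in the library and may be shaped differently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedIterator:
    """Wraps an iterator so that advancing it is serialized across threads."""

    def __init__(self, iterable):
        self._task_iterator = iter(iterable)
        self._lock = threading.Lock()

    def next_task(self):
        # Only one thread at a time may advance the underlying iterator.
        with self._lock:
            return next(self._task_iterator, None)  # None signals exhaustion


def drain(tasks: LockedIterator) -> list:
    """Worker loop: pull tasks until the shared iterator is exhausted."""
    seen = []
    while (task := tasks.next_task()) is not None:
        seen.append(task)
    return seen


def run(n_tasks=1000, workers=8):
    tasks = LockedIterator(i for i in range(n_tasks))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(drain, tasks) for _ in range(workers)]
        return [task for f in futures for task in f.result()]
```

Without the lock, concurrent next() calls on a shared generator can fail with "ValueError: generator already executing". The lock avoids that, but every worker still contends on one iterator, which is part of why independent batch fetching is the cleaner choice.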

Django DB connection handling

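Each worker thread needs its own DB connection. Django's connections are thread-local, so a worker thread never shares a connection with another thread, but pool threads also do not go through the request cycle that normally cleans connections up. Call django.db.close_old_connections() at the start of each worker function so that a stale or unusable connection left on that thread is discarded and Django opens a fresh one on the next query.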
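A minimal sketch of that per-worker handling, assuming Django is installed and configured. The decorator name with_db_cleanup is hypothetical. Closing the connection when the worker exits is an additional assumption beyond the text above; it is included because nothing else would close connections opened by pool threads.

```python
import functools


def with_db_cleanup(worker):
    """Wrap a worker function with Django connection housekeeping."""

    @functools.wraps(worker)
    def wrapper(*args, **kwargs):
        # Imported lazily so this module can be imported without Django set up.
        from django.db import close_old_connections, connection

        # Discard stale or unusable connections left on this thread.
        close_old_connections()
        try:
            return worker(*args, **kwargs)
        finally:
            # Assumption: release this thread's connection once the worker is done.
            connection.close()

    return wrapper
```

A worker function decorated this way can be submitted to the executor unchanged, for example pool.submit(with_db_cleanup(deliver_batch)), where deliver_batch is whatever callable performs the deliveries.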

Implementation plan

  1. Make next_task() / batch fetching thread-safe (see above)
  2. Add DELIVERY_WORKERS to the DJANGO_EMAIL_LEARNING settings (default: 1 for backward compatibility)
  3. Replace the sequential while-loop in _run_job with a ThreadPoolExecutor(max_workers=DELIVERY_WORKERS)
  4. Call django.db.close_old_connections() at the start of each worker
  5. Document DELIVERY_WORKERS in the configuration docs
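Steps 1 to 3 can be sketched as follows. This is an illustration under stated assumptions, not the library's code: InMemoryQueue is a hypothetical stand-in for DatabaseDeliveryQueue in which a lock plays the role of the SKIP LOCKED row locks, and run_job only mirrors the intent of _run_job. Each worker pulls its own batches rather than sharing an iterator.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryQueue:
    """Hypothetical stand-in for DatabaseDeliveryQueue."""

    def __init__(self, items, batch_size=10):
        self._items = list(items)
        self._batch_size = batch_size
        self._lock = threading.Lock()  # plays the role of SKIP LOCKED

    def get_next_batch(self):
        # Atomically claim the next batch so no two workers see the same item.
        with self._lock:
            batch = self._items[: self._batch_size]
            self._items = self._items[self._batch_size :]
        return batch


def worker(queue, deliver):
    """Claim batches independently until the queue is empty."""
    delivered = []
    while batch := queue.get_next_batch():
        for item in batch:
            deliver(item)
            delivered.append(item)
    return delivered


def run_job(queue, deliver, delivery_workers=1):
    if delivery_workers <= 1:
        # Default: same sequential path as today, no threads involved.
        return worker(queue, deliver)
    with ThreadPoolExecutor(max_workers=delivery_workers) as pool:
        futures = [
            pool.submit(worker, queue, deliver) for _ in range(delivery_workers)
        ]
        return [item for f in futures for item in f.result()]
```

In the real job, delivery_workers would come from the DELIVERY_WORKERS setting, and the same shape can double as the concurrency test: run with several workers and check that every delivery appears exactly once.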

Acceptance criteria

  • DELIVERY_WORKERS = 1 (default) behaves identically to today — no breaking change
  • DELIVERY_WORKERS > 1 processes deliveries concurrently using threads
  • No double-deliveries under concurrent workers (guaranteed by skip_locked)
  • DB connections are properly managed per thread
  • DELIVERY_WORKERS is documented in the configuration reference
  • Existing tests pass; new test covers concurrent delivery without duplicates

Out of scope

Celery/RQ/Django-Q integration and true async HTTP responses (202 semantics on the API endpoint) are not part of this issue. ThreadPoolExecutor within the management command is sufficient and keeps the library dependency-free.

Metadata

Labels: enhancement
Status: Done