feat(quota): enforce monthly per-user API call quota (PR-M) #30
Merged
Closes #215. The pricing page advertises 500 / 10 000 / 100 000 calls
per month for Free / Pro / Business tiers, and these limits have been
sitting in app/core/quotas.py marked "informational (used for UI
display / future enforcement)" since they were defined. PR-M wires
up that enforcement so the system actually keeps the promise the
pricing page makes.
Architecture
------------
app/core/usage.py is the single home for the writer + the gate.
Both helpers own their AsyncSession (mirrors app/core/audit.py and
app/core/metrics.py — request-path code does not thread `db=`).
- record_usage(user_id, api_key_id, endpoint, file_size_bytes,
  duration_ms): writes one UsageRecord row on every successful
  /convert + /compress (single + batch). Fire-and-forget; a failed
  insert logs at WARNING but never breaks the request.
- enforce_monthly_quota(user): counts the user's UsageRecord rows
  for the current calendar month (UTC) and raises HTTPException 429
  with a Retry-After header pointing at the next-month boundary if
  the user is at or above their tier limit.
Time window: calendar month, UTC. Picked over rolling-30-day because
it matches how the pricing page is read ("you get 10k per month") and
gives users a single, predictable reset boundary they can read off
their own calendar.
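
The calendar-month window and the Retry-After value can be sketched
with the standard library alone. This is an illustration under
assumptions: the function names below are hypothetical, and inputs are
assumed to be timezone-aware datetimes.

```python
import math
from datetime import datetime, timezone


def month_start(now: datetime) -> datetime:
    """First instant of the calendar month containing `now`, in UTC."""
    utc_now = now.astimezone(timezone.utc)
    return utc_now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def next_month_start(now: datetime) -> datetime:
    """First instant of the following calendar month, in UTC."""
    start = month_start(now)
    if start.month == 12:
        # December rolls over into January of the next year.
        return start.replace(year=start.year + 1, month=1)
    return start.replace(month=start.month + 1)


def retry_after_seconds(now: datetime) -> int:
    """Whole seconds until the quota resets, rounded up, at least 1."""
    delta = next_month_start(now) - now.astimezone(timezone.utc)
    return max(1, math.ceil(delta.total_seconds()))


if __name__ == "__main__":
    moment = datetime(2024, 12, 31, 23, 59, 30, tzinfo=timezone.utc)
    print(month_start(moment), next_month_start(moment), retry_after_seconds(moment))
```

Rounding up keeps a client that honours Retry-After from retrying a
fraction of a second before the boundary.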
Counting rule: one HTTP call = one quota use, regardless of batch
size. A 25-file batch counts as 1, matching the pricing-page wording
"API calls per month". File-level counts go to the metrics table for
the cockpit. Failed conversions do NOT count toward the quota — only
completed work moves the user toward their limit.
Bypass paths:
- Anonymous tier (user is None): exempt; per-IP rate-limiter
  (10/min) is the only constraint.
- Enterprise tier (api_calls_per_month=None): unlimited.
- Community Edition without DATABASE_URL: gate is a no-op (nothing
  to count against); writer is a no-op too.
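
The gate decision, including the three bypass paths above, reduces to
a few early returns. The sketch below is framework-free so it runs on
its own: QuotaExceeded stands in for HTTPException 429, and User,
TIER_LIMITS, check_quota and the database_configured flag are
hypothetical names, with the limits copied from the pricing-page
numbers quoted at the top.

```python
from dataclasses import dataclass
from typing import Optional

# Monthly limits per tier; None means unlimited (Enterprise).
TIER_LIMITS = {"free": 500, "pro": 10_000, "business": 100_000, "enterprise": None}


class QuotaExceeded(Exception):
    """Stand-in for HTTPException(status_code=429) in this sketch."""


@dataclass
class User:
    id: int
    tier: str


def check_quota(
    user: Optional[User],
    calls_this_month: int,
    database_configured: bool = True,
) -> None:
    """Raise QuotaExceeded if the user is at or above the tier limit."""
    if user is None:
        return  # anonymous: only the per-IP rate limiter applies
    if not database_configured:
        return  # no usage table to count against
    limit = TIER_LIMITS[user.tier]
    if limit is None:
        return  # unlimited tier
    if calls_this_month >= limit:
        raise QuotaExceeded(f"{calls_this_month}/{limit} calls used this month")


if __name__ == "__main__":
    check_quota(User(1, "pro"), 9_999)  # below the limit: passes silently
    try:
        check_quota(User(1, "pro"), 10_000)
    except QuotaExceeded as exc:
        print("refused:", exc)
```

The comparison is >= so that the call which would be number
limit + 1 is the first one refused.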
Wired into:
- app/api/routes/convert.py::_do_convert (single)
- app/api/routes/convert.py::_do_convert_batch (batch)
- app/api/routes/compress.py::_do_compress (single)
- app/api/routes/compress.py::_do_compress_batch (batch)
The gate runs AFTER the concurrency-slot acquisition and AFTER the
file-size check, BEFORE any disk I/O, so a refused request never
touches the temp dir.
Database
--------
Migration 007_usage_quota_index adds a composite index on
``usage(user_id, timestamp)``. The gate query
``COUNT(*) WHERE user_id=:uid AND timestamp >= :month_start`` becomes
a fast index range scan even at 100 000 rows / Business user / month.
Without the index it sequentially scans the whole usage table on
every /convert and /compress call — latency grows with
total-rows-ever, not with current-month rows.
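
The effect of the composite index can be checked on a toy table. The
sketch uses an in-memory SQLite database purely as a stand-in (the
production database engine is not named here), and the index name and
the ISO-8601 text timestamps are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, timestamp TEXT NOT NULL)"
)
# Composite index matching the gate query's equality + range predicates.
conn.execute("CREATE INDEX ix_usage_user_id_timestamp ON usage (user_id, timestamp)")

conn.executemany(
    "INSERT INTO usage (user_id, timestamp) VALUES (?, ?)",
    [
        (1, "2024-04-30T23:59:59"),  # last month: must not be counted
        (1, "2024-05-01T00:00:00"),  # exactly on the boundary: counted
        (1, "2024-05-15T12:00:00"),
        (2, "2024-05-02T08:00:00"),  # another user: must not be counted
    ],
)

GATE_QUERY = "SELECT COUNT(*) FROM usage WHERE user_id = ? AND timestamp >= ?"
params = (1, "2024-05-01T00:00:00")

count = conn.execute(GATE_QUERY, params).fetchone()[0]
# Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + GATE_QUERY, params))

if __name__ == "__main__":
    print(count)
    print(plan)
```

If the plan text names the index, the count is being served by a range
scan over that user's current-month entries rather than a scan of the
whole table.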
Tests (tests/test_monthly_quota.py — 15 cases)
----------------------------------------------
- _month_start, _next_month_start helpers (3 cases incl. Dec→Jan)
- monthly_call_count: zero, current-month-only (last-month rows
  excluded)
- enforce_monthly_quota: anonymous noop, enterprise noop,
  below-limit noop, at-limit raises 429 with Retry-After,
  pro tier 10k boundary, business tier 100k boundary (mocked count)
- record_usage: inserts one row on success, anonymous noop
- End-to-end /convert: returns 429 with Retry-After when user at
  limit, returns 200 + writes a UsageRecord row when below limit
Verification
------------
pytest tests/test_monthly_quota.py -v → 15 passed
pytest tests/ → 554 passed (was 539)
ruff check + ruff format --check → clean
Docs
----
docs/api-reference.md "Rate Limiting" section now documents:
- per-tier monthly quota table
- what counts as one call (single + batch = 1 each)
- 429 response shape with Retry-After + JSON body example
- reset boundary (calendar-month UTC)
Out of scope (separate PRs)
---------------------------
- Dashboard UI: "X / Y this month" progress bar (data is now
  available; render is cosmetic)
- Cockpit per-user usage table (existing /cockpit/usage-summary
  is global-aggregate; per-user view is a follow-up)
- 80% / 95% advisory headers ("X-Quota-Used: 9500/10000")
- Email notification on hitting the limit