chore(source-amplitude): upgrade CDK to v7.14.0, use weight-based rate limiting for Dashboard REST API#75406
Conversation
… Dashboard REST API streams Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
Note 📝 PR Converted to Draft More info...Thank you for creating this PR. As a policy to protect our engineers' time, Airbyte requires all PRs to be created first in draft status. Your PR has been automatically converted to draft status in respect for this policy. As soon as your PR is ready for formal review, you can proceed to convert the PR to "ready for review" status by clicking the "Ready for review" button at the bottom of the PR page. To skip draft status in future PRs, please include |
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksPR Slash CommandsAirbyte Maintainers (that's you!) can execute the following slash commands on your PR:
📚 Show Repo GuidanceHelpful Resources
|
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
|
… rate limiting Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
|
/publish-connectors-prerelease
|
|
Deploy preview for airbyte-docs ready! ✅ Preview Built with commit 49a3a77. |
|
|
|
|
Due to a bug in the artifact generator (airbytehq/airbyte-ops-mcp#604), previous pre-release publishes from this PR registered an incorrect Corrupted version: Replacement version: Any connections currently pinned to version The underlying bug has been fixed in airbytehq/airbyte-ops-mcp#607 and airbytehq/airbyte#75435. |
|
/publish-connectors-prerelease
|
|
|
↪️ Triggering Reason: PR updates api_budget with per-endpoint rate limiting for Amplitude source. Ready for regression validation. |
Fix Validation Evidence —
|
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
|
Starting AI PR review for Session: https://app.devin.ai/sessions/e06df0ed975643c4a1f2f82b17f32907 |
AI PR Review ReportReview Action: NO ACTION (NOT ELIGIBLE)
📋 PR Details & EligibilityConnector & PR InfoConnector(s): Auto-Approve EligibilityEligible: No Review Action DetailsNO ACTION (NOT ELIGIBLE) — The Behavioral Changes gate is flagged (rate limiting policies materially changed) and requires human sign-off. Additionally, Live / E2E Tests evidence is incomplete (
🔍 Gate Evaluation DetailsGate-by-Gate Analysis
Behavioral Changes — DetailThe following rate limiting changes were detected:
Keywords matched: Human sign-off required: These changes alter how the connector interacts with the Amplitude API rate limits. A human reviewer should verify:
Live / E2E Tests — Detail
📚 Evidence ConsultedEvidence
❓ How to RespondProviding Context or JustificationYou can add explanations that the bot will see on the next review: Option 1: PR Description (recommended) ## AI PR Review Justification
### Behavioral Changes
[Your explanation here]Option 2: PR Comment After adding your response, re-run Note: The Behavioral Changes gate requires human sign-off and cannot be overridden by justification alone. A human reviewer must approve the behavioral changes. The Live / E2E Tests gate can be resolved by waiting for the |
| rates: | ||
| - limit: 900 | ||
| interval: PT1H | ||
| - limit: 8 |
There was a problem hiding this comment.
Should we configure this value based on the config value (group_by is enabled or not)?
There was a problem hiding this comment.
Good question. The CDK's api_budget doesn't support Jinja interpolation in rate limit values (the limit field is a typed integer, not an interpolated string). No existing connector uses config-based dynamic rate limits in api_budget policies.
The current approach uses the conservative estimate (group_by=country=true, 120 cost/request). When group_by is disabled, the actual cost drops to 30/request, meaning the 900 req/hr limit is 4× more conservative than needed — but safe. This is documented in the PR description.
To make this truly dynamic, we'd need either:
- A CDK enhancement to support interpolated rate limit values
- Two separate policies with config-based matcher selection (not supported either)
Daryna Ishchenko (@darynaishchenko) — should we keep the conservative approach, or would you prefer we investigate a CDK enhancement?
There was a problem hiding this comment.
devin can regex in url_path_pattern be used for this case?
There was a problem hiding this comment.
Regex in url_path_pattern matches URL paths, not config values — so it can't help with the group_by config problem since both group_by=true and group_by=false hit the same /2/users endpoint.
However, I've already used regex to solve a different problem (the shared budget issue raised by Anatolii Yatsuk (@tolik0)) — the latest commit merges both policies into a single one with url_path_pattern: "/2/(sessions/average|users)", so both endpoints now share one rate limit counter. The limits use the worst-case cost (120/request, assuming group_by=country=true) to stay safe.
There was a problem hiding this comment.
CDK Enhancement Investigation: Cost-Based Rate Limiting
I investigated the CDK's rate limiting code (airbyte_cdk/sources/streams/call_rate.py) and found that the infrastructure for weighted/cost-based requests already exists but is unused.
Key Finding
The try_acquire() method on both FixedWindowCallRatePolicy and MovingWindowCallRatePolicy accepts a weight parameter, and pyrate_limiter (the underlying library) fully supports it. However, APIBudget._do_acquire() hardcodes weight=1 on line 599:
policy.try_acquire(request, weight=1) # Always 1, regardless of endpoint costProposed Enhancement: cost Field on HttpRequestRegexMatcher
Add an optional cost field (default=1) to matchers:
api_budget:
type: HTTPAPIBudget
policies:
- type: MovingWindowCallRatePolicy
rates:
- limit: 108000 # Actual cost budget from Amplitude
interval: PT1H
- limit: 1000 # Actual burst budget
interval: PT5M
matchers:
- type: HttpRequestRegexMatcher
url_path_pattern: "/2/sessions/average"
cost: 60 # <-- NEW: per-endpoint cost
- type: HttpRequestRegexMatcher
url_path_pattern: "/2/users"
cost: 120 # <-- NEW: per-endpoint costChanges Required (~50 lines across 3 files)
declarative_component_schema.py: Addcost: Optional[int] = 1toHttpRequestRegexMatchermodelcall_rate.py: Store cost onHttpRequestRegexMatcher, add method toBaseCallRatePolicyto get cost for a matched request, updateAPIBudget._do_acquire()to pass the correct weightmodel_to_component_factory.py: Wire up thecostfield increate_http_request_matcher()
This is fully backward compatible — default cost=1 preserves all existing behavior. A future extension could support Jinja expressions for config-dependent costs (e.g., cost: "{{ 120 if config['active_users_group_by_country'] else 30 }}").
Full analysis with code details: see attached report.
Requested by Daryna Ishchenko (@darynaishchenko)
| - type: HttpRequestRegexMatcher | ||
| method: GET | ||
| url_path_pattern: "/2/sessions/average" | ||
| # /2/users: 108,000 cost/hr ÷ 120 cost/request = 900 requests/hr (conservative, assumes group_by=country) |
There was a problem hiding this comment.
Aren't two endpoints counted in the same budget?
There was a problem hiding this comment.
Yes, both /2/sessions/average and /2/users share the same Amplitude Dashboard REST API budget (108,000 cost/hr and 1,000 cost/5min). Each policy here is calculated as if the endpoint has the full budget to itself — this is a known limitation documented in the comments (lines 1068-1071) and in the PR description under "Shared budget caveat."
The CDK's api_budget doesn't support a shared cost pool across different URL matchers — each MovingWindowCallRatePolicy tracks its own independent counter. There's no way to enforce a single shared budget across two different matchers with the current CDK.
In practice, the risk is mitigated by:
concurrency_level: 5limiting total concurrent requests across all streams- Different slice sizes and sync patterns making it unlikely both streams saturate their limits simultaneously
- The 5-minute burst limits (16 and 8 req/5min) being more restrictive than the hourly limits, which further constrains burst behavior
A truly accurate shared budget would require a CDK enhancement (e.g., a shared cost-based rate limiter).
There was a problem hiding this comment.
devin can you use regex in url_path_pattern to mach both 2/sessions/average and /2/users?
There was a problem hiding this comment.
Great idea! Done — merged both policies into a single one with regex "/2/(sessions/average|users)". This way both endpoints share one rate limit counter, correctly reflecting Amplitude's shared budget.
The limits use the worst-case cost (120/request from /2/users with group_by=country) to ensure the shared budget is never exceeded regardless of the request mix:
- 900 req/hr (108,000 ÷ 120)
- 8 req/5min (1,000 ÷ 120)
This is slightly conservative for /2/sessions/average (which only costs 60/request), but safe.
Pushed in commit 09b5235.
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
…rd REST API streams Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
…st-case Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
|
/ai-canary-prerelease
|
🐤 Canary Prerelease Testing: StartedStarting canary prerelease testing for Plan:
|
|
Canary Prerelease: Deployment CompletePrerelease Version: 8 connections pinned (all previously on v0.7.29, no existing pins):
Coverage: 3 dataplanes (US, EU, US-Central), 5 destination types (BigQuery, Postgres, Redshift, S3/Datalake, Warehouse). Includes 1 connection from the affected customer in oncall issue. Monitoring will begin now. Next update in approximately 1-2 hours. For full customer details, see the linked private issue. |
Canary Monitoring Update (2026-03-30 15:08 UTC)Monitoring duration: approximately 50 minutes since pinning
Overall status: HEALTHY
Will continue monitoring. Next update in approximately 1 hour. |
Canary Monitoring Update (2026-03-30 15:50 UTC)Monitoring duration: approximately 90 minutes since pinning
Overall status: HEALTHY 🟢
Will continue monitoring until the 2-hour mark before posting the final canary report. |
Canary Prerelease: Final ReportConnector: SummaryThe prerelease version performed excellently across all canary connections that completed sync cycles. Over the monitoring period, 11 syncs completed across 6 connections with a 100% success rate and zero failures. Two connections with longer sync schedules (approximately 4-6 hour cycles) did not reach their next sync during the monitoring window, but their pre-pin syncs on the previous version were also successful. Detailed Results
Canary VerdictOverall Status: PASS ✅ The prerelease performed well across all canary connections. Key observations:
Since the PR is already merged, canary pins will be removed immediately as part of cleanup. For full customer details, see the linked private issue. |
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
…anager.devin.ai/proxy/github.com/airbytehq/airbyte into devin/1774360379-amplitude-api-budget
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
d704986
into
master
…e limiting for Dashboard REST API (airbytehq#75406) Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Aaron ("AJ") Steers <aj@airbyte.io>
What
https://github.com/airbytehq/oncall/issues/11680
Replaces the single generic rate limit policy (60 calls/min for all endpoints) with weight-based sliding-window policies for the two Dashboard REST API streams where cost is known, based on Amplitude's documented cost-based rate limits.
Upgrades the CDK base image from v7.6.5 → v7.14.0 to use the new
weightfield onHttpRequestRegexMatcher(CDK PR #966). This allows specifying per-endpoint cost weights so the CDK deducts the correct amount from a shared rate limit budget on each request.The previous global fallback policy has been removed entirely. Endpoints without documented rate limits (events_list, annotations, cohorts, export) are now unconstrained by
api_budgetand rely solely onconcurrency_level: 5.Bumps connector version from 0.7.29 → 0.7.30.
How
Amplitude's Dashboard REST API enforces a shared budget using the formula:
cost = (# of days) × (# of conditions) × (query type cost)There are two rate limit tiers (docs):
With CDK v7.14.0's weight-based rate limiting, the policy limits are set to Amplitude's actual budgets and each matcher specifies a
weightequal to its per-request cost. The CDK deducts the matched weight from the shared budget on each request, accurately tracking cost consumption across endpoints.average_session_length/2/sessions/averageactive_users/2/usersevents_list/2/events/listThe
active_usersweight is dynamically resolved via Jinja interpolation based on theactive_users_group_by_countryconfig option:group_by=true(default): 30 days × 4 segments × 1 = 120 cost/requestgroup_by=false: 30 days × 1 × 1 = 30 cost/requestevents_listis intentionally not given a specific policy because its cost is trivially low (1 per request) and it only makes a single request per sync (full refresh, no date slicing).Other endpoints (annotations, cohorts, export) have no documented rate limits from Amplitude and are left without policies.
The 5-concurrent-request limit is enforced globally via
concurrency_level: 5. The CDK'sapi_budgetonly supports count-based rate policies, not per-endpoint concurrency policies.Policy type:
MovingWindowCallRatePolicyUses
MovingWindowCallRatePolicy(sliding window) rather thanFixedWindowCallRatePolicy(fixed window reset). The single policy specifies two rates (hourly + 5-minute) — both are enforced simultaneously, so whichever limit is hit first will throttle requests. Both matchers share the same policy, so cost is correctly deducted from a single shared budget.Review guide
airbyte-integrations/connectors/source-amplitude/manifest.yaml— api_budget policy with weight-based matchers and doc commentsairbyte-integrations/connectors/source-amplitude/metadata.yaml— CDK base image upgrade (7.6.5 → 7.14.0), version bump (0.7.29 → 0.7.30)docs/integrations/sources/amplitude.md— changelog entry for 0.7.30Key things to verify:
weightfield: This is the first connector to use theweightfeature added in CDK v7.14.0. Extra scrutiny on whether the CDK correctly deducts the matched weight is warranted.config.get('active_users_group_by_country', true)evaluates correctly in the CDK'sInterpolatedStringcontext (the factory passesconfigbutparameters={}).api_budgetrate limiting. Confirm this is acceptable given that these endpoints either have trivial cost (events_list) or no documented limits (annotations, cohorts, export).P15Dfor sessions/average,P1Mfor users). If slice sizes change in a future update, these weights would need to be updated too.User Impact
More accurate rate limiting for Dashboard REST API streams with both hourly and short-term burst protection. The weight-based approach correctly tracks shared cost consumption across endpoints, preventing budget overuse when multiple Dashboard REST API streams sync concurrently. Streams that were previously throttled by the generic 60 calls/min limit can now use their endpoint-specific cost allowances. Users with
active_users_group_by_country=falsebenefit from a lighter weight (30 vs 120), allowing more requests within the budget. Other endpoints (events_list, annotations, cohorts, export) are no longer subject to anyapi_budgetrate limiting and rely only on the global concurrency limit.Can this PR be safely reverted and rolled back?
Link to Devin session: https://app.devin.ai/sessions/642ec5f275684572aa83aa9c7da87444
Requested by: Daryna Ishchenko (@darynaishchenko)