Skip to content

chore(source-amplitude): upgrade CDK to v7.14.0, use weight-based rate limiting for Dashboard REST API#75406

Merged
Daryna Ishchenko (darynaishchenko) merged 17 commits into
masterfrom
devin/1774360379-amplitude-api-budget
Mar 31, 2026
Merged

chore(source-amplitude): upgrade CDK to v7.14.0, use weight-based rate limiting for Dashboard REST API#75406
Daryna Ishchenko (darynaishchenko) merged 17 commits into
masterfrom
devin/1774360379-amplitude-api-budget

Conversation

@darynaishchenko
Copy link
Copy Markdown
Collaborator

@darynaishchenko Daryna Ishchenko (darynaishchenko) commented Mar 24, 2026

What

https://github.com/airbytehq/oncall/issues/11680

Replaces the single generic rate limit policy (60 calls/min for all endpoints) with weight-based sliding-window policies for the two Dashboard REST API streams where cost is known, based on Amplitude's documented cost-based rate limits.

Upgrades the CDK base image from v7.6.5 → v7.14.0 to use the new weight field on HttpRequestRegexMatcher (CDK PR #966). This allows specifying per-endpoint cost weights so the CDK deducts the correct amount from a shared rate limit budget on each request.

The previous global fallback policy has been removed entirely. Endpoints without documented rate limits (events_list, annotations, cohorts, export) are now unconstrained by api_budget and rely solely on concurrency_level: 5.

Bumps connector version from 0.7.29 → 0.7.30.

How

Amplitude's Dashboard REST API enforces a shared budget using the formula:
cost = (# of days) × (# of conditions) × (query type cost)

There are two rate limit tiers (docs):

  • Burst limit: 1,000 cost per 5 minutes
  • Hourly limit: 108,000 cost per hour

With CDK v7.14.0's weight-based rate limiting, the policy limits are set to Amplitude's actual budgets and each matcher specifies a weight equal to its per-request cost. The CDK deducts the matched weight from the shared budget on each request, accurately tracking cost consumption across endpoints.

Stream Endpoint Slice Conditions Query Cost Weight
average_session_length /2/sessions/average 15d 1 4 (User Sessions) 60 (static)
active_users /2/users ~30d 4 or 1 1 120 or 30 (config-driven)
events_list /2/events/list N/A (full refresh) 1 1 (no policy)

The active_users weight is dynamically resolved via Jinja interpolation based on the active_users_group_by_country config option:

weight: "{{ 120 if config.get('active_users_group_by_country', true) else 30 }}"
  • group_by=true (default): 30 days × 4 segments × 1 = 120 cost/request
  • group_by=false: 30 days × 1 × 1 = 30 cost/request

events_list is intentionally not given a specific policy because its cost is trivially low (1 per request) and it only makes a single request per sync (full refresh, no date slicing).

Other endpoints (annotations, cohorts, export) have no documented rate limits from Amplitude and are left without policies.

The 5-concurrent-request limit is enforced globally via concurrency_level: 5. The CDK's api_budget only supports count-based rate policies, not per-endpoint concurrency policies.

Policy type: MovingWindowCallRatePolicy

Uses MovingWindowCallRatePolicy (sliding window) rather than FixedWindowCallRatePolicy (fixed window reset). The single policy specifies two rates (hourly + 5-minute) — both are enforced simultaneously, so whichever limit is hit first will throttle requests. Both matchers share the same policy, so cost is correctly deducted from a single shared budget.

Review guide

  1. airbyte-integrations/connectors/source-amplitude/manifest.yaml — api_budget policy with weight-based matchers and doc comments
  2. airbyte-integrations/connectors/source-amplitude/metadata.yaml — CDK base image upgrade (7.6.5 → 7.14.0), version bump (0.7.29 → 0.7.30)
  3. docs/integrations/sources/amplitude.md — changelog entry for 0.7.30

Key things to verify:

  • First consumer of CDK weight field: This is the first connector to use the weight feature added in CDK v7.14.0. Extra scrutiny on whether the CDK correctly deducts the matched weight is warranted.
  • Cost calculations: Do the per-stream weight values (60, 120/30) match Amplitude's formula given each stream's slice size and query parameters in the manifest?
  • Jinja interpolation: Confirm that config.get('active_users_group_by_country', true) evaluates correctly in the CDK's InterpolatedString context (the factory passes config but parameters={}).
  • No fallback policy: Endpoints without a specific matcher now have no api_budget rate limiting. Confirm this is acceptable given that these endpoints either have trivial cost (events_list) or no documented limits (annotations, cohorts, export).
  • Stale weights: The weight values are derived from the connector's current slice sizes (P15D for sessions/average, P1M for users). If slice sizes change in a future update, these weights would need to be updated too.

User Impact

More accurate rate limiting for Dashboard REST API streams with both hourly and short-term burst protection. The weight-based approach correctly tracks shared cost consumption across endpoints, preventing budget overuse when multiple Dashboard REST API streams sync concurrently. Streams that were previously throttled by the generic 60 calls/min limit can now use their endpoint-specific cost allowances. Users with active_users_group_by_country=false benefit from a lighter weight (30 vs 120), allowing more requests within the budget. Other endpoints (events_list, annotations, cohorts, export) are no longer subject to any api_budget rate limiting and rely only on the global concurrency limit.

Can this PR be safely reverted and rolled back?

  • YES 💚

Link to Devin session: https://app.devin.ai/sessions/642ec5f275684572aa83aa9c7da87444
Requested by: Daryna Ishchenko (@darynaishchenko)


Open with Devin

… Dashboard REST API streams

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
@devin-ai-integration
Copy link
Copy Markdown
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@octavia-bot octavia-bot Bot marked this pull request as draft March 24, 2026 13:55
@octavia-bot
Copy link
Copy Markdown
Contributor

octavia-bot Bot commented Mar 24, 2026

Note

📝 PR Converted to Draft

More info...

Thank you for creating this PR. As a policy to protect our engineers' time, Airbyte requires all PRs to be created first in draft status. Your PR has been automatically converted to draft status in respect for this policy.

As soon as your PR is ready for formal review, you can proceed to convert the PR to "ready for review" status by clicking the "Ready for review" button at the bottom of the PR page.

To skip draft status in future PRs, please include [ready] in your PR title or add the skip-draft-status label when creating your PR.

@github-actions
Copy link
Copy Markdown
Contributor

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • 🛠️ Quick Fixes
    • /format-fix - Fixes most formatting issues.
    • /bump-version - Bumps connector versions, scraping changelog description from the PR title.
  • ❇️ AI Testing and Review (internal link: AI-SDLC Docs):
    • /ai-prove-fix - Runs prerelease readiness checks, including testing against customer connections.
    • /ai-canary-prerelease - Rolls out prerelease to 5-10 connections for canary testing.
    • /ai-review - AI-powered PR review for connector safety and quality gates.
  • 🚀 Connector Releases:
    • /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-preview.{git-sha}) for all modified connectors in the PR.
    • /bump-progressive-rollout-version - Bumps connector version with an RC suffix (2.16.10-rc.1) for progressive rollouts (enableProgressiveRollout: true).
      • Example: /bump-progressive-rollout-version changelog="Add new feature for progressive rollout"
  • ☕️ JVM connectors:
    • /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
      Example: /update-connector-cdk-version connector=destination-bigquery
  • 🐍 Python connectors:
    • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
    • /poe source example lock - Alias for /poe connector source-example lock.
    • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
    • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.
  • ⚙️ Admin commands:
    • /force-merge reason="<REASON>" - Force merges the PR using admin privileges, bypassing CI checks. Requires a reason.
      Example: /force-merge reason="CI is flaky, tests pass locally"
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

@devin-ai-integration devin-ai-integration Bot changed the title source-amplitude: update api_budget with per-endpoint rate limits for Dashboard REST API streams chore(source-amplitude): update api_budget with per-endpoint rate limits for Dashboard REST API streams Mar 24, 2026
devin-ai-integration Bot and others added 2 commits March 24, 2026 14:21
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 24, 2026

source-amplitude Connector Test Results

13 tests   10 ✅  7s ⏱️
 2 suites   3 💤
 2 files     0 ❌

Results for commit 266c235.

♻️ This comment has been updated with latest results.

… rate limiting

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
@darynaishchenko Daryna Ishchenko (darynaishchenko) marked this pull request as ready for review March 24, 2026 15:13
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
@devin-ai-integration
Copy link
Copy Markdown
Contributor

@devin-ai-integration
Copy link
Copy Markdown
Contributor

@darynaishchenko
Copy link
Copy Markdown
Collaborator Author

Daryna Ishchenko (darynaishchenko) commented Mar 24, 2026

/publish-connectors-prerelease

Pre-release Connector Publish Started

Publishing pre-release build for connector source-amplitude.
PR: #75406

Pre-release versions will be tagged as {version}-preview.9201c94
and are available for version pinning via the scoped_configuration API.

View workflow run
Pre-release Publish: SUCCESS

Docker image (pre-release):
airbyte/source-amplitude:0.7.30-preview.9201c94

Docker Hub: https://hub.docker.com/layers/airbyte/source-amplitude/0.7.30-preview.9201c94

Registry JSON:

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 24, 2026

Deploy preview for airbyte-docs ready!

✅ Preview
https://airbyte-docs-p67m3rjaj-airbyte-growth.vercel.app

Built with commit 49a3a77.
This pull request is being automatically deployed with vercel-action

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 25, 2026

Pre-release Connector Publish Started

Publishing pre-release build for connector source-amplitude.
PR: #75406

Pre-release versions will be tagged as {version}-preview.9201c94
and are available for version pinning via the scoped_configuration API.

View workflow run
Pre-release Publish FAILED for source-amplitude.

devin-ai-integration[bot]

This comment was marked as resolved.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 25, 2026

Pre-release Connector Publish Started

Publishing pre-release build for connector source-amplitude.
PR: #75406

Pre-release versions will be tagged as {version}-preview.67cc2c0
and are available for version pinning via the scoped_configuration API.

View workflow run
Pre-release Publish FAILED for source-amplitude.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 25, 2026

Pre-release Connector Publish Started

Publishing pre-release build for connector source-amplitude.
PR: #75406

Pre-release versions will be tagged as {version}-preview.67cc2c0
and are available for version pinning via the scoped_configuration API.

View workflow run
Pre-release Publish: SUCCESS

Docker image (pre-release):
airbyte/source-amplitude:0.7.30-preview.67cc2c0

Docker Hub: https://hub.docker.com/layers/airbyte/source-amplitude/0.7.30-preview.67cc2c0

Registry JSON:

@devin-ai-integration
Copy link
Copy Markdown
Contributor

⚠️ Corrupted pre-release version(s) detected

Due to a bug in the artifact generator (airbytehq/airbyte-ops-mcp#604), previous pre-release publishes from this PR registered an incorrect dockerImageTag in the connector registry. The bare version 0.7.30 was written into the registry artifact instead of the full preview tag — but only the preview-tagged Docker image was actually pushed to DockerHub.

Corrupted version: source-amplitude:0.7.30 (image does not exist on DockerHub)

Replacement version: source-amplitude:0.7.30-preview.67cc2c0 (published with the fix, image exists on DockerHub)

Any connections currently pinned to version 0.7.30 should be re-pinned to 0.7.30-preview.67cc2c0 to resolve image pull backoff errors.

The underlying bug has been fixed in airbytehq/airbyte-ops-mcp#607 and airbytehq/airbyte#75435.


Devin session

@darynaishchenko
Copy link
Copy Markdown
Collaborator Author

Daryna Ishchenko (darynaishchenko) commented Mar 25, 2026

/publish-connectors-prerelease

Pre-release Connector Publish Started

Publishing pre-release build for connector source-amplitude.
PR: #75406

Pre-release versions will be tagged as {version}-preview.67cc2c0
and are available for version pinning via the scoped_configuration API.

View workflow run
Pre-release Publish: SUCCESS

Docker image (pre-release):
airbyte/source-amplitude:0.7.30-preview.67cc2c0

Docker Hub: https://hub.docker.com/layers/airbyte/source-amplitude/0.7.30-preview.67cc2c0

Registry JSON:

@octavia-bot
Copy link
Copy Markdown
Contributor

octavia-bot Bot commented Mar 25, 2026

🔍 AI Prove Fix session starting... Running readiness checks and testing against customer connections. View playbook

Devin AI session created successfully!

@devin-ai-integration
Copy link
Copy Markdown
Contributor

↪️ Triggering /ai-prove-fix per Hands-Free AI Triage Project triage next step.

Reason: PR updates api_budget with per-endpoint rate limiting for Amplitude source. Ready for regression validation.
https://github.com/airbytehq/oncall/issues/11680

Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor

devin-ai-integration Bot commented Mar 25, 2026

Fix Validation Evidence — /ai-prove-fix

Status: In Progress — Evidence Plan Posted, Regression Tests Running

Session: Devin Session


Pre-flight Checks
  • Viability: Fix correctly targets Dashboard REST API endpoints (/2/sessions/average, /2/users) with per-endpoint MovingWindowCallRatePolicy rate limits. Leaves Export API unconstrained as intended.
  • Safety: Only YAML manifest changes — no suspicious code patterns.
  • Breaking Change: NOT breaking. Patch version bump (0.7.29 → 0.7.30). No schema, spec, state, or stream changes.
  • Reversibility: Fully reversible — no state format changes, previous version can read state written by this version.
  • Design Intent: ⚠️ WARNING — Rate limits are calculated against the hourly budget (108,000 cost/hr) but ignore the stricter 5-minute budget (1,000 cost/5min). This makes the limits ~9x more permissive than they should be. Does not block testing but should be reviewed before merge.
Evidence Plan

Proving Criteria

  1. A sync on an internal connection with Dashboard REST API streams (average_session_length, active_users) completes successfully with the pre-release version (0.7.30-preview.67cc2c0)
  2. Regression tests pass — showing no regressions introduced by the rate limit changes
  3. No rate limit errors (429/HTTP throttling) appear in sync logs for Dashboard REST API streams

Disproving Criteria

  1. Sync fails with rate limit errors despite the new per-endpoint rate limits
  2. New errors appear that were not present in the v0.7.29 baseline
  3. Regression tests show data differences or failures

Cases Planned

  1. Internal connection (integration-test-sandboxes) — Has average_session_length + active_users streams. Currently dormant, will pin to pre-release and trigger sync.
  2. Internal connection (backup workspace A) — Same affected streams. Backup if Case 1 is blocked.
  3. Internal connection (backup workspace B) — Same affected streams. Second backup.

Regression Tests

  • Triggered comparison regression tests (awaiting results)
  • Will proceed to live connection tests after regression results are in.
Connector & PR Details

Connector: source-amplitude
PR: #75406
Pre-release Version: 0.7.30-preview.67cc2c0
Oncall Issue: Linked in private repo


Will update this comment as evidence is gathered.

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
@devin-ai-integration
Copy link
Copy Markdown
Contributor

Starting AI PR review for source-amplitude PR. Evaluating all gates now.

Session: https://app.devin.ai/sessions/e06df0ed975643c4a1f2f82b17f32907

@devin-ai-integration
Copy link
Copy Markdown
Contributor

AI PR Review Report

Review Action: NO ACTION (NOT ELIGIBLE)

Gate Status
PR Hygiene PASS
Code Hygiene PASS
Code Security PASS
Per-Record Performance PASS
Breaking Dependencies PASS
Backwards Compatibility PASS
Forwards Compatibility PASS
Behavioral Changes FAIL
Out-of-Scope Changes PASS
CI Checks PASS
Live / E2E Tests UNKNOWN

Behavioral Changes requires human sign-off — rate limiting policies were materially changed. Live / E2E Tests is inconclusive — /ai-prove-fix session is still in progress.


📋 PR Details & Eligibility

Connector & PR Info

Connector(s): source-amplitude
PR: #75406
HEAD SHA: a6f3f69108ebd555dab7ee0f97ade065f10f6b2c
Session: https://app.devin.ai/sessions/e06df0ed975643c4a1f2f82b17f32907

Auto-Approve Eligibility

Eligible: No
Category: not-eligible
Reason: PR contains functional changes to rate limiting configuration in manifest.yaml (api_budget policies changed from a single global FixedWindowCallRatePolicy to per-endpoint MovingWindowCallRatePolicy entries). This is not docs-only, additive-spec, patch/minor-deps, or comment/whitespace-only.

Review Action Details

NO ACTION (NOT ELIGIBLE) — The Behavioral Changes gate is flagged (rate limiting policies materially changed) and requires human sign-off. Additionally, Live / E2E Tests evidence is incomplete (/ai-prove-fix is still in progress). No PR review is submitted. Human review is required.

Note: This bot can approve PRs when all gates pass AND the PR is eligible for auto-approval (docs-only, additive spec changes, patch/minor dependency bumps, or comment/whitespace-only changes). PRs with other types of changes require human review even if all gates pass.

🔍 Gate Evaluation Details

Gate-by-Gate Analysis

Gate Status Enforced? Details
PR Hygiene PASS Yes Description present with What/How/Review Guide/User Impact/Rollback. Changelog updated. Version bump present.
Code Hygiene PASS WARNING YAML config changes only. 13 tests ran (10 pass, 3 skip, 0 fail). No new Python/Java code requiring additional tests.
Code Security PASS Yes No auth/credential patterns in diff. Changes are rate limit policies and documentation only.
Per-Record Performance PASS WARNING No changes to record processing loops. Rate limits are applied at HTTP request level.
Breaking Dependencies PASS WARNING No dependency files modified (no pyproject.toml, poetry.lock changes).
Backwards Compatibility PASS Blocks Auto-Approve No spec changes, no schema changes, no state format changes. Patch version bump (0.7.29 → 0.7.30). Existing connections continue to work without user action.
Forwards Compatibility PASS Blocks Auto-Approve No state format changes. Rolling back to 0.7.29 restores old rate limiting. State is not affected.
Behavioral Changes FAIL Blocks Auto-Approve Rate limiting behavior materially changed (see details below).
Out-of-Scope Changes PASS Skip All 3 changed files are within connector scope.
CI Checks PASS Yes All core checks passed: Connector CI Summary, Test source-amplitude, Lint source-amplitude, Build and Verify Artifacts, Format Check, Enforce PR structure, Check Changelog Updated.
Live / E2E Tests UNKNOWN Yes Pre-release 0.7.30-preview.67cc2c0 published. /ai-prove-fix triggered but still in progress (regression tests running). No completed validation evidence available yet.

Behavioral Changes — Detail

The following rate limiting changes were detected:

  1. Policy type changed: FixedWindowCallRatePolicyMovingWindowCallRatePolicy (sliding window instead of fixed window)
  2. Global fallback removed: The previous policy (60 calls/min for all endpoints, matchers: []) was removed entirely
  3. Per-endpoint policies added:
    • /2/sessions/average: 1,800 req/hr + 16 req/5min
    • /2/users: 900 req/hr + 8 req/5min
  4. Unconstrained endpoints: events_list, annotations, cohorts, export now have no api_budget rate limiting (rely only on concurrency_level: 5)
  5. Shared budget caveat: Both policies assume independent access to the full Amplitude budget (108,000 cost/hr, 1,000 cost/5min). If streams run concurrently, combined cost could theoretically exceed the shared budget. This is a documented CDK limitation.

Keywords matched: api_budget, MovingWindowCallRatePolicy, FixedWindowCallRatePolicy, call_limit, rates, limit, interval, concurrency_level

Human sign-off required: These changes alter how the connector interacts with the Amplitude API rate limits. A human reviewer should verify:

  • The cost-per-request calculations are correct
  • The shared budget risk is acceptable
  • The removal of the global fallback policy is intentional

Live / E2E Tests — Detail

  • Pre-release version: 0.7.30-preview.67cc2c0 (published successfully)
  • /ai-prove-fix status: In Progress (regression tests running, evidence plan posted)
  • Validation labels: None detected (live-tests-passed, prerelease-validated not present)
  • Pre-Release Checks CI: Passed (but this is the CI-level check, not live connection validation)
  • Recommendation: Wait for /ai-prove-fix to complete, then re-run /ai-review to get a definitive Live / E2E Tests verdict.
📚 Evidence Consulted

Evidence

  • Changed files: 3 files (+55 -12)
    • airbyte-integrations/connectors/source-amplitude/manifest.yaml — api_budget policy changes
    • airbyte-integrations/connectors/source-amplitude/metadata.yaml — version bump 0.7.29 → 0.7.30
    • docs/integrations/sources/amplitude.md — changelog entry
  • CI checks: 39 passed, 7 skipped, 0 failed, 0 pending (all core checks green)
  • PR labels: auto-labeled (area/connectors, connectors/source/amplitude expected)
  • PR description: Present and detailed
  • Existing bot reviews: Devin Review COMMENTED (not REQUEST CHANGES); no prior /ai-review REQUEST CHANGES for this SHA
❓ How to Respond

Providing Context or Justification

You can add explanations that the bot will see on the next review:

Option 1: PR Description (recommended)
Add a section to your PR description:

## AI PR Review Justification

### Behavioral Changes
[Your explanation here]

Option 2: PR Comment
Add a comment starting with:

AI PR Review Justification:
[Your explanation here]

After adding your response, re-run /ai-review to have the bot evaluate it.

Note: The Behavioral Changes gate requires human sign-off and cannot be overridden by justification alone. A human reviewer must approve the behavioral changes. The Live / E2E Tests gate can be resolved by waiting for the /ai-prove-fix session to complete.


Devin session

rates:
- limit: 900
interval: PT1H
- limit: 8
Copy link
Copy Markdown
Contributor

@tolik0 Anatolii Yatsuk (tolik0) Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we configure this value based on the config value (group_by is enabled or not)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The CDK's api_budget doesn't support Jinja interpolation in rate limit values (the limit field is a typed integer, not an interpolated string). No existing connector uses config-based dynamic rate limits in api_budget policies.

The current approach uses the conservative estimate (group_by=country=true, 120 cost/request). When group_by is disabled, the actual cost drops to 30/request, meaning the 900 req/hr limit is 4× more conservative than needed — but safe. This is documented in the PR description.

To make this truly dynamic, we'd need either:

  1. A CDK enhancement to support interpolated rate limit values
  2. Two separate policies with config-based matcher selection (not supported either)

Daryna Ishchenko (@darynaishchenko) — should we keep the conservative approach, or would you prefer we investigate a CDK enhancement?


Devin session

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

devin can regex in url_path_pattern be used for this case?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex in url_path_pattern matches URL paths, not config values — so it can't help with the group_by config problem since both group_by=true and group_by=false hit the same /2/users endpoint.

However, I've already used regex to solve a different problem (the shared budget issue raised by Anatolii Yatsuk (@tolik0)) — the latest commit merges both policies into a single one with url_path_pattern: "/2/(sessions/average|users)", so both endpoints now share one rate limit counter. The limits use the worst-case cost (120/request, assuming group_by=country=true) to stay safe.


Devin session

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CDK Enhancement Investigation: Cost-Based Rate Limiting

I investigated the CDK's rate limiting code (airbyte_cdk/sources/streams/call_rate.py) and found that the infrastructure for weighted/cost-based requests already exists but is unused.

Key Finding

The try_acquire() method on both FixedWindowCallRatePolicy and MovingWindowCallRatePolicy accepts a weight parameter, and pyrate_limiter (the underlying library) fully supports it. However, APIBudget._do_acquire() hardcodes weight=1 on line 599:

policy.try_acquire(request, weight=1)  # Always 1, regardless of endpoint cost

Proposed Enhancement: cost Field on HttpRequestRegexMatcher

Add an optional cost field (default=1) to matchers:

api_budget:
  type: HTTPAPIBudget
  policies:
    - type: MovingWindowCallRatePolicy
      rates:
        - limit: 108000       # Actual cost budget from Amplitude
          interval: PT1H
        - limit: 1000          # Actual burst budget
          interval: PT5M
      matchers:
        - type: HttpRequestRegexMatcher
          url_path_pattern: "/2/sessions/average"
          cost: 60             # <-- NEW: per-endpoint cost
        - type: HttpRequestRegexMatcher
          url_path_pattern: "/2/users"
          cost: 120            # <-- NEW: per-endpoint cost

Changes Required (~50 lines across 3 files)

  1. declarative_component_schema.py: Add cost: Optional[int] = 1 to HttpRequestRegexMatcher model
  2. call_rate.py: Store cost on HttpRequestRegexMatcher, add method to BaseCallRatePolicy to get cost for a matched request, update APIBudget._do_acquire() to pass the correct weight
  3. model_to_component_factory.py: Wire up the cost field in create_http_request_matcher()

This is fully backward compatible — default cost=1 preserves all existing behavior. A future extension could support Jinja expressions for config-dependent costs (e.g., cost: "{{ 120 if config['active_users_group_by_country'] else 30 }}").

Full analysis with code details: see attached report.


Devin session

Requested by Daryna Ishchenko (@darynaishchenko)

- type: HttpRequestRegexMatcher
method: GET
url_path_pattern: "/2/sessions/average"
# /2/users: 108,000 cost/hr ÷ 120 cost/request = 900 requests/hr (conservative, assumes group_by=country)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't two endpoints counted in the same budget?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, both /2/sessions/average and /2/users share the same Amplitude Dashboard REST API budget (108,000 cost/hr and 1,000 cost/5min). Each policy here is calculated as if the endpoint has the full budget to itself — this is a known limitation documented in the comments (lines 1068-1071) and in the PR description under "Shared budget caveat."

The CDK's api_budget doesn't support a shared cost pool across different URL matchers — each MovingWindowCallRatePolicy tracks its own independent counter. There's no way to enforce a single shared budget across two different matchers with the current CDK.

In practice, the risk is mitigated by:

  • concurrency_level: 5 limiting total concurrent requests across all streams
  • Different slice sizes and sync patterns making it unlikely both streams saturate their limits simultaneously
  • The 5-minute burst limits (16 and 8 req/5min) being more restrictive than the hourly limits, which further constrains burst behavior

A truly accurate shared budget would require a CDK enhancement (e.g., a shared cost-based rate limiter).


Devin session

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

devin can you use regex in url_path_pattern to mach both 2/sessions/average and /2/users?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! Done — merged both policies into a single one with regex "/2/(sessions/average|users)". This way both endpoints share one rate limit counter, correctly reflecting Amplitude's shared budget.

The limits use the worst-case cost (120/request from /2/users with group_by=country) to ensure the shared budget is never exceeded regardless of the request mix:

  • 900 req/hr (108,000 ÷ 120)
  • 8 req/5min (1,000 ÷ 120)

This is slightly conservative for /2/sessions/average (which only costs 60/request), but safe.

Pushed in commit 09b5235.


Devin session

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
…rd REST API streams

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
@devin-ai-integration devin-ai-integration Bot changed the title chore(source-amplitude): update api_budget with per-endpoint rate limits for Dashboard REST API streams chore(source-amplitude): upgrade CDK to v7.14.0, use weight-based rate limiting for Dashboard REST API Mar 30, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

@darynaishchenko
Copy link
Copy Markdown
Collaborator Author

Daryna Ishchenko (darynaishchenko) commented Mar 30, 2026

/ai-canary-prerelease

AI Canary Prerelease Started

Rolling out to 5-10 connections, watching results, and reporting findings.
View workflow run
🐤 AI Canary Prerelease session starting... Rolling out to 5-10 connections, watching results, and reporting findings. View playbook

Devin AI session created successfully!

@devin-ai-integration
Copy link
Copy Markdown
Contributor

🐤 Canary Prerelease Testing: Started

Starting canary prerelease testing for source-amplitude.

Plan:

  1. Evaluate breaking change / reversibility safety gates
  2. Select 5-10 canary connections
  3. Pin to prerelease version and monitor syncs
  4. Report findings

Devin session

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 30, 2026

Pre-release Connector Publish Started

Publishing pre-release build for connector source-amplitude.
PR: #75406

Pre-release versions will be tagged as {version}-preview.49a3a77
and are available for version pinning via the scoped_configuration API.

View workflow run
Pre-release Publish: SUCCESS

Docker image (pre-release):
airbyte/source-amplitude:0.7.31-preview.49a3a77

Docker Hub: https://hub.docker.com/layers/airbyte/source-amplitude/0.7.31-preview.49a3a77

Registry JSON:

@devin-ai-integration
Copy link
Copy Markdown
Contributor

Canary Prerelease: Deployment Complete

Prerelease Version: 0.7.31-preview.49a3a77 (CDK v7.14.0 with weight-based rate limiting)
Approval: Approved by Daryna Ishchenko (@darynaishchenko) via Slack

8 connections pinned (all previously on v0.7.29, no existing pins):

Connection Dataplane Destination Status
Internal-A-US-Central-1 US-Central BigQuery Pinned
Customer-B-US-1 US Postgres Pinned
Customer-C-EU-1 EU BigQuery Pinned
Customer-D-US-1 US Datalake Pinned
Customer-E-EU-1 EU Redshift Pinned
Customer-F-US-Central-1 US-Central BigQuery Pinned
Customer-G-US-1 US BigQuery Pinned
Customer-H-US-1 US Warehouse Pinned

Coverage: 3 dataplanes (US, EU, US-Central), 5 destination types (BigQuery, Postgres, Redshift, S3/Datalake, Warehouse). Includes 1 connection from the affected customer in oncall issue.

Monitoring will begin now. Next update in approximately 1-2 hours.

For full customer details, see the linked private issue.


Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor

Canary Monitoring Update (2026-03-30 15:08 UTC)

Monitoring duration: approximately 50 minutes since pinning
Prerelease version: 0.7.31-preview.49a3a77

Connection Post-Pin Syncs Succeeded Failed Notes
Internal-A-US-Central-1 0 0 0 Awaiting next scheduled sync
Customer-B-US-1 1 1 0 Postgres dest, succeeded
Customer-C-EU-1 1 1 0 BigQuery dest, succeeded
Customer-D-US-1 0 0 0 Awaiting next scheduled sync
Customer-E-EU-1 0 0 0 Awaiting next scheduled sync
Customer-F-US-Central-1 1 1 0 BigQuery dest, succeeded
Customer-G-US-1 0 0 0 Awaiting next scheduled sync
Customer-H-US-1 1 1 0 Warehouse dest, succeeded

Overall status: HEALTHY

  • 4/8 connections have completed syncs on the prerelease version
  • 4/4 syncs succeeded (100% success rate)
  • 0 failures detected
  • 4 connections awaiting their next scheduled sync cycle

Will continue monitoring. Next update in approximately 1 hour.


Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor

Canary Monitoring Update (2026-03-30 15:50 UTC)

Monitoring duration: approximately 90 minutes since pinning
Prerelease version: 0.7.31-preview.49a3a77

Connection Post-Pin Syncs Succeeded Failed Notes
Internal-A-US-Central-1 0 0 0 Awaiting next scheduled sync (last pre-pin: 12:37 UTC)
Customer-B-US-1 1 1 0 Postgres dest
Customer-C-EU-1 2 2 0 BigQuery dest
Customer-D-US-1 1 1 0 BigQuery dest
Customer-E-EU-1 0 0 0 Awaiting next scheduled sync (last pre-pin: 12:22 UTC)
Customer-F-US-Central-1 2 2 0 BigQuery dest
Customer-G-US-1 1 1 0 Snowflake dest
Customer-H-US-1 2 2 0 BigQuery dest

Overall status: HEALTHY 🟢

  • 6/8 connections have completed syncs on the prerelease version
  • 9/9 syncs succeeded (100% success rate)
  • 0 failures detected
  • 3 connections have completed multiple sync cycles successfully
  • 2 connections (Internal-A-US-Central-1, Customer-E-EU-1) have longer sync schedules and haven't reached their next cycle yet

Will continue monitoring until the 2-hour mark before posting the final canary report.


Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor

Canary Prerelease: Final Report

Connector: source-amplitude
Prerelease Version: 0.7.31-preview.49a3a77
Monitoring Period: 2026-03-30 14:19 UTC to 16:12 UTC (approximately 113 minutes)
Connections Tested: 8

Summary

The prerelease version performed excellently across all canary connections that completed sync cycles. Over the monitoring period, 11 syncs completed across 6 connections with a 100% success rate and zero failures. Two connections with longer sync schedules (approximately 4-6 hour cycles) did not reach their next sync during the monitoring window, but their pre-pin syncs on the previous version were also successful.

Detailed Results

Connection Total Syncs Success Rate Issues
Internal-A-US-Central-1 0 N/A Long sync schedule; last pre-pin sync succeeded
Customer-B-US-1 2 100% None — Postgres dest
Customer-C-EU-1 2 100% None — BigQuery dest
Customer-D-US-1 1 100% None — BigQuery dest
Customer-E-EU-1 0 N/A Long sync schedule; last pre-pin sync succeeded
Customer-F-US-Central-1 2 100% None — BigQuery dest
Customer-G-US-1 2 100% None — Snowflake dest
Customer-H-US-1 2 100% None — BigQuery dest

Canary Verdict

Overall Status: PASS

The prerelease performed well across all canary connections. Key observations:

  • 11/11 syncs succeeded (100% success rate) across 6 connections
  • 0 failures detected — not just in canary connections, but across all source-amplitude connections in the dataset
  • Multiple connections completed 2+ sync cycles successfully, demonstrating stability
  • Diverse destination types tested: BigQuery, Postgres, Snowflake
  • Diverse dataplanes tested: US, EU, US-Central

Since the PR is already merged, canary pins will be removed immediately as part of cleanup.

For full customer details, see the linked private issue.


Devin session

Comment thread airbyte-integrations/connectors/source-amplitude/metadata.yaml Outdated
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
Comment thread docs/integrations/sources/amplitude.md Outdated
Comment thread docs/integrations/sources/amplitude.md Outdated
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
@darynaishchenko Daryna Ishchenko (darynaishchenko) merged commit d704986 into master Mar 31, 2026
42 of 44 checks passed
@darynaishchenko Daryna Ishchenko (darynaishchenko) deleted the devin/1774360379-amplitude-api-budget branch March 31, 2026 11:47
dilanalex pushed a commit to dilanalex/airbyte that referenced this pull request Apr 6, 2026
…e limiting for Dashboard REST API (airbytehq#75406)

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Aaron ("AJ") Steers <aj@airbyte.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants