feat(source-amazon-seller-partner): configurable cooldown retry and 429 error handling #77837

Merged
octavia-bot-admin[bot] merged 16 commits into
master from
devin/1778160571-amazon-sp-failed-retry-cooldown
May 12, 2026

@darynaishchenko Daryna Ishchenko (darynaishchenko) commented May 7, 2026

What

Configures cooldown-aware deferred retry for report streams that receive Amazon's FATAL status due to SP-API's undocumented per-report-type cooldown, and adds a dedicated 429 error handler on creation_requester with configurable max retries and custom backoff.

Previously, when a report request FATALed, the CDK retried immediately 3 times — all of which failed because the cooldown (~30 min for near-real-time FBA reports) hadn't elapsed. This wasted all retry attempts and surfaced as a system_error to users.
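
The difference between immediate and deferred retry can be sketched with a toy simulation. This is not the CDK's implementation: `run_job`, its parameters, and the fixed cooldown end time are all assumptions made for illustration, and the attempt limit of 3 simply mirrors the retry count mentioned above.

```python
def run_job(cooldown_ends_at, retry_wait_seconds, max_attempts=3):
    """Return the status seen on each attempt of one simulated report job.

    Hypothetical model: any request made before `cooldown_ends_at`
    (seconds on a simulated clock) comes back FATAL.
    """
    now = 0.0
    statuses = []
    for _ in range(max_attempts):
        status = "DONE" if now >= cooldown_ends_at else "FATAL"
        statuses.append(status)
        if status == "DONE":
            break
        # Advance the simulated clock by the configured wait before retrying.
        now += retry_wait_seconds
    return statuses


# Immediate retry: every attempt lands inside the 30-minute cooldown.
immediate = run_job(cooldown_ends_at=1800, retry_wait_seconds=0)
# Deferred retry: the second attempt happens once the cooldown has elapsed.
deferred = run_job(cooldown_ends_at=1800, retry_wait_seconds=1800)

print(immediate)  # ['FATAL', 'FATAL', 'FATAL']
print(deferred)   # ['FATAL', 'DONE']
```

With a wait of 0 the job burns every attempt inside the cooldown window, which matches the failure mode described above.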

Depends on CDK v7.19.0 which includes:

How

  1. Configurable deferred retry: failed_retry_wait_time_in_seconds on basic_async_retriever is interpolated from config with a default of 1800 (30 min), minimum of 1, and maximum of 14400 (4 hours). Connector users or support can override it via the config API without a code change.

  2. Dedicated 429 error handler — A new DefaultErrorHandler entry in the creation_requester's CompositeErrorHandler matches HTTP 429 specifically, with max_retries interpolated from config.get('creation_requester_429_max_retries', 5) and a user-facing error_message. The handler uses the existing AmazonSellerPartnerWaitTimeFromHeaderBackoffStrategy for backoff. Other errors (e.g. 403) retain the existing handler with default max retries.

  3. Hidden spec fields — Two fields added to the connector spec with airbyte_hidden: true (not visible in UI, settable via API):

    • failed_retry_wait_time_in_seconds (integer, default 1800, range 1–14400)
    • creation_requester_429_max_retries (integer, default 5, minimum 0)
  4. CDK base image bump: source-declarative-manifest updated from 7.18.0 → 7.19.0. Unit test CDK dependency bumped from ^7.18.0 → ^7.19.0 (required for max_retries interpolation support).

  5. Unit test fixes for deferred retry — FATAL status tests updated to work with cooldown-aware retry:

    • ConfigBuilder.with_failed_retry_wait_time_in_seconds(1) overrides the 30-min default to 1 second so tests complete quickly
    • @freezegun.freeze_time(NOW.isoformat(), tick=True) allows time to advance during the test so the deferred retry cooldown actually elapses
    • RequestBuilder.without_amz_date() removes the dynamic x-amz-date header from mock matchers — with tick=True, the header's timestamp changes on each request, but HttpRequest.matches() uses subset matching so excluding it lets mocks still match
    • Each mock provides 3 responses to cover _DEFAULT_MAX_JOB_RETRY = 3 retry cycles
    • All 314 unit tests pass (~16 minutes); FATAL tests specifically complete in ~2 minutes
  6. Troubleshooting documentation — Three documentation updates:

    • New "Reports failing with FATAL status" section explaining Amazon's cooldown mechanism, affected report types, and failed_retry_wait_time_in_seconds tuning
    • New "Report creation failing with 429 rate limit errors" section documenting the 429 retry behavior, creation_requester_429_max_retries, and max_done_report_age_hours as mitigations
    • Updated existing "Rate Limit issue for Report Streams" section to include all three configurable fields (max_done_report_age_hours, creation_requester_429_max_retries, failed_retry_wait_time_in_seconds) as recommended steps
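
As a rough illustration of how the two hidden fields resolve, the sketch below mirrors the defaults and ranges listed in step 3 in plain Python. In the connector these constraints live in the spec and the defaults are applied through interpolation; the `SPEC` table and `resolve` helper here are hypothetical names used only for this example.

```python
# Defaults and bounds copied from the two hidden spec fields described above.
SPEC = {
    "failed_retry_wait_time_in_seconds": {"default": 1800, "minimum": 1, "maximum": 14400},
    "creation_requester_429_max_retries": {"default": 5, "minimum": 0},
}


def resolve(config, key):
    """Return the configured value, or the default when the key is absent.

    Hypothetical helper: raises ValueError when the value falls outside
    the range declared for the field.
    """
    rule = SPEC[key]
    value = config.get(key, rule["default"])
    if value < rule["minimum"]:
        raise ValueError(f"{key} must be >= {rule['minimum']}, got {value}")
    if "maximum" in rule and value > rule["maximum"]:
        raise ValueError(f"{key} must be <= {rule['maximum']}, got {value}")
    return value


print(resolve({}, "failed_retry_wait_time_in_seconds"))  # 1800
print(resolve({"creation_requester_429_max_retries": 0}, "creation_requester_429_max_retries"))  # 0
```

A value of 0 for creation_requester_429_max_retries is accepted by this sketch, which corresponds to disabling 429 retries.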

Review guide

  1. manifest.yaml — spec properties (~line 336): two hidden config fields with defaults, descriptions, and validation constraints
  2. manifest.yaml: basic_async_retriever definition (~line 3050): configurable failed_retry_wait_time_in_seconds via Jinja interpolation
  3. manifest.yaml: creation_requester.error_handler (~line 3098): new 429 handler with interpolated max_retries, error_message, and custom backoff strategy, placed before existing 403 handler
  4. metadata.yaml — base image bump (7.18.0 → 7.19.0) and version bump (5.7.6-rc.2 → 5.7.6-rc.3)
  5. unit_tests/pyproject.toml + poetry.lock — CDK test dependency bump to ^7.19.0
  6. unit_tests/integration/config.py — new with_failed_retry_wait_time_in_seconds() builder method
  7. unit_tests/integration/request_builder.py — new without_amz_date() helper for tick-compatible mock matching
  8. unit_tests/integration/test_report_based_streams.py — FATAL tests updated with tick=True, 1s cooldown override, and 3-response mocks
  9. docs/integrations/sources/amazon-seller-partner.md — changelog entry, two new troubleshooting sections, updated Rate Limit section
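
To make the handler layout in item 3 easier to review, here is a minimal sketch of a handler chain, assuming the first handler whose status codes match the response is the one applied. `StatusHandler`, `build_handlers`, `pick_handler`, the error messages, and the 403 retry count are placeholders for illustration and are not the CDK's classes or the manifest's actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusHandler:
    """Hypothetical stand-in for one entry in a composite error handler."""

    http_codes: frozenset
    max_retries: int
    error_message: str


def build_handlers(config):
    """Build the chain with the 429 handler listed before the 403 handler."""
    return [
        StatusHandler(
            http_codes=frozenset({429}),
            # Same default as config.get('creation_requester_429_max_retries', 5).
            max_retries=config.get("creation_requester_429_max_retries", 5),
            error_message="Rate limit hit while creating the report.",  # placeholder text
        ),
        StatusHandler(
            http_codes=frozenset({403}),
            max_retries=5,  # placeholder for the default retry count
            error_message="Access forbidden.",  # placeholder text
        ),
    ]


def pick_handler(handlers, status_code):
    """Return the first handler that matches, or None when nothing matches."""
    for handler in handlers:
        if status_code in handler.http_codes:
            return handler
    return None


handlers = build_handlers({"creation_requester_429_max_retries": 2})
print(pick_handler(handlers, 429).max_retries)  # 2
print(pick_handler(handlers, 500))              # None
```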

Items for reviewer attention:

  • Verify both config.get('failed_retry_wait_time_in_seconds', 1800) and config.get('creation_requester_429_max_retries', 5) evaluate correctly when the keys are absent from user config (relies on CDK Jinja interpolation with dict .get())
  • Confirm hidden spec fields are resolved by the CDK during interpolation even though airbyte_hidden: true
  • Confirm CompositeErrorHandler ordering: 429 handler must precede 403 handler so each matches the intended status code
  • creation_requester_429_max_retries has minimum: 0 — setting to 0 disables 429 retries entirely; verify this is acceptable
  • FATAL tests assume _DEFAULT_MAX_JOB_RETRY = 3 — if the CDK default changes, the [...] * 3 response mocking will need updating
  • without_amz_date() relies on HttpRequest.matches() using subset matching for headers (_is_subdict) — confirm this CDK behavior is stable
  • Review troubleshooting docs for accuracy — cooldown durations are based on community reports, not official Amazon documentation
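
The subset-matching assumption behind without_amz_date() can be illustrated as follows. This is a simplified sketch and not the CDK's `HttpRequest.matches()` code: `is_subdict` and the module-level `without_amz_date` function are stand-ins, and the header values are made up.

```python
def is_subdict(expected, actual):
    """True when every expected key/value pair also appears in actual."""
    return all(key in actual and actual[key] == value for key, value in expected.items())


def without_amz_date(headers):
    """Return a copy of the headers with the time-dependent x-amz-date removed."""
    return {key: value for key, value in headers.items() if key.lower() != "x-amz-date"}


# Headers captured when the mock was registered (illustrative values).
mock_headers = {"content-type": "application/json", "x-amz-date": "20260512T100000Z"}
# Headers on the real request a moment later, after the ticking clock advanced.
request_headers = {"content-type": "application/json", "x-amz-date": "20260512T100001Z"}

print(is_subdict(mock_headers, request_headers))                    # False
print(is_subdict(without_amz_date(mock_headers), request_headers))  # True
```

Dropping the timestamp from the expected side keeps the remaining headers as a subset of the real request, so the mock still matches even though the clock moved.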

User Impact

Reports that previously failed with system_error due to Amazon's cooldown will now succeed on deferred retry. The 429-specific handler gives clearer error messages and dedicated backoff for rate-limit errors on report creation. No user-facing configuration changes are required — defaults preserve current behavior — but support can tune both failed_retry_wait_time_in_seconds and creation_requester_429_max_retries via connector config if needed.

Can this PR be safely reverted and rolled back?

  • YES 💚

Important

Active progressive rollout warning for source-amazon-seller-partner.

  • (Click to Approve:) Bypass the active progressive rollout warning for source-amazon-seller-partner in the PR comment here.

Link to Devin session: https://app.devin.ai/sessions/2cbe53a30ea14f9f8151f407f423283b
Requested by: Daryna Ishchenko (@darynaishchenko)

…L report failures

Add failed_retry_wait_time_in_seconds: 1800 (30 minutes) to the shared
basic_async_retriever definition. When Amazon SP-API returns FATAL status
due to its undocumented per-report-type cooldown, the CDK will now defer
the retry until the cooldown elapses instead of retrying immediately.

This requires the CDK change from airbytehq/airbyte-python-cdk#1016.

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
@devin-ai-integration
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

github-actions Bot commented May 7, 2026

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • 🛠️ Quick Fixes
    • /format-fix - Fixes most formatting issues.
    • /bump-version - Bumps connector versions, scraping changelog description from the PR title.
      • Bump types: patch (default), minor, major, major_rc, rc, promote.
      • The rc type is a smart default: applies minor_rc if stable, or bumps the RC number if already RC.
      • The promote type strips the RC suffix to finalize a release.
      • Example: /bump-version type=rc or /bump-version type=minor
    • /bump-progressive-rollout-version - Alias for /bump-version type=rc. Bumps with an RC suffix and enables progressive rollout.
  • ❇️ AI Testing and Review (internal link: AI-SDLC Docs):
    • /ai-prove-fix - Runs prerelease readiness checks, including testing against customer connections.
    • /ai-canary-prerelease - Rolls out prerelease to 5-10 connections for canary testing.
    • /ai-review - AI-powered PR review for connector safety and quality gates.
  • 📝 AI Documentation:
    • /ai-docs-review - AI-powered documentation review for PRs with connector changes.
    • /ai-create-docs-pr - Creates a documentation PR for connector changes, stacked on the current PR.
  • 🚀 Connector Releases:
    • /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-preview.{git-sha}) for all modified connectors in the PR.
  • ☕️ JVM connectors:
    • /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
      Example: /update-connector-cdk-version connector=destination-bigquery
  • 🐍 Python connectors:
    • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
    • /poe source example lock - Alias for /poe connector source-example lock.
    • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
    • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.
  • ⚙️ Admin commands:
    • /force-merge reason="<REASON>" - Force merges the PR using admin privileges, bypassing CI checks. Requires a reason.
      Example: /force-merge reason="CI is flaky, tests pass locally"

devin-ai-integration Bot and others added 2 commits May 11, 2026 15:35
…d update CDK to 7.19.0

- Make failed_retry_wait_time_in_seconds configurable via config with default 1800 (30m)
- Add separate 429 error handler on creation_requester with configurable max_retries
- Update base image to source-declarative-manifest:7.19.0 (includes deferred retry feature)

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
@devin-ai-integration devin-ai-integration Bot changed the title from "feat(source-amazon-seller-partner): add cooldown-aware retry for FATAL report failures" to "feat(source-amazon-seller-partner): configurable cooldown retry and 429 error handling" on May 11, 2026
devin-ai-integration Bot and others added 4 commits May 11, 2026 15:42
….6-rc.3

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
…ields

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
@darynaishchenko Daryna Ishchenko (darynaishchenko) marked this pull request as ready for review May 11, 2026 15:49
github-actions Bot commented May 11, 2026

Deploy preview for airbyte-docs ready!

Project: airbyte-docs
Status: ✅ Deploy successful!
Preview URL: https://airbyte-docs-e9eknrygk-airbyte-growth.vercel.app
Latest Commit: bdbd2a1

Deployed with vercel-action

devin-ai-integration Bot and others added 4 commits May 11, 2026 15:53
…section

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
…ection

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
… section

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
github-actions Bot commented May 11, 2026

source-amazon-seller-partner Connector Test Results

551 tests: 80 passed ✅, 0 skipped 💤, 471 failed ❌ (1 suite, 1 file, 15m 27s ⏱️)

For more details on these failures, see this check.

Results for commit 99eb100.

♻️ This comment has been updated with latest results.

devin-ai-integration Bot and others added 2 commits May 11, 2026 16:20
…on_requester_429_max_retries spec field

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
…K to 7.19.0 which supports interpolated max_retries

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
Daryna Ishchenko (darynaishchenko) commented May 11, 2026

/publish-connectors-prerelease

Pre-release Connector Publish Started

Publishing pre-release build for connector source-amazon-seller-partner.
PR: #77837

Pre-release versions will be tagged as {version}-preview.38de536
and are available for version pinning via the scoped_configuration API.

View workflow run
Pre-release Publish: SUCCESS

Docker image (pre-release):
airbyte/source-amazon-seller-partner:5.7.6-preview.38de536

Docker Hub: https://hub.docker.com/layers/airbyte/source-amazon-seller-partner/5.7.6-preview.38de536

Registry JSON:

github-actions Bot commented May 12, 2026

Important

Active progressive rollout warning for source-amazon-seller-partner.

This PR can bypass the warning only after the PR-description ACK checkbox is checked.

Detected signals:

  • Active rollout: true
  • Rollout version: 5.7.6-rc.2
  • Rollout state: paused
  • Master version: 5.7.6-rc.2
  • Master RC marker: true
  • Bypass ACK checked: false

To bypass this warning, check this box in the PR description:

- [ ] (Click to Approve:) Bypass the active progressive rollout warning for source-amazon-seller-partner in the PR comment [here](https://github.com/airbytehq/airbyte/pull/77837#issuecomment-4429076385).

What happens if this PR is merged?

Checking the ACK box does not stop the active rollout by itself. It only allows this workflow's required check to pass. If the PR then merges, the result depends on what connector version is published after merge.

Expected outcomes by version-change type
  • No connector version change: no new connector version should be released, and the active rollout should continue unchanged.
  • RC to GA: the merged PR may publish the GA version and make it default for eligible non-pinned actors. The existing rollout is not stopped immediately by the merge, but the rollout worker can later cancel it as superseded when finalizing.
  • RCn to RCn+1: the merged PR may publish a new release candidate and replace the active RC marker. The previous incomplete rollout is canceled without unpinning, and a new rollout is created for RCn+1.

Workflow run

devin-ai-integration Bot and others added 2 commits May 12, 2026 10:59
- Add with_failed_retry_wait_time_in_seconds() to ConfigBuilder for test config override
- Add without_amz_date() to RequestBuilder for tick=True compatibility
- Update FATAL tests to use 1s cooldown + tick=True + 3 response mocks for retry cycles
- All 314 tests pass in ~16 minutes (FATAL tests complete in ~2 minutes)

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
…_response

Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
Daryna Ishchenko (darynaishchenko) commented May 12, 2026

/force-merge reason="long running integration tests"

Force-merge job started... Check job output.

Force merge successful! This PR has been merged.

@octavia-bot-admin octavia-bot-admin Bot merged commit f3bacee into master May 12, 2026
42 of 44 checks passed
@octavia-bot-admin octavia-bot-admin Bot deleted the devin/1778160571-amazon-sp-failed-retry-cooldown branch May 12, 2026 12:07
3 participants