feat(source-amazon-seller-partner): check for existing reports before creating new ones#76093
Conversation
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
Note 📝 PR Converted to Draft More info...Thank you for creating this PR. As a policy to protect our engineers' time, Airbyte requires all PRs to be created first in draft status. Your PR has been automatically converted to draft status in respect for this policy. As soon as your PR is ready for formal review, you can proceed to convert the PR to "ready for review" status by clicking the "Ready for review" button at the bottom of the PR page. To skip draft status in future PRs, please include |
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksPR Slash CommandsAirbyte Maintainers (that's you!) can execute the following slash commands on your PR:
📚 Show Repo GuidanceHelpful Resources
|
|
Deploy preview for airbyte-docs ready!
Deployed with vercel-action |
|
|
/publish-connectors-prerelease
|
|
/publish-connectors-prerelease
|
ceaa531 to
2e963cc
Compare
83986e4 to
6747652
Compare
|
↪️ Triggering Reason: PR is ready for review with no prior |
Reviewing PR for connector safety and quality.
|
AI PR Review ReportReview Action: NO ACTION (INCONCLUSIVE)
📋 PR DetailsConnector & PR InfoConnector(s): Risk LevelLevel: 4 — High Risk Level is reported for downstream consumers (e.g. auto-merge policy, reviewer routing). It does not change the review action — APPROVE here means "no blocking objection," not "safe to merge unattended." Review Action DetailsNO ACTION (INCONCLUSIVE) - The
🔍 Gate Evaluation DetailsGate-by-Gate Analysis
PR Hygiene — PASSPR Description Check:
Docs Changelog Check:
Peer Feedback Check:
Test Coverage — PASS
Code Security — PASSPath-based patterns: No changed files match auth/credential/token path patterns. Keyword-based patterns (diff-hunk scoped):
Behavioral Changes — FAIL (WARNING enforcement)Operational risk keywords found in diff hunks:
Note: This is a WARNING-level gate that elevates Risk Level but does not by itself block APPROVE. Breaking Dependencies — WARNING
CI Checks — UNKNOWN
The Live / E2E Tests — PASSValidation-Required Detection:
Evidence (Priority 2 — CI Check-Runs):
Backwards Compatibility — PASSSpec Comparison: No spec file (spec.json, spec.yaml) was modified. Other breaking change indicators:
Forwards Compatibility — PASSKeywords checked in diff hunks:
Tests proving forwards compatibility:
📚 Evidence ConsultedEvidence
❓ How to RespondProviding Context or JustificationThe review is currently INCONCLUSIVE because the
You can add explanations that the bot will see on the next review: Option 1: PR Description (recommended) ## AI PR Review Justification
### {Gate Name}
[Your explanation here]Option 2: PR Comment After adding your response, re-run Note: Justifications provide context for the bot to evaluate. For some gates (like the Live / E2E Tests gate), a sufficient justification can lead to PASS. For other gates, justifications help explain the situation but may still require escalation if the gate cannot be remediated. |
e57de39 to
7bcd592
Compare
| requested_marketplace_ids = body_json.get("marketplaceIds", []) if body_json else [] | ||
|
|
||
| if report_type: | ||
| existing_report = self._find_existing_report( |
There was a problem hiding this comment.
Should we check whether max_done_report_age_hours > 0 here?
There was a problem hiding this comment.
No — max_done_report_age_hours only controls whether DONE reports are reused. The pre-check (_find_existing_report) is still valuable when max_done_report_age_hours == 0 because it reuses reports in other statuses:
- IN_QUEUE / IN_PROGRESS — avoids creating a duplicate while one is already running
- CANCELLED — reused so the CDK maps it to SKIPPED (no retry, no wasted API call)
- FATAL — reused so CDK retries don't create new reports
Only DONE reports are filtered out by _is_report_fresh when the config is 0. So the GET /reports call should always happen regardless of the config value.
118cd8a to
36b6966
Compare
Reviewing PR for connector safety and quality.
|
|
↪️ Triggering Reason: Previous AI review returned UNKNOWN. Re-running review for definitive result. |
AI PR Review ReportReview Action: NO ACTION (INCONCLUSIVE)
📋 PR DetailsConnector & PR InfoConnector(s): Risk LevelLevel: 4 — High Risk Level is reported for downstream consumers (e.g. auto-merge policy, reviewer routing). It does not change the review action — APPROVE here means "no blocking objection," not "safe to merge unattended." Review Action DetailsNO ACTION (INCONCLUSIVE) — The
🔍 Gate Evaluation DetailsGate-by-Gate Analysis
Spec Comparison:
Behavioral Changes Detail:
📚 Evidence ConsultedEvidence
❓ How to RespondProviding Context or JustificationYou can add explanations that the bot will see on the next review: Option 1: PR Description (recommended) ## AI PR Review Justification
### {Gate Name}
[Your explanation here]Option 2: PR Comment After adding your response, re-run Note: The CI Checks gate is UNKNOWN because tests are still running. Once CI completes, re-run |
|
↪ Triggering Reason: Re-review needed — substantive commits since last UNKNOWN review at 22e54eb (refactor: filter by marketplaceIds in API; refactor: move CANCELLED check after date/freshness filters; current head c5ffcfb). |
Reviewing PR for connector safety and quality.
|
Add max_done_report_age_hours to connector spec (integer, min=0, max=24, default=0). When 0 (default), DONE reports are never reused — new reports are always created. When set to a positive value, DONE reports younger than that threshold are reused. IN_PROGRESS/IN_QUEUE/CANCELLED/FATAL reports are unaffected by this setting. Updated 48 unit tests covering all configurable threshold scenarios. Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
… dict directly Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
…od using self.config Co-Authored-By: bot_apk <apk@cognition.ai>
…avior Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
…d reports Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
…w report CANCELLED reports are no longer reused. Instead, a new report is created, which acts as a retry mechanism: if data is now available the new report will succeed; if still no data, the new report will also be CANCELLED and the CDK's SKIPPED status mapping will handle it silently. Co-Authored-By: bot_apk <apk@cognition.ai>
…return first result - _fetch_reports now passes pageSize=100 and marketplaceIds query params - Removed _marketplace_ids_match (filtering done server-side) - _find_existing_report returns first match (API sorts by createdTime desc) - Removed best-candidate tracking loop Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
… noisy logs Co-Authored-By: bot_apk <apk@cognition.ai>
…uester pre-check Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
0cf61fa to
a401559
Compare
Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
|
🙋 Escalating to human review per Hands-Free AI Triage Project triage next step. Reason: |
Co-Authored-By: bot_apk <apk@cognition.ai>
|
/force-merge reason="long running ci for this source, passed for previous commit"
|
What
Addresses oncall#11837 — reduces 429 rate limit errors on report creation by reusing existing SP-API reports instead of always calling
createReport.How
ReportCreationRequester(components.py)Custom
HttpRequesterthat queriesGET /reportsbefore creating a new one. If a matching report exists (samereportType, date range,marketplaceIds), it's reused — nocreateReportcall is made. Falls back to normal creation on any error._fetch_reportspassesreportTypes,marketplaceIds, andpageSize=100as query params to the API. No client-side marketplace matching needed.createdTimedescending, so the first matching report in the response is the most recently created.SKIPPEDstatus mapping handles it silently. The CANCELLED check runs after date range and freshness filters to avoid noisy logs for unrelated CANCELLED reports.max_done_report_age_hoursconfig param (0–24, default 0). When 0, DONE reports are never reused; when >0, only DONE reports younger than the threshold are reused. Non-DONE reports (IN_QUEUE, IN_PROGRESS, FATAL) always reused regardless.CANCELLED → SKIPPED (
manifest.yaml)Per SP-API docs, CANCELLED means "no data to return." Mapped to CDK
SKIPPEDstatus (airbyte-python-cdk#980) — silently drops the partition with no retries, no record fetching, no state advancement.Other changes
request_body_jsonpassthrough — overrides__post_init__to ensure manifest-defined request body JSON is available viaInterpolatedRequestOptionsProvidermax_done_report_age_hoursto setup guides, new "Report reuse" section, rate limit mitigation recommendation, and a "Report cancellation and data loss" section explaining auto vs manual cancellation with a caution about data loss7.18.0(includes SKIPPED async job status)SECRET_SOURCE-AMAZON-SELLER-PARTNER_OLD_DATA_CREDSinmetadata.yaml(no longer valid)5.7.6Updates since last revision
GET /reportsmock (_get_reports_request/_get_reports_response) to 7 test methods that usehttp_mocker.clear_all_matchers(). These tests previously only mocked thePOST /reportsendpoint and failed withNoMockAddressafter theReportCreationRequesterpre-check was introduced.test_given_http_error_500_on_create_report_when_read_then_no_records_and_error_logged(bothTestFullRefreshandTestVendorSalesReportsFullRefresh) from"Giving up _send(...) after 6 tries"to"Max retry limit reached after"to match CDK 7.18.0's changed backoff error message._marketplace_ids_match— marketplace filtering moved server-side viamarketplaceIdsquery param_fetch_reportsnow sendspageSize=100(API max) for more complete first page_find_existing_reportto return first match instead of tracking best candidate (API pre-sorts bycreatedTimedesc)Review guide
components.py—ReportCreationRequester:send_requestoverride,_fetch_reports(withpageSize/marketplaceIdsparams),_find_existing_report(date match → freshness → CANCELLED skip → first-match return),_is_report_fresh,_date_ranges_match,_build_synthetic_responsemanifest.yaml—CustomRequesterswap,skipped: [CANCELLED]/timeout: [], newmax_done_report_age_hoursspec propertytest_report_creation_requester.py— 43 unit tests covering fetch params, matching, staleness, CANCELLED skipping, configurable thresholdstest_report_based_streams.py— integration tests for CANCELLED→SKIPPED, incremental state invariance (CANCELLED + empty DONE), plus GET /reports mocks for error-path testsamazon-seller-partner.md— setup guides, performance considerations, report cancellation/data-loss docsReview checklist
__post_init__creates_request_options_providertwice (super creates one, then we replace it) — verify no side effects_build_synthetic_responseusesResponse._content(private attr) — acceptable?CDK version must be updated from prerelease to GA before merge— updated to7.18.0_date_ranges_matchcompares full datetime with==— if SP-API returns slightly different time components for the same logical range, match fails (safe fallback to create, but could miss reuse)max_done_report_age_hours— relies solely on manifest specminimum: 0/maximum: 24. Out-of-range values would be silently accepted if spec validation is bypassed.marketplaceIdspassed as comma-separated string to GET /reports — verify this matches SP-API expected format (vs. repeated query params)createdTimedescending — if API sorting changes, wrong report could be selected (safe fallback: creates new report if no match)"Max retry limit reached after"— if CDK version changes again, these assertions may breakUser Impact
Customers hitting 429 rate limits on report creation should see significantly fewer
createReportcalls. CANCELLED reports are skipped and retried with a new report — if data is now available it succeeds, if not the CDK silently skips it. Newmax_done_report_age_hoursconfig (highly recommended for rate-limited connections) enables reuse of recent completed reports. No config changes required for the base optimization — in-progress reports are always reused by default.Documentation explicitly warns users not to manually cancel reports that are being used by Airbyte syncs, as this can cause data loss for that period.
Can this PR be safely reverted and rolled back?
Link to Devin session: https://app.devin.ai/sessions/45ba0c82b9654fbb8ce037502492bd3d