Skip to content

fix(source-s3): Consume CDK concurrent source fix#78291

Open
devin-ai-integration[bot] wants to merge 1 commit into
masterfrom
devin/1779286511-source-s3-cdk-7-19-2
Open

fix(source-s3): Consume CDK concurrent source fix#78291
devin-ai-integration[bot] wants to merge 1 commit into
masterfrom
devin/1779286511-source-s3-cdk-7-19-2

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

What

Resolves https://github.com/airbytehq/oncall/issues/12663:

A source-s3 large full refresh can start a stream but fail to emit a terminal stream status if the concurrent file-based read path starves partition readers. This bumps source-s3 from 4.15.4 to 4.15.5 and consumes airbyte-cdk 7.19.2, which includes the concurrent partition-generator cap for the file-based read path.

Requested by API User via /ai-fix.

How

  • Refresh the source-s3 lockfile so airbyte-cdk resolves to 7.19.2.
  • Add a regression test that verifies the consumed CDK defers starting another partition generator when the configured concurrent-generator cap is reached.
  • Bump connector metadata and pyproject.toml to 4.15.5 and add the docs changelog entry.

Review guide

  1. airbyte-integrations/connectors/source-s3/poetry.lock
  2. airbyte-integrations/connectors/source-s3/unit_tests/v4/test_source.py
  3. airbyte-integrations/connectors/source-s3/metadata.yaml
  4. docs/integrations/sources/s3.md

User Impact

Large source-s3 full refresh syncs should be less likely to hang after emitting STARTED without a terminal stream status. No schema, spec, stream, or state format changes are included.

Declarative-First Evaluation

source-s3 is a Python CDK file-based connector (language:python, cdk:python-file-based), not a manifest-only or low-code declarative connector. The fix is a runtime dependency update, so declarative manifest alternatives do not apply.

Breaking Change Evaluation

This is not a breaking change: it does not change schemas, primary keys, cursors, connector spec fields, streams, or state format. Standard patch versioning is used.

Test Coverage

  • Added test_airbyte_cdk_limits_concurrent_partition_generators, which fails without the CDK runtime support for max_concurrent_partition_generators and passes with airbyte-cdk 7.19.2.

Test plan

  • cd /home/ubuntu/repos/airbyte/airbyte-integrations/connectors/source-s3 && poetry run pytest unit_tests/ -x
  • cd /home/ubuntu/repos/airbyte/airbyte-integrations/connectors/source-s3 && poetry run poe check-ruff-lint
  • cd /home/ubuntu/repos/airbyte/airbyte-integrations/connectors/source-s3 && poetry run mypy unit_tests/v4/test_source.py --follow-imports=skip
  • cd /home/ubuntu/repos/airbyte-python-cdk && poetry run pytest unit_tests/sources/streams/concurrent/test_concurrent_read_processor.py -q -k 'max_concurrent_partition_generators or concurrent_limit'
  • cd /home/ubuntu/repos/airbyte-python-cdk && poetry run ruff check airbyte_cdk/sources/concurrent_source/concurrent_source.py airbyte_cdk/sources/concurrent_source/concurrent_read_processor.py unit_tests/sources/streams/concurrent/test_concurrent_read_processor.py

poetry run poe check-mypy was also attempted for the full connector, but it fails on existing source-s3 typing/stub issues outside this change (for example missing airbyte/jsonschema stubs and legacy type errors in source files).

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link
Copy Markdown
Contributor

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • 🛠️ Quick Fixes
    • /format-fix - Fixes most formatting issues.
    • /bump-version - Bumps connector versions, scraping changelog description from the PR title.
      • Bump types: patch (default), minor, major, major_rc, rc, promote.
      • The rc type is a smart default: applies minor_rc if stable, or bumps the RC number if already RC.
      • The promote type strips the RC suffix to finalize a release.
      • Example: /bump-version type=rc or /bump-version type=minor
    • /bump-progressive-rollout-version - Alias for /bump-version type=rc. Bumps with an RC suffix and enables progressive rollout.
  • ❇️ AI Testing and Review (internal link: AI-SDLC Docs):
    • /ai-prove-fix - Runs prerelease readiness checks, including testing against customer connections.
    • /ai-canary-prerelease - Rolls out prerelease to 5-10 connections for canary testing.
    • /ai-review - AI-powered PR review for connector safety and quality gates.
  • 📝 AI Documentation:
    • /ai-docs-review - AI-powered documentation review for PRs with connector changes.
    • /ai-create-docs-pr - Creates a documentation PR for connector changes, stacked on the current PR.
  • 🚀 Connector Releases:
    • /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-preview.{git-sha}) for all modified connectors in the PR.
  • ☕️ JVM connectors:
    • /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
      Example: /update-connector-cdk-version connector=destination-bigquery
  • 🐍 Python connectors:
    • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
    • /poe source example lock - Alias for /poe connector source-example lock.
    • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
    • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.
  • ⚙️ Admin commands:
    • /force-merge reason="<REASON>" - Force merges the PR using admin privileges, bypassing CI checks. Requires a reason.
      Example: /force-merge reason="CI is flaky, tests pass locally"
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

@@ -356,6 +356,7 @@ This connector utilizes the open source [Unstructured](https://unstructured-io.g

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint-fix] reported by reviewdog 🐶

Suggested change

@github-actions
Copy link
Copy Markdown
Contributor

Deploy preview for airbyte-docs ready!

Project:airbyte-docs
Status: ✅  Deploy successful!
Preview URL:https://airbyte-docs-fkxnh1z6o-airbyte-growth.vercel.app
Latest Commit:9e45f22

Deployed with vercel-action

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

source-s3 Connector Test Results

219 tests   196 ✅  6m 41s ⏱️
  3 suites   23 💤
  3 files      0 ❌

Results for commit 9e45f22.

♻️ This comment has been updated with latest results.

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

↪️ Triggering /ai-review per Hands-Free AI Triage Project triage next step.

Reason: This linked Green P1 issue has a draft fix PR with CI passing, while a newer human-authored S3 PR is handling expanded CDK/state-throttle follow-up. AI review should verify whether this original fix PR should continue, be superseded, or be closed in favor of the newer path.

https://github.com/airbytehq/oncall/issues/12663
#78325
airbytehq/airbyte-python-cdk#1032

Devin session

@octavia-bot octavia-bot Bot marked this pull request as ready for review May 22, 2026 11:40
@octavia-bot
Copy link
Copy Markdown
Contributor

octavia-bot Bot commented May 22, 2026

AI PR Review starting...

Reviewing PR for connector safety and quality.
View playbook

Devin AI session created successfully!

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

AI PR Review is starting for this PR.

Session: https://app.devin.ai/sessions/918de40026fd41c490b65156fe64e1e3

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

devin-ai-integration Bot commented May 22, 2026

AI PR Review Report

Review Action: REQUEST CHANGES
Risk Level: 4 — source-s3 bug fix updates the Airbyte CDK runtime dependency and targets full-refresh concurrency behavior, with failed pre-release validation.

Gate Status
PR Hygiene PASS
Code Hygiene PASS
Test Coverage PASS
Code Security PASS
Per-Record Performance PASS
Breaking Dependencies WARNING
Backwards Compatibility PASS
Forwards Compatibility PASS
Behavioral Changes WARNING
Out-of-Scope Changes PASS
CI Checks UNKNOWN
Live / E2E Tests FAIL
🔧 Remediation Required

Required Actions

  • Live / E2E Tests: Fix the failing source-s3 Pre-Release Checks job. The failure is concrete validation evidence: docs/integrations/sources/s3.md contains https://github.com/airbytehq/airbyte/pull/TBD, which returns 404. Replace it with https://github.com/airbytehq/airbyte/pull/78291, then rerun validation.
  • CI Checks: Wait for the pending Test source-s3 Connector re-run to complete. If it fails, fix the failing connector tests.

📋 PR Details

Connector & PR Info

Connector(s): source-s3
PR: #78291
HEAD SHA: 9e45f22b6f63558c3e08a4fd6cfec69981d9c23f
Session: https://app.devin.ai/sessions/918de40026fd41c490b65156fe64e1e3

Risk Level

Level: 4 — High
Rationale: source-s3 bug fix updates the Airbyte CDK runtime dependency and targets full-refresh concurrency behavior; the Live / E2E validation check currently fails.

Risk Level is reported for downstream consumers (e.g. auto-merge policy, reviewer routing). It does not change the review action — APPROVE here means "no blocking objection," not "safe to merge unattended."

Review Action Details

REQUEST CHANGES - The Live / E2E Tests enforced gate failed because source-s3 Pre-Release Checks failed for the current HEAD SHA. CI Checks are also currently inconclusive because the latest Test source-s3 Connector re-run is still pending.

Note: The bot submits APPROVE whenever all enforced gates PASS, regardless of what the PR changes. APPROVE is a quality signal, not a merge authorization; any auto-merge behavior is governed by separate policy reading decision and risk_level from the metrics marker.

🔍 Gate Evaluation Details

Gate-by-Gate Analysis

Gate Status Enforced? Details
PR Hygiene PASS Yes PR body is substantive and source-s3 docs changelog was modified. PR Body Length (raw): 3223; after stripping: 3223; visible content: 3018; preview: ## What\nResolves https://github.com/airbytehq/oncall/issues/12663:\n- https://github.com/airbytehq/on. No unresolved human review comments were found.
Code Hygiene PASS WARNING Source-code coverage heuristic is satisfied because the PR modifies airbyte-integrations/connectors/source-s3/unit_tests/v4/test_source.py with new test content. Source Files Modified: none under the source-code heuristic except version/dependency files; Test Files Modified: airbyte-integrations/connectors/source-s3/unit_tests/v4/test_source.py; Coverage Evidence: Found.
Test Coverage PASS Yes Behavioral change detected from title keyword fix and linked oncall issue. New test content found: def test_airbyte_cdk_limits_concurrent_partition_generators() -> None: in airbyte-integrations/connectors/source-s3/unit_tests/v4/test_source.py.
Code Security PASS Yes No path-based auth/credential/security files changed, and no security keywords were found in changed manifest/metadata diff hunks. The metadata hunk only changes dockerImageTag from 4.15.4 to 4.15.5.
Per-Record Performance PASS WARNING No new per-record processing loops, network calls, or transformations were introduced in connector runtime code. The only Python code change is a whitespace line in source_s3/v4/__init__.py and a unit test.
Breaking Dependencies WARNING WARNING airbyte-integrations/connectors/source-s3/poetry.lock changes airbyte-cdk from 7.19.1 to 7.19.2; dependency behavior is intentionally changed to consume the concurrent partition-generator cap.
Backwards Compatibility PASS Warning (elevates Risk Level) No spec, schema, stream, primary key, cursor, or state format files changed. metadata.yaml only bumps dockerImageTag; pyproject.toml only bumps the connector version.
Forwards Compatibility PASS Warning (elevates Risk Level) No state/cursor/pagination/transformation keywords were found in functional connector runtime diff hunks. Test-only references are not runtime compatibility changes.
Behavioral Changes WARNING Warning (elevates Risk Level) The PR is a bug fix changing sync concurrency behavior via airbyte-cdk dependency bump. The warning is advisory; the PR includes a regression test and documents rollback safety.
Out-of-Scope Changes PASS Skip PR is in scope: it changes airbyte-integrations/connectors/source-s3/** and docs/integrations/sources/s3.md.
CI Checks UNKNOWN Yes Core CI excludes pre-release checks. Latest core checks include successful Lint source-s3 Connector, Build and Verify Artifacts (source-s3), required Connector CI Checks Summary, Format Check, Check Changelog Updated, and Enforce PR structure; the latest Test source-s3 Connector re-run is still pending.
Live / E2E Tests FAIL Yes Validation is required because this is a bug fix and sync behavior change. MCP verification was unavailable because the Ops MCP Cloud SQL proxy was not running. Fallback CI evidence shows source-s3 Pre-Release Checks failed. Logs report: Links used in connector documentation are valid failed because https://github.com/airbytehq/airbyte/pull/TBD returns 404.

Detailed trigger evidence:

  • Changed files: airbyte-integrations/connectors/source-s3/metadata.yaml, airbyte-integrations/connectors/source-s3/poetry.lock, airbyte-integrations/connectors/source-s3/pyproject.toml, airbyte-integrations/connectors/source-s3/source_s3/v4/__init__.py, airbyte-integrations/connectors/source-s3/unit_tests/v4/test_source.py, docs/integrations/sources/s3.md.
  • Live / E2E trigger indicators: PR title contains fix; PR body links https://github.com/airbytehq/oncall/issues/12663; dependency change targets full-refresh read-path concurrency.
  • Pre-release failure log: docs/integrations/sources/s3.md changelog row uses https://github.com/airbytehq/airbyte/pull/TBD, which returns 404.
📚 Evidence Consulted

Evidence

  • Changed files: 6 files
  • CI checks: source-s3 Pre-Release Checks failed; Test source-s3 Connector pending; Lint source-s3 Connector passed; Build and Verify Artifacts (source-s3) passed; Connector CI Checks Summary passed; Format Check passed; Check Changelog Updated passed; Enforce PR structure passed.
  • PR labels: connectors/source/s3
  • PR description: present; raw length 3223, stripped length 3223, visible content length 3018.
  • Existing bot reviews: none from this AI review bot for HEAD SHA 9e45f22b6f63558c3e08a4fd6cfec69981d9c23f; one prior github-actions[bot] COMMENTED review with a markdownlint-fix suggestion.
  • MCP verification: attempted Ops MCP query_prod_connector_versions for source-s3 definition 69589781-7828-43c5-9f63-8925b1c1ccc2; unavailable because Cloud SQL proxy was not running.
❓ How to Respond

Providing Context or Justification

You can add explanations that the bot will see on the next review:

Option 1: PR Description (recommended)
Add a section to your PR description:

## AI PR Review Justification

### {Gate Name}
[Your explanation here]

Option 2: PR Comment
Add a comment starting with:

AI PR Review Justification:
[Your explanation here]

After adding your response, re-run /ai-review to have the bot evaluate it.

Note: Justifications provide context for the bot to evaluate. For some gates (like the Live / E2E Tests gate), a sufficient justification can lead to PASS. For other gates, justifications help explain the situation but may still require escalation if the gate cannot be remediated.


Devin session

Copy link
Copy Markdown
Contributor

@octavia-bot octavia-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI PR Review: Gates Failed

This PR has failing gates that require attention before merge. See the gate report comment above for details and remediation steps.

This review was automatically submitted by the AI PR Review system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant