feat(source-pinterest): Update CDK to v6#65960
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Helpful Resources
PR Slash CommandsAirbyte Maintainers (that's you!) can execute the following slash commands on your PR:
|
|
/format-fix
|
|
|
/bump-version
|
|
Deploy preview for airbyte-docs ready! ✅ Preview Built with commit 412b505. |
|
/format-fix
|
|
/poe source pinterest use-cdk-latest
🤖 Auto-commit successful: f08bbaa
|
|
/format-fix
|
|
/poe source pinterest use-cdk-latest
🤖 Auto-commit successful: 00fba70
|
Brian Lai (brianjlai)
left a comment
There was a problem hiding this comment.
added some comments around the shift back to v6 CDK. feel free to ping me if you want to discuss more about the changes. but yeah as max mentioned, v7 would have some shenanigans. but once its fully ready to migrate to manifest-only most of the pain points should no longer be relevant
| python = "^3.10,<3.12" | ||
| pendulum = "==2.1.2" | ||
| airbyte-cdk = "^4" | ||
| airbyte-cdk = "^7.0.1" |
There was a problem hiding this comment.
this would work, but we've usually just used ^7. Did you have to do the full version for some reason to get the latest CDK?
There was a problem hiding this comment.
It is just how the /poe source pinterest use-cdk-latest updated it, because I had a failed CI check with ^7
| title: "Account ID" | ||
| description: "The Pinterest account ID you want to fetch data for. This ID must be provided to filter the data for a specific account." | ||
| examples: ["1234567890"] | ||
| num_workers: |
There was a problem hiding this comment.
as per the conversation during retro about the confusion between Airbyte workers and connector workers, lets change this to num_threads and in the title and description let's just say "Number of concurrent threadsandThe number of parallel threads to use for the sync.`
| concurrency_level: | ||
| type: ConcurrencyLevel | ||
| default_concurrency: "{{ config['num_workers'] or 2 }}" | ||
| max_concurrency: 20 |
There was a problem hiding this comment.
can you add a comment about where the boundaries for concurrency level were determined
|
|
||
| return declarative_streams + report_streams | ||
|
|
||
| def _create_ad_accounts_stream(self, config): |
There was a problem hiding this comment.
unfortunately, I suspect that because we're still going to be using v6 of the CDK here, we might still run into this same issue since RFR on python streams is still auto-enabled.
If we were going to v7, then you are right that RFR w/ synthetic cursors is not used anymore and we could share the streams. It may be safer to leave this and once we go to v7 + manifest-only, this whole section will get deleted anyway
There was a problem hiding this comment.
I encountered the same issue with version 6 — AdAccounts was created as a DefaultStream, but we’re iterating through it using stream_slices. I tested it and confirmed that we’re still receiving all AdAccounts records when running a sync for multiple report streams.
Would it be safer to explicitly set is_resumable = False?
There was a problem hiding this comment.
I tested in the Cloud and got the same number of records for the old and new versions:
https://cloud.airbyte.com/workspaces/9ef0ca9a-d759-484c-8499-93565374eaea/connections/dbcd0609-3073-4bd6-b313-eaea6089c698/timeline
https://cloud.airbyte.com/workspaces/9ef0ca9a-d759-484c-8499-93565374eaea/connections/246f95fc-fe40-43f5-bd70-43eca230b1c5/timeline
There was a problem hiding this comment.
ah yeah you're right we have part of the changes to return back DefaultStream still in v6, so that is still a real scenario especially for a simple stream like ad accounts. its fine how you have it.
Would it be safer to explicitly set is_resumable = False?
Yeah I think it would be. So let's disable it. Either way once we go to v7 that setting won't matter any more (once on v7 we'll resume incremental syncs running as full refresh, but full refresh streams will only checkpoint once at the end). This should hopefully just be a short term behavior anyway
Its possible the reason records matched was because we would only see this appear if we have more than one page of ad accounts, but I imagine most customers don't have many pages of ad accounts.
| dockerImageTag: 2.1.9-rc.1 | ||
| dockerRepository: airbyte/source-pinterest | ||
| connectorBuildOptions: | ||
| baseImage: docker.io/airbyte/python-connector-base:4.0.0@sha256:d9894b6895923b379f3006fa251147806919c62b7d9021b5cd125bb67d7bbe22 |
There was a problem hiding this comment.
small nit, but I think we should also be bumping the base image to 4.0.2 since we had some recent updates to the image to get some additional security fixes: https://hub.docker.com/layers/airbyte/python-connector-base/4.0.2/images/sha256-f24aa7ddd043ec3c21851d814e95736deb3e1e657b74def0f1dec14e4d51504e
| yield record | ||
|
|
||
|
|
||
| class AdAccounts(PinterestStream): |
There was a problem hiding this comment.
pending the above comment about not being able to reuse the same AdAccounts stream, would it make sense to continue to use the low-code version of the stream instead?
i assume we needed this change since in v7 we would get back a DefaultStream which might incompatible with the python streams, but now back on v6 it should be a DeclarativeStream.
|
/format-fix
|
|
|
||
| return declarative_streams + report_streams | ||
|
|
||
| def _create_ad_accounts_stream(self, config): |
There was a problem hiding this comment.
ah yeah you're right we have part of the changes to return back DefaultStream still in v6, so that is still a real scenario especially for a simple stream like ad accounts. its fine how you have it.
Would it be safer to explicitly set is_resumable = False?
Yeah I think it would be. So let's disable it. Either way once we go to v7 that setting won't matter any more (once on v7 we'll resume incremental syncs running as full refresh, but full refresh streams will only checkpoint once at the end). This should hopefully just be a short term behavior anyway
Its possible the reason records matched was because we would only see this appear if we have more than one page of ad accounts, but I imagine most customers don't have many pages of ad accounts.
| description = "Source implementation for Pinterest." | ||
| authors = [ "Airbyte <contact@airbyte.io>",] | ||
| license = "MIT" | ||
| license = "ELv2" |
What
Update CDK to v6.
Migration to v7 is currently blocked, as version 7 expects all streams — including those in Python implementations — to align with the concurrent CDK.
Resolves: https://github.com/airbytehq/airbyte-internal-issues/issues/14217
How
Review guide
User Impact
Can this PR be safely reverted and rolled back?