chore(file-based-cdk): add UploadableRemoteFile, refactor upload method#780
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@daryna/file-based/refactor-upload-method#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch daryna/file-based/refactor-upload-methodHelpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
|
/autofix
|
📝 WalkthroughWalkthroughImplements a concrete upload flow in AbstractFileBasedStreamReader with a 1.5GB file-size guard, introduces an UploadableRemoteFile interface with download/size semantics, updates imports and type hints, removes the abstract file_size method, and adds unit tests exercising upload and size-limit behavior. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant Reader as AbstractFileBasedStreamReader
participant File as UploadableRemoteFile
participant Local as LocalFS
participant Log as Logger
Caller->>Reader: upload(file, local_directory, logger)
Reader->>File: size (property)
alt size > FILE_SIZE_LIMIT
Reader-->>Caller: raise FileSizeLimitError (FailureType.config_error)
else size within limit
Reader->>Reader: _get_file_transfer_paths(file)
Reader->>Log: log start (file_uri_for_logging, size)
Reader->>File: download_to_local_directory(local_path)
File->>Local: write bytes
Reader->>Log: log completion (duration)
Reader-->>Caller: return (FileRecordData, AirbyteRecordMessageFileReference)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
Would you like adding a negative test asserting the exact Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (3)
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py (1)
22-26: Consider usingdatetimeinstead ofstrfor timestamps.The
file_created_atproperty returns astr, but timestamps are typically better represented asdatetimeobjects for type safety and easier manipulation. The same applies tofile_updated_at(lines 30-34). This would align with howRemoteFile.last_modifiedis typed asdatetime. Wdyt about usingdatetimehere?airbyte_cdk/sources/file_based/file_based_stream_reader.py (2)
193-194: Remove commented-out code.These commented lines duplicate the guard logic now in
__init__. They should be removed to keep the code clean, wdyt?Apply this diff:
- # if self.file_transfer_reader_class is None and self.upload.__func__ == AbstractFileBasedStreamReader.upload: - # raise NotImplementedError("One of file_transfer_reader_class or upload method must be defined to support file transfer.") -
213-215: Consider extracting the size formatting logic.The size formatting appears here and potentially elsewhere. Extracting it to a helper function could improve maintainability, wdyt?
Something like:
def _format_file_size(size_bytes: int) -> str: mb = size_bytes / (1024 * 1024) gb = mb / 1024 return f"{mb:,.2f} MB ({gb:.2f} GB)"Then use it as:
logger.info( f"Starting to download the file {file_transfer.file_uri_for_logging} with size: {self._format_file_size(file_size)}" )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py(1 hunks)airbyte_cdk/sources/file_based/file_based_stream_reader.py(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py (3)
unit_tests/sources/file_based/file_types/test_excel_parser.py (1)
remote_file(46-47)airbyte_cdk/sources/file_based/remote_file.py (1)
RemoteFile(11-18)airbyte_cdk/sources/file_based/file_based_stream_reader.py (1)
file_size(131-138)
airbyte_cdk/sources/file_based/file_based_stream_reader.py (6)
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py (8)
AbstractFileBasedFileTransferReader(6-64)file_size(38-42)source_file_relative_path(53-57)file_uri_for_logging(60-64)download_to_local_directory(45-49)file_id(14-18)file_created_at(22-26)file_updated_at(30-34)airbyte_cdk/sources/file_based/file_record_data.py (1)
FileRecordData(11-23)airbyte_cdk/sources/file_based/remote_file.py (1)
RemoteFile(11-18)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
FailureType(749-752)airbyte_cdk/sources/file_based/exceptions.py (1)
FileSizeLimitError(158-159)airbyte_cdk/sources/file_based/file_types/file_transfer.py (1)
upload(18-30)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py
[error] 60-60: Function is missing a return type annotation [no-untyped-def]
airbyte_cdk/sources/file_based/file_based_stream_reader.py
[error] 49-49: "Callable[[RemoteFile, str, Logger], tuple[FileRecordData, Any]]" has no attribute "func" [attr-defined]
[error] 196-196: "AbstractFileBasedFileTransferReader" not callable [operator]
[error] 196-196: "None" not callable [misc]
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Check: source-google-drive
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-shopify
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
🔇 Additional comments (1)
airbyte_cdk/sources/file_based/file_based_stream_reader.py (1)
225-240: LGTM on the FileRecordData and file reference construction!The logic correctly assembles the metadata from the file transfer object and constructs both
FileRecordDataandAirbyteRecordMessageFileReferencewith all the necessary fields. Nice work on maintaining the complete audit trail!
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py
Outdated
Show resolved
Hide resolved
|
/autofix
|
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
airbyte_cdk/sources/file_based/file_based_stream_reader.py (1)
176-181: Docstring return type is outdated.The method returns a tuple (FileRecordData, AirbyteRecordMessageFileReference) but the docstring documents only the file reference. Update to avoid confusion, wdyt?
- Returns: - AirbyteRecordMessageFileReference: A file reference object containing: - - staging_file_url (str): The absolute path to the referenced file in the staging area. - - file_size_bytes (int): The size of the referenced file in bytes. - - source_file_relative_path (str): The relative path to the referenced file in source. + Returns: + Tuple[FileRecordData, AirbyteRecordMessageFileReference]: + - FileRecordData: + - folder (str): The folder of the referenced file. + - file_name (str): The name of the referenced file. + - bytes (int): Size in bytes. + - id (Optional[str]): File identifier. + - mime_type (Optional[str]): MIME type. + - created_at (Optional[str]): Creation timestamp (string). + - updated_at (Optional[str]): Update timestamp (string). + - source_uri (str): Original source URI. + - AirbyteRecordMessageFileReference: + - staging_file_url (str): Absolute path in staging. + - source_file_relative_path (str): Source-relative path. + - file_size_bytes (int): Size in bytes.
🧹 Nitpick comments (3)
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py (1)
7-7: Consider ClassVar and unit consistency for FILE_SIZE_LIMIT.Would you like to:
- Type it as a ClassVar[int] for clarity with type-checkers?
- Confirm whether limits/logs use decimal GB or binary GiB and align across messages to avoid confusion, wdyt?
Apply:
+from typing import ClassVar @@ -class AbstractFileBasedFileTransferReader(ABC): - FILE_SIZE_LIMIT = 1_500_000_000 +class AbstractFileBasedFileTransferReader(ABC): + FILE_SIZE_LIMIT: ClassVar[int] = 1_500_000_000airbyte_cdk/sources/file_based/file_based_stream_reader.py (2)
190-194: Unify size units with logs (GB vs GiB) and stable formatting.The error uses decimal GB (1e9) while logs use 1024-based MB/GB. Would you want to standardize on binary units and label as MiB/GiB for consistency, plus format to a fixed precision, wdyt?
- message = f"File size exceeds the {file_transfer.FILE_SIZE_LIMIT / 1e+9} GB limit." + message = f"File size exceeds the {file_transfer.FILE_SIZE_LIMIT / (1024 ** 3):.2f} GiB limit."
204-214: Refactor logging to lazy formatting; also likely fixes Ruff formatting failure.These f-strings are long and pre-format eagerly. Would you switch to logger’s lazy % formatting, shorten lines, and align to MiB/GiB units, wdyt?
- logger.info( - f"Starting to download the file {file_transfer.file_uri_for_logging} with size: {file_size / (1024 * 1024):,.2f} MB ({file_size / (1024 * 1024 * 1024):.2f} GB)" - ) + logger.info( + "Starting to download the file %s with size: %.2f MiB (%.2f GiB)", + file_transfer.file_uri_for_logging, + file_size / (1024 ** 2), + file_size / (1024 ** 3), + ) @@ - logger.info( - f"Finished downloading the file {file_transfer.file_uri_for_logging} and saved to {local_file_path} in {write_duration:,.2f} seconds." - ) + logger.info( + "Finished downloading the file %s and saved to %s in %.2f seconds.", + file_transfer.file_uri_for_logging, + local_file_path, + write_duration, + )Please also run the project formatter (e.g., poetry run ruff format .) to ensure the pipeline’s “would be reformatted” check passes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py(1 hunks)airbyte_cdk/sources/file_based/file_based_stream_reader.py(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py (2)
unit_tests/sources/file_based/file_types/test_excel_parser.py (1)
remote_file(46-47)airbyte_cdk/sources/file_based/remote_file.py (1)
RemoteFile(11-18)
airbyte_cdk/sources/file_based/file_based_stream_reader.py (6)
airbyte_cdk/sources/file_based/exceptions.py (1)
FileSizeLimitError(158-159)airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py (8)
AbstractFileBasedFileTransferReader(6-64)file_size(38-42)source_file_relative_path(53-57)file_uri_for_logging(60-64)download_to_local_directory(45-49)file_id(14-18)file_created_at(22-26)file_updated_at(30-34)airbyte_cdk/sources/file_based/file_types/file_transfer.py (1)
upload(18-30)unit_tests/sources/file_based/in_memory_files_source.py (2)
upload(143-146)file_size(140-141)unit_tests/sources/file_based/test_file_based_stream_reader.py (2)
upload(87-90)file_size(84-85)airbyte_cdk/sources/file_based/file_record_data.py (1)
FileRecordData(11-23)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/file_based/file_based_stream_reader.py
[error] 188-188: Ruff format check failed. 1 file would be reformatted by 'poetry run ruff format --diff .'
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-google-drive
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
🔇 Additional comments (4)
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py (1)
59-64: Nice: return type annotation added.The property type hint resolves the earlier linter error and matches RemoteFile.uri’s str type. LGTM.
airbyte_cdk/sources/file_based/file_based_stream_reader.py (3)
46-52: Good runtime guard for configuration/override.Comparing type(self).upload to the base method avoids the func mypy issue. LGTM.
159-161: Correct return annotation for class property.Returning Type[AbstractFileBasedFileTransferReader] | None is the right contract. LGTM.
182-186: Explicit None check before instantiation looks good.This prevents calling None and clarifies the default method contract. LGTM.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
unit_tests/sources/file_based/test_file_based_stream_reader.py (2)
72-99: Test stub looks good, but consider delegatingfile_uri_for_loggingto the base class?The test class correctly implements all abstract members. However, overriding
file_uri_for_loggingto return a hardcoded string might cause confusion if logs are checked. The base class already provides a sensible default (self.remote_file.uri). Would it make sense to remove the override and rely on the base implementation, wdyt?If you decide to remove the override, apply this diff:
- @property - def file_uri_for_logging(self) -> str: - return "logging/url" -
535-541: Strengthen test assertions for better coverage?The test only checks that
file_record_dataandfile_referenceare truthy. Could you strengthen these assertions to verify specific fields or types? For example, you could assert thatfile_reference.file_size_bytes == 200andfile_record_data.id == "test_file_id", wdyt?Apply this diff to strengthen the assertions:
file_record_data, file_reference = stream_reader.upload(remote_file, "test_directory", logger) - assert file_record_data - assert file_reference + assert file_record_data.id == "test_file_id" + assert file_record_data.bytes == 200 + assert file_reference.file_size_bytes == 200 + assert file_reference.source_file_relative_path == "source/path"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
unit_tests/sources/file_based/test_file_based_stream_reader.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
unit_tests/sources/file_based/test_file_based_stream_reader.py (4)
airbyte_cdk/sources/file_based/file_based_stream_reader.py (4)
config(55-56)config(60-70)file_transfer_reader_class(160-161)upload(163-231)airbyte_cdk/sources/file_based/exceptions.py (1)
FileSizeLimitError(158-159)airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py (1)
AbstractFileBasedFileTransferReader(6-64)airbyte_cdk/sources/file_based/remote_file.py (1)
RemoteFile(11-18)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-google-drive
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
unit_tests/sources/file_based/test_file_based_stream_reader.py (3)
10-10: LGTM!The new imports are necessary for the test scaffolding and align with the file-transfer reader pattern introduced in the PR.
Also applies to: 16-19
101-133: LGTM!The test stream reader correctly implements all abstract members and sets
file_transfer_reader_classto enable the file-transfer-backed upload path. The__test__ = Falsemarker correctly prevents pytest from treating this as a test class.
531-549: Fix missing dependency and initialize config
- Could you add
dunamaito the project dependencies (or adjust pytest warning filters) to resolve the ModuleNotFoundError inairbyte_cdk/__init__.py?- Could you initialize
stream_reader.config = TestSpec(**{"streams": []})before invokinguploadintest_upload_with_file_transfer_reader?wdyt?
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Maxime Carbonneau-Leclerc (maxi297)
left a comment
There was a problem hiding this comment.
I like this, this is cool! Just a last concern on the design and I think we're there after that
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/file_based/file_based_file_transfer_reader.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
unit_tests/sources/file_based/test_file_based_stream_reader.py (1)
524-528: Remove the duplicate size limit test?Lines 525-526 and 527-528 are identical assertions testing the same
FileSizeLimitErrorbehavior. Could you consolidate these into a single assertion, wdyt?Apply this diff to remove the duplication:
blob.size = 2_500_000_000 with pytest.raises(FileSizeLimitError): stream_reader.upload(uploadable_remote_file, "test_directory", logger) - with pytest.raises(FileSizeLimitError): - stream_reader.upload(uploadable_remote_file, "test_directory", logger)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/file_based/file_based_stream_reader.py(5 hunks)airbyte_cdk/sources/file_based/remote_file.py(2 hunks)unit_tests/sources/file_based/test_file_based_stream_reader.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
airbyte_cdk/sources/file_based/remote_file.py (1)
unit_tests/sources/file_based/test_file_based_stream_reader.py (2)
size(504-505)download_to_local_directory(507-508)
unit_tests/sources/file_based/test_file_based_stream_reader.py (3)
airbyte_cdk/sources/file_based/file_based_stream_reader.py (6)
config(46-47)config(51-61)AbstractFileBasedStreamReader(34-247)get_matching_files(79-99)open_file(64-76)upload(150-215)airbyte_cdk/sources/file_based/exceptions.py (1)
FileSizeLimitError(158-159)airbyte_cdk/sources/file_based/remote_file.py (4)
RemoteFile(11-18)UploadableRemoteFile(21-57)size(32-36)download_to_local_directory(39-43)
airbyte_cdk/sources/file_based/file_based_stream_reader.py (3)
airbyte_cdk/sources/file_based/exceptions.py (1)
FileSizeLimitError(158-159)airbyte_cdk/sources/file_based/file_record_data.py (1)
FileRecordData(11-23)airbyte_cdk/sources/file_based/remote_file.py (6)
RemoteFile(11-18)UploadableRemoteFile(21-57)size(32-36)source_file_relative_path(46-50)file_uri_for_logging(53-57)download_to_local_directory(39-43)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/file_based/remote_file.py
[error] 4-4: I001 Import block is unsorted or un-formatted. Found 1 error. Run 'poetry run ruff check --fix .' to fix.
airbyte_cdk/sources/file_based/file_based_stream_reader.py
[error] 206-206: Command: poetry run mypy --config-file mypy.ini airbyte_cdk. Argument 'created_at' to 'FileRecordData' has incompatible type 'datetime | None'; expected 'str | None' [arg-type]
[error] 207-207: Command: poetry run mypy --config-file mypy.ini airbyte_cdk. Argument 'updated_at' to 'FileRecordData' has incompatible type 'datetime | None'; expected 'str | None' [arg-type]
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-google-drive
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-pokeapi
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (5)
airbyte_cdk/sources/file_based/file_based_stream_reader.py (3)
6-6: LGTM!The new imports and FILE_SIZE_LIMIT constant are well-chosen. The 1.5GB limit provides a reasonable balance for file transfer operations.
Also applies to: 12-12, 14-14, 24-24, 26-26, 40-40
150-178: LGTM!The type guard and size validation logic are solid. The error message correctly shows the 1.5 GB limit dynamically, and using
FailureType.config_erroris appropriate here.
180-198: LGTM!The download logic is well-instrumented with detailed logging and timing information. This will be helpful for debugging and monitoring file transfer operations.
unit_tests/sources/file_based/test_file_based_stream_reader.py (2)
10-10: LGTM!The new imports are appropriate for the test coverage being added.
Also applies to: 16-16, 18-18
69-99: LGTM!The test helper class correctly implements all required abstract methods with minimal stubs, allowing the default
upload()implementation to be tested. The__test__ = Falseflag is a nice touch to prevent pytest confusion.
|
/autofix
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
airbyte_cdk/sources/file_based/remote_file.py (4)
21-24: Minor formatting nit in the docstring.There's a missing space before the opening parenthesis in "uploading(file transferring)". Should it be "uploading (file transferring)" or perhaps just "uploading or file transferring", wdyt?
Apply this diff:
- A file in a file-based stream that supports uploading(file transferring). + A file in a file-based stream that supports uploading (file transferring).
26-28: Consider usingdatetimefor timestamp fields.The
created_atandupdated_atfields are typed asOptional[str], but the parent class usesdatetimeforlast_modified. Usingdatetimeobjects for timestamps provides better type safety and easier manipulation. Is there a specific reason these need to be strings (e.g., varying formats from different sources), wdyt?If datetime is appropriate, apply this diff:
+from datetime import datetime + class UploadableRemoteFile(RemoteFile, ABC): """ A file in a file-based stream that supports uploading(file transferring). """ id: Optional[str] = None - created_at: Optional[str] = None - updated_at: Optional[str] = None + created_at: Optional[datetime] = None + updated_at: Optional[datetime] = None
38-43: Consider documenting error handling behavior.The
download_to_local_directorymethod could encounter various errors (network failures, disk I/O issues, permission problems). Would it be helpful to add a note in the docstring about expected exceptions or error handling behavior that implementers should follow, wdyt?For example:
@abstractmethod def download_to_local_directory(self, local_file_path: str) -> None: """ Download the file from remote source to local storage. + + Args: + local_file_path: The local path where the file should be saved. + + Raises: + Should raise appropriate exceptions for network, I/O, or permission errors. """ ...
45-57: Both properties return the sameurivalue.The
source_file_relative_pathandfile_uri_for_loggingproperties both returnself.uri. While they convey different semantic intents, having multiple accessors for the same value might create confusion about which to use when. Is there a plan for these to diverge in behavior, or could we simplify by using just one property, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/file_based/remote_file.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte_cdk/sources/file_based/remote_file.py (1)
unit_tests/sources/file_based/test_file_based_stream_reader.py (2)
size(504-505)download_to_local_directory(507-508)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-google-drive
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
Maxime Carbonneau-Leclerc (maxi297)
left a comment
There was a problem hiding this comment.
That's very clean, I like it very much. Thanks for tackling those comments and this issue. This will definitely make the maintenance of the file upload easier
c67570b
into
main
| class UploadableRemoteFile(RemoteFile, ABC): | ||
| """ | ||
| A file in a file-based stream that supports uploading(file transferring). | ||
| """ | ||
|
|
||
| id: Optional[str] = None | ||
| created_at: Optional[str] = None | ||
| updated_at: Optional[str] = None | ||
|
|
||
| @property | ||
| @abstractmethod | ||
| def size(self) -> int: | ||
| """ | ||
| Returns the file size in bytes. | ||
| """ | ||
| ... | ||
|
|
||
| @abstractmethod | ||
| def download_to_local_directory(self, local_file_path: str) -> None: | ||
| """ | ||
| Download the file from remote source to local storage. | ||
| """ | ||
| ... | ||
|
|
||
| @property | ||
| def source_file_relative_path(self) -> str: | ||
| """ | ||
| Returns the relative path of the source file. | ||
| """ | ||
| return self.uri | ||
|
|
There was a problem hiding this comment.
Summary by CodeRabbit