Skip to content

feat(gcs): implement delete for GCS object store#6958

Open
daiping8 wants to merge 2 commits into
Eventual-Inc:mainfrom
daiping8:fix/gcs-delete-support
Open

feat(gcs): implement delete for GCS object store#6958
daiping8 wants to merge 2 commits into
Eventual-Inc:mainfrom
daiping8:fix/gcs-delete-support

Conversation

@daiping8
Copy link
Copy Markdown

Summary

Changes

  • Adds UnableToDeleteFile error variant to the GCS error enum
  • Implements delete on GCSClientWrapper using google_cloud_storage's delete_object API
  • Handles 404/410 responses as success (idempotent delete per ObjectSource trait contract)
  • Overrides delete in ObjectSource impl for GCSSource

Test plan

  • Verify write_parquet(write_mode="overwrite") succeeds on a gs:// path
  • Verify delete is idempotent (overwriting a non-existent path does not fail)
  • make build passes
  • DAFT_RUNNER=native make test EXTRA_ARGS="-v tests/io/" passes

@daiping8 daiping8 requested a review from a team as a code owner May 19, 2026 01:18
@github-actions github-actions Bot added the feat label May 19, 2026
@daiping8 daiping8 marked this pull request as draft May 19, 2026 01:19
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 19, 2026

Greptile Summary

This PR implements the delete method for the GCS object store backend, fixing write_parquet(write_mode='overwrite') on gs:// paths. The implementation correctly handles idempotent deletes by treating both GError::Response and GError::HttpClient 404/410 responses as success, and also refactors client-config construction to support a no-anonymous-fallback code path used in new smoke tests.

  • Adds UnableToDeleteFile error variant, is_delete_idempotent_error helper, and GCSClientWrapper::delete / GCSSource::delete implementations with proper semaphore and IO-stats wiring.
  • Refactors get_client into get_client_config + client_from_config to allow a test-only get_client_without_anonymous_fallback path.
  • Adds Rust unit/smoke tests and a new Python integration test for the parquet-overwrite round-trip on GCS.

Confidence Score: 5/5

The change is narrowly scoped to adding a new delete operation on GCS; existing read/list paths are untouched and the refactored client-construction logic preserves the same behavior for callers of get_client.

The idempotency guards correctly cover both the API-level (GError::Response) and transport-level (GError::HttpClient) 404/410 cases, the semaphore and IO-stats wiring match the pattern used by other methods, and the client-config refactor is straightforward. The only finding is a cosmetic dead-code pattern (is_success always true).

No files require special attention.

Important Files Changed

Filename Overview
src/daft-io/src/google_cloud.rs Adds delete to GCSClientWrapper and GCSSource with correct idempotent 404/410 handling (both GError::Response and GError::HttpClient); refactors client-config construction; adds Rust unit and smoke tests. One minor: the is_success boolean is always true, leaving the if is_success guard as dead code.
tests/integration/io/parquet/test_writes_gcs.py New integration test for GCS parquet overwrite; guarded by --credentials flag and GCS_WRITE_TEST_BUCKET env var, with gcsfs-based cleanup in fixture teardown. Looks correct.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant GCSSource
    participant GCSClientWrapper
    participant GCSClient as google_cloud_storage Client

    Caller->>GCSSource: delete(uri, io_stats)
    GCSSource->>GCSClientWrapper: delete(uri, io_stats)
    GCSClientWrapper->>GCSClientWrapper: parse_raw_uri(uri)
    alt key is empty
        GCSClientWrapper-->>Caller: Err(NotAFile)
    else key is non-empty
        GCSClientWrapper->>GCSClientWrapper: acquire semaphore permit
        GCSClientWrapper->>GCSClient: delete_object(bucket, key)
        alt Ok(())
            GCSClient-->>GCSClientWrapper: Ok(())
            GCSClientWrapper->>GCSClientWrapper: mark_delete_requests(1)
            GCSClientWrapper-->>Caller: Ok(())
        else Err — 404 or 410
            GCSClient-->>GCSClientWrapper: Err(GError::Response 404/410 OR HttpClient 404/410)
            GCSClientWrapper->>GCSClientWrapper: is_delete_idempotent_error → true
            GCSClientWrapper->>GCSClientWrapper: mark_delete_requests(1)
            GCSClientWrapper-->>Caller: Ok(())
        else Err — other error
            GCSClient-->>GCSClientWrapper: Err(other)
            GCSClientWrapper->>GCSClientWrapper: is_delete_idempotent_error → false
            GCSClientWrapper-->>Caller: Err(UnableToDeleteFile → super::Error)
        end
    end
Loading

Reviews (2): Last reviewed commit: "test(gcs): add delete and overwrite regr..." | Re-trigger Greptile

Comment thread src/daft-io/src/google_cloud.rs Outdated
Comment thread src/daft-io/src/google_cloud.rs Outdated
@daiping8 daiping8 force-pushed the fix/gcs-delete-support branch 3 times, most recently from c056856 to 3178914 Compare May 19, 2026 09:10
## Summary
- Implements `delete` method for the GCS object store backend (`GCSSource`)
- Fixes `write_parquet(write_mode='overwrite')` on GCS paths (closes Eventual-Inc#6928)

## Changes
- Adds `UnableToDeleteFile` error variant to the GCS error enum
- Implements `delete` on `GCSClientWrapper` using `google_cloud_storage`'s `delete_object` API
- Handles 404/410 responses as success (idempotent delete per `ObjectSource` trait contract)
- Overrides `delete` in `ObjectSource` impl for `GCSSource`

## Test plan
- Verify `write_parquet(write_mode="overwrite")` succeeds on a `gs://` path
- Verify delete is idempotent (overwriting a non-existent path does not fail)
- `make build` passes
- `DAFT_RUNNER=native make test EXTRA_ARGS="-v tests/io/"` passes

Author: daiping8 <dai.ping88@zte.com.cn>
@daiping8 daiping8 force-pushed the fix/gcs-delete-support branch from 3178914 to b7e5386 Compare May 19, 2026 09:14
@rchowell
Copy link
Copy Markdown
Contributor

@daiping8 is this ready for review or are you still working on it?

@daiping8 daiping8 force-pushed the fix/gcs-delete-support branch from 22b5560 to 2cca089 Compare May 23, 2026 06:47
@daiping8
Copy link
Copy Markdown
Author

daiping8 commented May 23, 2026

@daiping8 is this ready for review or are you still working on it?

Sorry for the late reply. I've added a commit with test code and have tested the PR as thoroughly as possible. The PR now includes two commits: one for the implementation and one for the tests. Please review, thank you.

@daiping8 daiping8 marked this pull request as ready for review May 23, 2026 07:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

write_parquet(write_mode='overwrite') on GCS fails with Method not implemented: Deletes is not yet supported!

2 participants