Support CRC checksum #611

@kdn36

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We are sinking large dataframes to S3. For AWS, SHA256 is currently the only supported checksum. When profiling a high-throughput setup (100 Gbps NIC), enabling the checksum adds 25-30% overhead, measured as AWS upload time, compared with uploading without a checksum.

Describe the solution you'd like
Support CRC-64/NVME (CRC64NVME), which is the AWS default.
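
For reference, a minimal sketch of what computing CRC-64/NVME involves, written as a plain bitwise loop in Rust with no dependencies. The type and function names (`Crc64Nvme`, `crc64_nvme`) are hypothetical and not part of any existing crate API; the parameters are the published ones for this CRC (reflected, polynomial `0xAD93D23594C93659`, initial value and final XOR all ones). A real implementation would likely be table-driven or use carry-less multiply instructions for throughput; this only illustrates the algorithm and its incremental, per-chunk shape.

```rust
/// CRC-64/NVME polynomial in normal (MSB-first) form.
const POLY: u64 = 0xAD93_D235_94C9_3659;
/// The algorithm is reflected, so the bitwise loop uses the bit-reversed polynomial.
const POLY_REFLECTED: u64 = POLY.reverse_bits();

/// Hypothetical incremental hasher, for illustration only.
pub struct Crc64Nvme {
    state: u64,
}

impl Crc64Nvme {
    /// Starts with the all-ones initial value.
    pub fn new() -> Self {
        Self { state: u64::MAX }
    }

    /// Feeds one chunk of data; may be called repeatedly (e.g. once per buffer).
    pub fn update(&mut self, data: &[u8]) {
        for &byte in data {
            self.state ^= byte as u64;
            for _ in 0..8 {
                // All ones if the low bit is set, zero otherwise (branch-free).
                let mask = (self.state & 1).wrapping_neg();
                self.state = (self.state >> 1) ^ (POLY_REFLECTED & mask);
            }
        }
    }

    /// Applies the final XOR and returns the checksum.
    pub fn finalize(&self) -> u64 {
        self.state ^ u64::MAX
    }
}

/// One-shot convenience wrapper around the incremental hasher.
pub fn crc64_nvme(data: &[u8]) -> u64 {
    let mut hasher = Crc64Nvme::new();
    hasher.update(data);
    hasher.finalize()
}
```

Calling `crc64_nvme(b"123456789")` should give the standard check value `0xAE8B14860A799888`, which is a quick way to validate any faster implementation against this one.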

Describe alternatives you've considered
Turning off the checksum completely ("AWS_UNSIGNED_PAYLOAD": "true") improves performance but introduces an unacceptable object-integrity risk for large files uploaded via multipart upload.

Additional context
Use case: high-end sinking and reading of data files to and from AWS S3 and other S3 storage back-ends, using the Polars dataframe library.

Note: in addition, it would be nice to see SHA256 offloaded to hardware where available, since not all hardware supports this. Currently all SHA256 operations are done on the CPU. This may require swapping the ring crate for a more modern crypto crate.
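
As a rough illustration of the "not all hardware supports this" point, the standard library can report at runtime whether the CPU advertises SHA-256 instructions. This is only a capability check under the assumption that a hardware-backed implementation would be selected behind it; the helper name `sha256_hw_available` is hypothetical, and the sketch says nothing about what any particular crypto crate does internally.

```rust
/// Reports whether the CPU running this code advertises SHA-256 instructions.
/// Hypothetical helper for illustration only.
pub fn sha256_hw_available() -> bool {
    // x86 / x86_64: the Intel SHA extensions are exposed as the "sha" feature.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    let available = std::arch::is_x86_feature_detected!("sha");

    // AArch64: the SHA-256 instructions are exposed as the "sha2" feature.
    #[cfg(target_arch = "aarch64")]
    let available = std::arch::is_aarch64_feature_detected!("sha2");

    // Other architectures: assume no hardware support in this sketch.
    #[cfg(not(any(
        target_arch = "x86",
        target_arch = "x86_64",
        target_arch = "aarch64"
    )))]
    let available = false;

    available
}
```

A caller could use the result to choose between a hardware-accelerated and a portable SHA256 path, falling back to the portable one when the check returns `false`.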
