[Enh]: Add map_batches to pyspark #3578

@pedro-villanueva-bcom

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

No response

Please describe the purpose of the new feature or describe the problem to solve.

UDFs can be very useful, for example for statistical methods that don't exist natively. Narwhals implements map_batches for eager backends, but not for PySpark, even though PySpark has good support for several kinds of UDFs.
Currently, calling map_batches on a PySpark-backed frame raises:

NotImplementedError: 'map_batches' is not implemented for: "<Implementation.PYSPARK: 'pyspark'>"

I'm working on a PR to add this functionality.

Suggest a solution if possible.

No response

If you have tried alternatives, please describe them below.

No response

Additional information that may help us understand your needs.

No response
