Fix PySparkProcessor V3 ProcessingInput construction #5759
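Summary
This PR updates PySparkProcessor so that the ProcessingInput objects it builds internally use the V3 input fields.

Problem
In V3, ProcessingInput no longer uses the legacy source/destination fields and instead expects V3 fields such as s3_input. However, PySparkProcessor still built its internal ProcessingInput objects with source/destination in _stage_configuration() and _stage_submit_deps(). This can cause validation failures during pipeline definition / upsert.

Fix
This change:
- builds ProcessingInput with ProcessingS3Input in both _stage_configuration() and _stage_submit_deps()
- preserves the same staged S3 URIs and local mount paths

Tests
Added regression tests covering:
- configuration staging
- local dependency staging

Example failure before this change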
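A minimal, hypothetical reproduction of this class of failure follows. It does not import the SDK: ProcessingInput and ProcessingS3Input here are stand-in dataclasses whose snake_case fields are assumed from the shape of the SageMaker CreateProcessingJob API (S3Uri, LocalPath, S3DataType, S3InputMode), and the bucket, input name, and mount path are illustrative. Passing the legacy source/destination keywords to a model that only defines s3_input is rejected at construction time.

```python
from dataclasses import dataclass
from typing import Optional


# HYPOTHETICAL stand-ins for the V3 models. Field names are assumed
# snake_case versions of the CreateProcessingJob API fields, not the
# SDK's actual class definitions.
@dataclass
class ProcessingS3Input:
    s3_uri: str
    local_path: str
    s3_data_type: str = "S3Prefix"
    s3_input_mode: str = "File"


@dataclass
class ProcessingInput:
    input_name: str
    s3_input: Optional[ProcessingS3Input] = None


def build_legacy_input() -> ProcessingInput:
    # Mirrors the pre-fix pattern: legacy source/destination keywords.
    # Bucket and paths are illustrative values only.
    return ProcessingInput(
        input_name="conf",
        source="s3://example-bucket/prefix/conf",
        destination="/opt/ml/processing/input/conf",
    )


error = None
try:
    build_legacy_input()
except TypeError as exc:
    # A dataclass rejects unknown keywords with TypeError; a strict
    # validating model would raise its own validation error instead.
    error = exc

print(f"legacy construction rejected: {error}")
```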
Hi Evan, thanks for opening this PR. I noticed the … This is blocking the use of the PySparkProcessor. It would be good for someone to escalate a review of this.
Hi @NathanCYee, thanks a lot for catching this issue and calling it out, especially on spark_event_logs_s3_uri and ProcessingOutput. I’ve submitted an update to address it. When you have a moment, could you please review the latest changes? Really appreciate your help on this.
Use V3-compatible ProcessingInput construction in PySparkProcessor.
PySparkProcessor still built internal ProcessingInput objects with the
legacy source/destination fields in _stage_configuration() and
_stage_submit_deps(). In V3, ProcessingInput now expects s3_input, so
those internal code paths can fail during pipeline definition or upsert
with validation errors.
This change updates both code paths to build ProcessingInput with
ProcessingS3Input while preserving the same staged S3 URIs and local
mount paths. It also adds regression tests covering configuration
staging and local dependency staging.
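A minimal sketch of the fix pattern, under stated assumptions: the dataclasses are hypothetical stand-ins for the V3 models (field names assumed from the CreateProcessingJob API shape), and make_staged_input is a hypothetical helper, not a function in the SDK. The input names ("conf", "py-files"), bucket, and mount paths are illustrative. The sketch wraps an already-staged S3 URI and container mount path in a ProcessingS3Input and checks, in the spirit of the regression tests, that both values pass through unchanged for a configuration input and a dependency input.

```python
from dataclasses import dataclass
from typing import Optional


# HYPOTHETICAL stand-ins for the V3 models; field names are assumed
# snake_case versions of the CreateProcessingJob API fields.
@dataclass
class ProcessingS3Input:
    s3_uri: str
    local_path: str
    s3_data_type: str = "S3Prefix"
    s3_input_mode: str = "File"


@dataclass
class ProcessingInput:
    input_name: str
    s3_input: Optional[ProcessingS3Input] = None


def make_staged_input(input_name: str, staged_s3_uri: str, local_path: str) -> ProcessingInput:
    """Hypothetical helper: build a V3-style ProcessingInput for a staged S3 location.

    The staged S3 URI and container mount path are passed through unchanged,
    which is what the commit message says the fix preserves.
    """
    return ProcessingInput(
        input_name=input_name,
        s3_input=ProcessingS3Input(s3_uri=staged_s3_uri, local_path=local_path),
    )


# Illustrative inputs for the two code paths named in the commit message:
# configuration staging and local dependency staging.
conf_input = make_staged_input(
    "conf", "s3://example-bucket/job/conf", "/opt/ml/processing/input/conf"
)
deps_input = make_staged_input(
    "py-files", "s3://example-bucket/job/py-files", "/opt/ml/processing/input/py-files"
)

# Regression-style checks: URIs and mount paths survive the conversion,
# and the V3-style object carries no legacy source/destination fields.
for inp, uri, path in [
    (conf_input, "s3://example-bucket/job/conf", "/opt/ml/processing/input/conf"),
    (deps_input, "s3://example-bucket/job/py-files", "/opt/ml/processing/input/py-files"),
]:
    assert inp.s3_input is not None
    assert inp.s3_input.s3_uri == uri
    assert inp.s3_input.local_path == path
    assert not hasattr(inp, "source") and not hasattr(inp, "destination")
```

A shared helper like this is one way to keep both staging paths building inputs identically; the PR itself may inline the construction in each method instead.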