Fix PySparkProcessor V3 ProcessingInput construction by Evan-W-ang · Pull Request #5759 · aws/sagemaker-python-sdk

Evan-W-ang · 2026-04-15T08:32:47Z

Use V3-compatible ProcessingInput construction in PySparkProcessor.

PySparkProcessor still built internal ProcessingInput objects with the
legacy source/destination fields in _stage_configuration() and
_stage_submit_deps(). In V3, ProcessingInput now expects s3_input, so
those internal code paths can fail during pipeline definition or upsert
with validation errors.

This change updates both code paths to build ProcessingInput with
ProcessingS3Input while preserving the same staged S3 URIs and local
mount paths. It also adds regression tests covering configuration
staging and local dependency staging.

Evan-W-ang · 2026-04-15T08:36:21Z

Summary

This PR updates PySparkProcessor to construct ProcessingInput using the
V3-compatible s3_input=ProcessingS3Input(...) shape instead of the legacy
source / destination fields.

Problem

In V3, sagemaker.core.processing.ProcessingInput no longer accepts:

- source
- destination

and instead expects V3 fields such as input_name and s3_input.

However, PySparkProcessor still used the legacy constructor internally in:

- _stage_configuration()
- _stage_submit_deps()

This can cause validation failures during pipeline definition / upsert.
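To illustrate why the legacy keywords are rejected, here is a minimal sketch using stand-in dataclasses rather than the real sagemaker.core models. The class bodies, the `extra_fields` helper, the bucket name, and the local path are all assumptions made for illustration; only the field names source, destination, input_name, and s3_input come from this PR.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical stand-ins, NOT the real sagemaker.core classes. They only
# declare the field names mentioned in this PR so the check below can run.
@dataclass
class ProcessingS3Input:
    s3_uri: str
    local_path: str


@dataclass
class ProcessingInput:
    input_name: str
    s3_input: Optional[ProcessingS3Input] = None


def extra_fields(model_cls, kwargs):
    """Return the keyword names that model_cls does not declare, sorted."""
    declared = {f.name for f in fields(model_cls)}
    return sorted(set(kwargs) - declared)


# Legacy-style keywords, as PySparkProcessor built them internally.
legacy_kwargs = {
    "input_name": "conf",
    "source": "s3://example-bucket/job/input/conf/configuration.json",
    "destination": "/opt/ml/processing/input/conf",
}

# V3-style keywords carrying the same S3 URI and local path.
v3_kwargs = {
    "input_name": "conf",
    "s3_input": ProcessingS3Input(
        s3_uri="s3://example-bucket/job/input/conf/configuration.json",
        local_path="/opt/ml/processing/input/conf",
    ),
}

rejected_legacy = extra_fields(ProcessingInput, legacy_kwargs)
rejected_v3 = extra_fields(ProcessingInput, v3_kwargs)
print(rejected_legacy)  # ['destination', 'source']
print(rejected_v3)      # []
```

The two names reported for the legacy keywords are the same two that appear in the validation error quoted later in this description.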

Fix

This change:

- replaces internal legacy ProcessingInput(...) construction with V3-style ProcessingS3Input(...)
- preserves the existing S3 staging behavior
- preserves the existing local mount path behavior
- avoids relying on legacy .destination access where an explicit local path is sufficient
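The points above can be sketched roughly as follows. This is not the code from the PR: the dataclasses are stand-ins for the real models, `build_staged_input` is a hypothetical helper name, and the bucket, paths, and the `S3Prefix` / `File` defaults (taken from the SageMaker API enum values) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the V3 models; the real classes live in the SDK.
@dataclass(frozen=True)
class ProcessingS3Input:
    s3_uri: str
    local_path: str
    s3_data_type: str = "S3Prefix"  # assumed default for this sketch
    s3_input_mode: str = "File"     # assumed default for this sketch


@dataclass(frozen=True)
class ProcessingInput:
    input_name: str
    s3_input: Optional[ProcessingS3Input] = None


def build_staged_input(input_name, staged_s3_uri, local_mount_path):
    """Wrap an already-staged S3 URI and its mount path in the V3 shape.

    The legacy call passed source=staged_s3_uri and
    destination=local_mount_path directly to ProcessingInput.
    """
    return ProcessingInput(
        input_name=input_name,
        s3_input=ProcessingS3Input(
            s3_uri=staged_s3_uri,
            local_path=local_mount_path,
        ),
    )


# Keep the local path in a variable so later code can use it directly
# instead of reading a legacy .destination attribute from the input object.
conf_local_path = "/opt/ml/processing/input/conf"
conf_input = build_staged_input(
    "conf",
    "s3://example-bucket/job/input/conf/configuration.json",
    conf_local_path,
)
print(conf_input.s3_input.s3_uri)
print(conf_local_path)
```

Holding the mount path in its own variable is one way to satisfy the last bullet: callers that need the path never have to reach into the input object for it.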

Tests

Added regression tests covering:

- _stage_configuration() building a V3-compatible ProcessingInput
- _stage_submit_deps() building a V3-compatible ProcessingInput for local dependencies

Example failure before this change

ValidationError: 2 validation errors for ProcessingInput
source
  Extra inputs are not permitted
destination
  Extra inputs are not permitted

Motivation

Users migrating to V3 naturally update their own processing inputs/outputs to the new schema, but Spark processing can still fail because of internal legacy construction in PySparkProcessor. This patch makes that internal behavior consistent with the V3 processing models.


**Test command**
```bash
cd ~/sagemaker-python-sdk/sagemaker-core
. .venv/bin/activate
python -m pytest tests/unit/spark/test_processing.py tests/unit/test_processing.py -q
```

Files to include

- sagemaker-core/src/sagemaker/core/spark/processing.py
- sagemaker-core/tests/unit/spark/test_processing.py

NathanCYee · 2026-05-29T17:57:54Z

Hi Evan,

Thanks for opening this PR. I noticed the spark_event_logs_s3_uri parameter also has a similar issue with ProcessingOutput.

ValidationError: 4 validation errors for ProcessingOutput
output_name
  Field required [type=missing, input_value={'source': '/opt/ml/proce...oad_mode': 'Continuous'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.13/v/missing
source
  Extra inputs are not permitted [type=extra_forbidden, input_value='/opt/ml/processing/spark-events/', input_type=str]
    For further information visit https://errors.pydantic.dev/2.13/v/extra_forbidden
destination
  Extra inputs are not permitted [type=extra_forbidden, input_value='s3://amazon-sagemaker-74...owdtqwioecn/spark-logs/', input_type=str]
    For further information visit https://errors.pydantic.dev/2.13/v/extra_forbidden
s3_upload_mode
  Extra inputs are not permitted [type=extra_forbidden, input_value='Continuous', input_type=str]
    For further information visit https://errors.pydantic.dev/2.13/v/extra_forbidden

This is blocking the use of the PySparkProcessor. Would be good for someone to escalate a review of this.
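For reference, the four errors above suggest a mapping like the one sketched below. It is only an illustration with stand-in dataclasses: the `s3_output` / `ProcessingS3Output` names follow the SageMaker API's ProcessingOutput structure and are assumed, not confirmed, to match the V3 SDK models; `to_v3_output`, the output name, and the bucket are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins; field names are assumed from the SageMaker API's
# ProcessingOutput / ProcessingS3Output structures, not read from the SDK.
@dataclass(frozen=True)
class ProcessingS3Output:
    s3_uri: str
    local_path: str
    s3_upload_mode: str = "EndOfJob"


@dataclass(frozen=True)
class ProcessingOutput:
    output_name: str
    s3_output: Optional[ProcessingS3Output] = None


def to_v3_output(output_name, legacy):
    """Map legacy output keywords onto the nested V3-style shape.

    For outputs, the legacy 'source' is the container-local path and
    'destination' is the S3 URI the files are uploaded to.
    """
    return ProcessingOutput(
        output_name=output_name,
        s3_output=ProcessingS3Output(
            s3_uri=legacy["destination"],
            local_path=legacy["source"],
            s3_upload_mode=legacy.get("s3_upload_mode", "EndOfJob"),
        ),
    )


# Same keywords the error message reports, with a placeholder bucket.
legacy_event_logs = {
    "source": "/opt/ml/processing/spark-events/",
    "destination": "s3://example-bucket/spark-logs/",
    "s3_upload_mode": "Continuous",
}
event_log_output = to_v3_output("spark-events", legacy_event_logs)
print(event_log_output.s3_output.s3_upload_mode)  # Continuous
```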

Evan-W-ang · 2026-06-08T02:39:53Z

Hi @NathanCYee,

Thanks a lot for catching this issue and calling it out, especially on spark_event_logs_s3_uri and ProcessingOutput.

I’ve submitted a new code update to address it. When you have a moment, could you please take another look and review the latest changes?

Really appreciate your help on this.

Fix PySparkProcessor V3 ProcessingInput construction

96f4200

Evan-W-ang had a problem deploying to manual-approval April 15, 2026 08:32 — with GitHub Actions Error

Evan-W-ang had a problem deploying to manual-approval April 15, 2026 08:33 — with GitHub Actions Error

Evan-W-ang closed this Apr 15, 2026

Evan-W-ang reopened this Apr 15, 2026

Evan-W-ang had a problem deploying to manual-approval April 15, 2026 08:36 — with GitHub Actions Failure

Fix V3 processing shapes in spark and model monitor

fc9aff4

Evan-W-ang requested a deployment to manual-approval June 8, 2026 02:34 — with GitHub Actions Waiting

Evan-W-ang closed this Jun 8, 2026

Evan-W-ang wants to merge 2 commits into aws:master from Evan-W-ang:fix/pysparkprocessor-v3-processinginput