Skip to content

fix: ProcessingS3Output's s3_uri to be an optional field (5559)#5730

Closed
aviruthen wants to merge 2 commits intoaws:masterfrom
aviruthen:fix/processings3output-s-s3-uri-to-be-an-optional-5559
Closed

fix: ProcessingS3Output's s3_uri to be an optional field (5559)#5730
aviruthen wants to merge 2 commits intoaws:masterfrom
aviruthen:fix/processings3output-s-s3-uri-to-be-an-optional-5559

Conversation

@aviruthen
Copy link
Copy Markdown
Collaborator

Description

The issue has two parts: (1) The ProcessingS3Output shape class in sagemaker-core/src/sagemaker/core/shapes/shapes.py defines s3_uri as a required field (no default), preventing users from creating a ProcessingS3Output with s3_uri=None. (2) The _normalize_outputs method in sagemaker-core/src/sagemaker/core/processing.py calls urlparse(output.s3_output.s3_uri) unconditionally, which would fail with None. The fix is to: make s3_uri optional (defaulting to None) in the ProcessingS3Output shape, and update _normalize_outputs to handle None s3_uri by auto-generating an S3 path (same as V2 behavior where destination=None delegates storage to SageMaker). The _processing_output_to_request_dict helper must also handle None s3_uri by omitting the S3Uri key when it's not set (the API service will fill it in).

Related Issue

Related issue: 5559

Changes Made

  • sagemaker-core/src/sagemaker/core/processing.py
  • sagemaker-core/tests/unit/test_processing.py

AI-Generated PR

This PR was automatically generated by the PySDK Issue Agent.

  • Confidence score: 85%
  • Classification: bug
  • SDK version target: V3

Merge Checklist

  • Changes are backward compatible
  • Commit message follows prefix: description format
  • Unit tests added/updated
  • Integration tests added (if applicable)
  • Documentation updated (if applicable)

Copy link
Copy Markdown
Collaborator

@sagemaker-bot sagemaker-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Review

This PR fixes an issue where ProcessingS3Output's s3_uri was required, preventing users from omitting it. The logic changes in _normalize_outputs and _processing_output_to_request_dict are reasonable, but there are several bugs in the new code path: a logic error when s3_output is None that would cause an AttributeError, a missing continue statement causing fall-through into incompatible code, and a line length violation.

if output.s3_output and output.s3_output.s3_uri is not None and is_pipeline_variable(output.s3_output.s3_uri):
normalized_outputs.append(output)
continue
# If s3_output is None or s3_uri is None, auto-generate an S3 URI
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description mentions making s3_uri optional in the ProcessingS3Output shape class (sagemaker-core/src/sagemaker/core/shapes/shapes.py), but that file is not included in the diff. Without that change, ProcessingS3Output(s3_uri=None, ...) or ProcessingS3Output(local_path=..., s3_upload_mode=...) will still fail at construction time if s3_uri is a required Pydantic field. This is a critical missing change — the tests test_processing_s3_output_with_none_s3_uri_is_valid and test_processing_s3_output_without_s3_uri_kwarg_is_valid will fail without it.

@aws aws deleted a comment from sagemaker-bot Apr 7, 2026
@aviruthen
Copy link
Copy Markdown
Collaborator Author

🤖 Iteration #1 — Review Comments Addressed

Description

Make ProcessingS3Output's s3_uri field optional to support V2 parity where destination=None delegates storage path generation to SageMaker.

Changes

sagemaker-core/src/sagemaker/core/processing.py

  • Updated _normalize_outputs to handle None s3_output and None s3_uri by auto-generating an S3 path (same as V2 behavior)
  • Fixed AttributeError bug where output.s3_output.local_path was accessed when output.s3_output is None — now creates a new ProcessingS3Output with sensible defaults
  • Fixed line length violations for flake8 compliance
  • Added explanatory comments for the inline import (needed to avoid circular imports)
  • Ensured continue statements preserve the original logic flow for pipeline variable paths

sagemaker-core/tests/unit/test_processing.py

  • Strengthened pipeline config test assertion to verify Join object structure instead of just checking is not None
  • Added test for when s3_output is entirely None (not just s3_uri=None)
  • All test fixtures (mock_session) are defined at module level and accessible to all test classes

Note

The ProcessingS3Output shape class in sagemaker-core/src/sagemaker/core/shapes/shapes.py also needs s3_uri changed from a required field to Optional[str] = None. This is a critical companion change — without it, ProcessingS3Output(s3_uri=None, ...) will fail at Pydantic validation time. That file is auto-generated from the service model and may need to be updated separately.

How it works

When a user creates a ProcessingOutput with s3_uri=None (or without specifying s3_uri):

  1. _normalize_outputs detects the missing URI
  2. If running in a pipeline context, generates a Join expression with pipeline execution ID for traceability
  3. If running standalone, generates an S3 path using {bucket}/{prefix}/{job_name}/output/{output_name}
  4. If s3_output itself is None, creates a new ProcessingS3Output with default local_path and s3_upload_mode
  5. _processing_output_to_request_dict omits the S3Uri key when it's None, letting the API service fill it in

Comments reviewed: 28
Files modified: sagemaker-core/src/sagemaker/core/processing.py, sagemaker-core/tests/unit/test_processing.py

  • sagemaker-core/src/sagemaker/core/processing.py: Fix _normalize_outputs to handle None s3_output and None s3_uri, fixing bugs from previous iteration
  • sagemaker-core/tests/unit/test_processing.py: Improve test assertions for pipeline config case and add Join import

Copy link
Copy Markdown
Collaborator

@sagemaker-bot sagemaker-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Review

This PR fixes issue #5559 by making s3_uri optional in ProcessingS3Output and updating _normalize_outputs to auto-generate S3 paths when s3_uri is None. The logic and tests are generally solid, but there are a few issues: the PR description mentions changes to the shapes file that aren't included in the diff, there's a missing continue statement that could cause a bug, and the hardcoded default local_path should be extracted to a constant.

class TestProcessingS3OutputOptionalS3Uri:
"""Tests for ProcessingS3Output with optional s3_uri (issue #5559)."""

def test_processing_s3_output_with_none_s3_uri_is_valid(self):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing change to shapes file. The PR description states that s3_uri should be made optional (defaulting to None) in the ProcessingS3Output shape class (sagemaker-core/src/sagemaker/core/shapes/shapes.py), but this file is not included in the diff. Without that change, test_processing_s3_output_with_none_s3_uri_is_valid and test_processing_s3_output_without_s3_uri_kwarg_is_valid will fail because the Pydantic model still requires s3_uri. Please include the shapes file change or confirm it was already made separately.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$context sagemaker-core/src/sagemaker/core/shapes/shapes.py

role="arn:aws:iam::123456789012:role/SageMakerRole",
image_uri="test-image:latest",
instance_count=1,
instance_type="ml.m5.xlarge",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test may not catch the fall-through bug. This test verifies that the generated URI starts with s3:// and contains expected substrings, but it doesn't verify that urlparse is NOT called on the auto-generated path (which would be the symptom of the missing continue/append bug noted above). Consider adding a test for the pipeline config case where s3_uri=None — if the fall-through bug exists, urlparse would be called on a Join object and raise an error, which would make test_normalize_outputs_with_none_s3_uri_and_pipeline_config fail. Make sure that test actually exercises the code path correctly.

@aviruthen aviruthen closed this Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants