Skip to content

fix: Pipeline TypeError: can only concatenate list (not "NoneType") to list Using Sou (#5518)#25

Closed
aviruthen wants to merge 3 commits intomasterfrom
fix/pipeline-typeerror-can-only-concatenate-list-not-5518-2
Closed

fix: Pipeline TypeError: can only concatenate list (not "NoneType") to list Using Sou (#5518)#25
aviruthen wants to merge 3 commits intomasterfrom
fix/pipeline-typeerror-can-only-concatenate-list-not-5518-2

Conversation

@aviruthen
Copy link
Copy Markdown
Owner

Description

Fix TypeError when dependencies is None in get_processing_code_hash

Fixes aws#5518

Problem

When using SourceCode without specifying requirements and calling pipeline.upsert(), a TypeError: can only concatenate list (not "NoneType") to list is raised. This happens because get_processing_code_hash() performs list concatenation like [source_dir] + dependencies and [code] + dependencies without guarding against dependencies being None.

While get_training_code_hash already has if dependencies: guards, the get_processing_code_hash function did not have any such protection.

Solution

Add a defensive default at the start of get_processing_code_hash to convert None to an empty list:

dependencies = dependencies or []

This is consistent with how other functions in the codebase handle optional list parameters and ensures that list concatenation operations work correctly regardless of whether dependencies is None or an empty list.

Testing

Added three new test cases:

  • test_get_processing_code_hash_with_none_dependencies - code only with None dependencies
  • test_get_processing_code_hash_with_source_dir_and_none_dependencies - source_dir with None dependencies
  • test_get_processing_code_hash_with_code_only_and_none_dependencies - explicit code-only path with None dependencies

Related Issue

Fixes aws#5518

Changes Made

The bug is in sagemaker-core/src/sagemaker/core/workflow/utilities.py. The get_processing_code_hash function performs list concatenation like [source_dir] + dependencies and [code] + dependencies without guarding against dependencies being None. While get_training_code_hash already has if dependencies: guards, the get_processing_code_hash function does not. Additionally, based on the user's traceback (which points to get_training_code_hash line 251 with hash_files_or_dirs([source_dir] + dependencies)), it appears the user may be running a version where get_training_code_hash lacks the guard too. The safest fix is to default dependencies to an empty list at the start of get_processing_code_hash, and to add the same defensive defaulting in get_code_hash where requirements (passed as dependencies) comes from source_code.requirements which can be None.

AI-Generated PR

This PR was automatically generated by the PySDK Issue Agent.

  • Confidence score: 88%
  • Classification: bug
  • SDK version target: V3

Merge Checklist

  • Changes are backward compatible
  • Commit message follows prefix: description format
  • Unit tests added/updated
  • Integration tests added (if applicable)
  • Documentation updated (if applicable)

Copy link
Copy Markdown
Owner Author

@aviruthen aviruthen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Review

The PR attempts to fix a TypeError when dependencies is None in get_processing_code_hash, but the fix is applied in the wrong function (get_code_hash) where dependencies is not a parameter, meaning the fix will itself cause a NameError. The tests also appear to test get_processing_code_hash directly but the bug is in get_code_hash which dispatches to it. The core issue needs to be fixed correctly.

Comment thread sagemaker-core/src/sagemaker/core/workflow/utilities.py Outdated
Comment thread sagemaker-core/tests/unit/workflow/test_utilities.py
Comment thread sagemaker-core/tests/unit/workflow/test_utilities.py
Copy link
Copy Markdown
Owner Author

@aviruthen aviruthen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Review

This PR attempts to fix a TypeError when dependencies is None in get_processing_code_hash, but the fix is applied to the wrong function (get_code_hash instead of get_processing_code_hash). The variable dependencies doesn't exist in get_code_hash's scope, so this change will introduce a NameError. The tests for the actual fix target get_processing_code_hash correctly, but the code change is misplaced.

Returns:
str: A hash string representing the unique code artifact(s) for the step
"""

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Fix is applied to the wrong function. This line is inside get_code_hash(step: Entity), which does not have a dependencies parameter. The variable dependencies is not defined in this scope, so this will raise a NameError at runtime.

The fix should be applied inside get_processing_code_hash() instead. Looking at the function signature of get_processing_code_hash, it likely accepts code, source_dir, and dependencies parameters. The defensive dependencies = dependencies or [] should be added at the top of that function body.

Additionally, you should also consider fixing get_code_hash where it calls get_training_code_hash or get_processing_code_hash — if source_code.requirements can be None, ensure it's passed as [] rather than None. For example, where get_code_hash calls these functions, you could do:

dependencies = source_code.requirements or []

before passing it to the downstream hash functions.

Comment thread sagemaker-core/tests/unit/workflow/test_utilities.py
Comment thread sagemaker-core/tests/unit/workflow/test_utilities.py
Comment thread sagemaker-core/tests/unit/workflow/test_utilities.py
Comment thread sagemaker-core/tests/unit/workflow/test_utilities.py
Copy link
Copy Markdown
Owner Author

@aviruthen aviruthen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Code Review

This PR fixes a legitimate bug where get_processing_code_hash crashes with a TypeError when dependencies is None. The fix is minimal and correct. However, the test for get_code_hash is overly complex due to mocking isinstance, which is fragile and unnecessary, and there are some minor issues with test structure and formatting.

mock_step.step_args = mock_step_args

with patch("sagemaker.core.workflow.utilities.isinstance") as mock_isinstance:
def isinstance_side_effect(obj, cls):
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is extremely fragile and overly complex. Patching builtins.isinstance via sagemaker.core.workflow.utilities.isinstance is a code smell — it's brittle, hard to maintain, and can break in unexpected ways.

Instead, consider constructing a real TrainingStep (or a minimal subclass/mock that passes isinstance checks naturally), or simply test get_code_hash indirectly by testing get_training_code_hash with dependencies=None directly. The requirements or [] fix on line 176 is already covered if you just call get_training_code_hash(entry_point, source_dir, None) in a simpler test.

def test_get_training_code_hash_with_none_dependencies(self):
    """Test get_training_code_hash with None dependencies does not raise TypeError"""
    with tempfile.TemporaryDirectory() as temp_dir:
        entry_file = Path(temp_dir, "train.py")
        entry_file.write_text("print('training')")

        result = get_training_code_hash(
            entry_point=str(entry_file),
            source_dir=temp_dir,
            dependencies=None,
        )

        assert result is not None
        assert len(result) == 64

return builtins_isinstance(obj, cls)

import builtins
builtins_isinstance = builtins.isinstance
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Importing builtins and storing builtins.isinstance inside a side-effect closure that patches isinstance is very fragile. If the implementation of get_code_hash changes how it checks types, this test will silently break or give false positives. Please simplify this test as suggested above.

temp_file = f.name

try:
result = get_processing_code_hash(code=temp_file, source_dir=None, dependencies=None)
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: There's an extra blank line here (two blank lines between methods inside a class). PEP 8 convention for methods within a class is one blank line. This is minor but worth noting since CI may flag it.

if source_code:
source_dir = source_code.source_dir
requirements = source_code.requirements
requirements = source_code.requirements or []
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix. However, for consistency and defense-in-depth, consider also updating the function signature of get_processing_code_hash to default dependencies to None (if it doesn't already) and document that None is acceptable. Currently the type hint says List[str] — it should be list[str] | None to accurately reflect that None is a valid input.

Also, get_training_code_hash should be checked — the PR description mentions it already has guards, but the requirements or [] fix on this line suggests it may not handle None in all paths either. Please verify.

@aviruthen aviruthen closed this Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pipeline TypeError: can only concatenate list (not "NoneType") to list Using SourceCode

1 participant