Conversation
…o list Using Sou (aws#5518)
aviruthen
left a comment
There was a problem hiding this comment.
🤖 AI Code Review
The PR attempts to fix a TypeError when dependencies is None in get_processing_code_hash, but the fix is applied in the wrong function (get_code_hash) where dependencies is not a parameter, meaning the fix will itself cause a NameError. The tests also appear to test get_processing_code_hash directly but the bug is in get_code_hash which dispatches to it. The core issue needs to be fixed correctly.
aviruthen
left a comment
There was a problem hiding this comment.
🤖 AI Code Review
This PR attempts to fix a TypeError when dependencies is None in get_processing_code_hash, but the fix is applied to the wrong function (get_code_hash instead of get_processing_code_hash). The variable dependencies doesn't exist in get_code_hash's scope, so this change will introduce a NameError. The tests for the actual fix target get_processing_code_hash correctly, but the code change is misplaced.
| Returns: | ||
| str: A hash string representing the unique code artifact(s) for the step | ||
| """ | ||
|
|
There was a problem hiding this comment.
Bug: Fix is applied to the wrong function. This line is inside get_code_hash(step: Entity), which does not have a dependencies parameter. The variable dependencies is not defined in this scope, so this will raise a NameError at runtime.
The fix should be applied inside get_processing_code_hash() instead. Looking at the function signature of get_processing_code_hash, it likely accepts code, source_dir, and dependencies parameters. The defensive dependencies = dependencies or [] should be added at the top of that function body.
Additionally, you should also consider fixing get_code_hash where it calls get_training_code_hash or get_processing_code_hash — if source_code.requirements can be None, ensure it's passed as [] rather than None. For example, where get_code_hash calls these functions, you could do:
dependencies = source_code.requirements or []before passing it to the downstream hash functions.
aviruthen
left a comment
There was a problem hiding this comment.
🤖 AI Code Review
This PR fixes a legitimate bug where get_processing_code_hash crashes with a TypeError when dependencies is None. The fix is minimal and correct. However, the test for get_code_hash is overly complex due to mocking isinstance, which is fragile and unnecessary, and there are some minor issues with test structure and formatting.
| mock_step.step_args = mock_step_args | ||
|
|
||
| with patch("sagemaker.core.workflow.utilities.isinstance") as mock_isinstance: | ||
| def isinstance_side_effect(obj, cls): |
There was a problem hiding this comment.
This test is extremely fragile and overly complex. Patching builtins.isinstance via sagemaker.core.workflow.utilities.isinstance is a code smell — it's brittle, hard to maintain, and can break in unexpected ways.
Instead, consider constructing a real TrainingStep (or a minimal subclass/mock that passes isinstance checks naturally), or simply test get_code_hash indirectly by testing get_training_code_hash with dependencies=None directly. The requirements or [] fix on line 176 is already covered if you just call get_training_code_hash(entry_point, source_dir, None) in a simpler test.
def test_get_training_code_hash_with_none_dependencies(self):
"""Test get_training_code_hash with None dependencies does not raise TypeError"""
with tempfile.TemporaryDirectory() as temp_dir:
entry_file = Path(temp_dir, "train.py")
entry_file.write_text("print('training')")
result = get_training_code_hash(
entry_point=str(entry_file),
source_dir=temp_dir,
dependencies=None,
)
assert result is not None
assert len(result) == 64| return builtins_isinstance(obj, cls) | ||
|
|
||
| import builtins | ||
| builtins_isinstance = builtins.isinstance |
There was a problem hiding this comment.
Importing builtins and storing builtins.isinstance inside a side-effect closure that patches isinstance is very fragile. If the implementation of get_code_hash changes how it checks types, this test will silently break or give false positives. Please simplify this test as suggested above.
| temp_file = f.name | ||
|
|
||
| try: | ||
| result = get_processing_code_hash(code=temp_file, source_dir=None, dependencies=None) |
There was a problem hiding this comment.
Nit: There's an extra blank line here (two blank lines between methods inside a class). PEP 8 convention for methods within a class is one blank line. This is minor but worth noting since CI may flag it.
| if source_code: | ||
| source_dir = source_code.source_dir | ||
| requirements = source_code.requirements | ||
| requirements = source_code.requirements or [] |
There was a problem hiding this comment.
Good fix. However, for consistency and defense-in-depth, consider also updating the function signature of get_processing_code_hash to default dependencies to None (if it doesn't already) and document that None is acceptable. Currently the type hint says List[str] — it should be list[str] | None to accurately reflect that None is a valid input.
Also, get_training_code_hash should be checked — the PR description mentions it already has guards, but the requirements or [] fix on this line suggests it may not handle None in all paths either. Please verify.
Description
Fix TypeError when
dependenciesis None inget_processing_code_hashFixes aws#5518
Problem
When using
SourceCodewithout specifyingrequirementsand callingpipeline.upsert(), aTypeError: can only concatenate list (not "NoneType") to listis raised. This happens becauseget_processing_code_hash()performs list concatenation like[source_dir] + dependenciesand[code] + dependencieswithout guarding againstdependenciesbeingNone.While
get_training_code_hashalready hasif dependencies:guards, theget_processing_code_hashfunction did not have any such protection.Solution
Add a defensive default at the start of
get_processing_code_hashto convertNoneto an empty list:This is consistent with how other functions in the codebase handle optional list parameters and ensures that list concatenation operations work correctly regardless of whether
dependenciesisNoneor an empty list.Testing
Added three new test cases:
test_get_processing_code_hash_with_none_dependencies- code only withNonedependenciestest_get_processing_code_hash_with_source_dir_and_none_dependencies- source_dir withNonedependenciestest_get_processing_code_hash_with_code_only_and_none_dependencies- explicit code-only path withNonedependenciesRelated Issue
Fixes aws#5518
Changes Made
The bug is in
sagemaker-core/src/sagemaker/core/workflow/utilities.py. Theget_processing_code_hashfunction performs list concatenation like[source_dir] + dependenciesand[code] + dependencieswithout guarding againstdependenciesbeingNone. Whileget_training_code_hashalready hasif dependencies:guards, theget_processing_code_hashfunction does not. Additionally, based on the user's traceback (which points toget_training_code_hashline 251 withhash_files_or_dirs([source_dir] + dependencies)), it appears the user may be running a version whereget_training_code_hashlacks the guard too. The safest fix is to defaultdependenciesto an empty list at the start ofget_processing_code_hash, and to add the same defensive defaulting inget_code_hashwhererequirements(passed asdependencies) comes fromsource_code.requirementswhich can beNone.AI-Generated PR
This PR was automatically generated by the PySDK Issue Agent.
Merge Checklist
prefix: descriptionformat