
feat(feature-processor): Add Lake Formation credential vending and Spark 3.5/Python 3.12 support#5816

Merged
mollyheamazon merged 34 commits into aws:master from BassemHalim:feat/feature-store-fp-lf
May 13, 2026

Conversation

@BassemHalim (Collaborator) commented Apr 30, 2026

This PR upgrades the Feature Processor module to support PySpark 3.5 and Python 3.12, adds Lake Formation credential vending for data ingestion, and improves Spark version handling across the board.

Description of changes:

  • Spark version upgrade: Bumps PySpark from 3.3.2 to 3.5.1 and sagemaker-feature-store-pyspark from 3.3 to 2.0.0 in
    pyproject.toml.
  • Python 3.12 support: Extends the allowed Python versions for Spark remote jobs from [3.9] to [3.9, 3.12] in
    sagemaker-core/job.py.
  • Auto-detect PySpark version: _get_default_spark_image now detects the installed PySpark version at runtime instead of using a
    hardcoded default.
  • Auto-inject Feature Store PySpark dependency: When spark_config is set, _JobSettings.__init__ automatically appends a pip
    install sagemaker-feature-store-pyspark command and a JAR copy command to pre_execution_commands.
  • New _image_resolver.py module: Introduces _get_spark_image_uri() with a SPARK_IMAGE_SUPPORT_MATRIX that maps Spark versions
    to supported Python versions, replacing the hardcoded logic in feature_scheduler.py.
  • Dynamic Hadoop version resolution: New SPARK_TO_HADOOP_MAP and _get_hadoop_version() in _spark_factory.py resolve the correct
    Hadoop Maven coordinates based on the installed PySpark version.
  • Lazy imports for feature_store_pyspark: Moves feature_store_pyspark imports from module-level to inside methods, preventing
    import errors when the package isn't installed.
  • Feature Store JARs always on classpath: spark.jars config now includes version-matched Feature Store JARs for both training
    and non-training jobs. A new _install_feature_store_jars() method copies JARs to /usr/lib/spark/jars/ (see the sketch after this list).
  • Lake Formation credential vending: Adds use_lake_formation_credentials parameter to @feature_processor decorator, threaded
    through FeatureProcessorConfig -> _udf_output_receiver.ingest_udf_output() -> FeatureStoreManager.ingest_data.
  • ECDSA signing key generation: _config_uploader._prepare_and_upload_callable() now generates an ECDSA key pair, passes the
    private key to StoredFunction, and returns the public key PEM. The public key is set as REMOTE_FUNCTION_SECRET_KEY environment
    variable on the ModelTrainer.
  • Conditional image_uri in scheduler: _get_remote_decorator_config_from_input now only sets image_uri if one isn't already
    provided, allowing user-specified images.
  • Updated pre_execution_commands: Integration test helper uses python3 -m pip and python3 -m awscli patterns, installs awscli
    explicitly, and installs mlops_whl instead of sagemaker_whl with [feature-processor] extras.
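
To make the lazy-import and classpath bullets above concrete, here is a minimal sketch of the factory shape, assuming method names modeled on the description rather than the PR's literal code:

class _SparkSessionFactory:
    def _feature_store_jars(self):
        # Lazy import: only needed when a session is actually built, so
        # deserializing this factory does not require the package.
        import feature_store_pyspark

        return feature_store_pyspark.classpath_jars()

    def _spark_configs(self):
        # spark.jars is set for training and non-training jobs alike,
        # so FeatureStoreManager classes always load.
        return [("spark.jars", ",".join(self._feature_store_jars()))]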

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Add configurable use_lake_formation_credentials parameter to the
@feature_processor decorator, defaulting to False. The value flows
through FeatureProcessorConfig to the Spark connector's ingest_data()
call, enabling Lake Formation credential vending when set to True.

---
X-AI-Prompt: make useLakeFormationCreds configurable, defaults to False, passed to feature_processor
X-AI-Tool: kiro-cli
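
A minimal usage sketch for the new flag; the import path, data source, and output ARN below are placeholders, and only use_lake_formation_credentials is the new piece:

from sagemaker.feature_store.feature_processor import CSVDataSource, feature_processor

@feature_processor(
    inputs=[CSVDataSource("s3://my-bucket/raw/")],  # placeholder
    output="arn:aws:sagemaker:us-east-1:111122223333:feature-group/my-fg",  # placeholder
    use_lake_formation_credentials=True,  # new parameter; defaults to False
)
def transform(raw_df):
    # The flag flows through FeatureProcessorConfig to the Spark
    # connector's FeatureStoreManager.ingest_data call.
    return raw_df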
Generate ECDSA signing key in ConfigUploader and pass it to
StoredFunction for function payload signature verification. The
public key PEM is returned to callers for remote-side verification.

---
X-AI-Prompt: fix StoredFunction missing signing_key error in feature_processor pipeline
X-AI-Tool: kiro-cli
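
A hedged sketch of the key generation described here, using the cryptography imports that appear later in this PR's diff; the helper name and return shape are assumptions:

from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import serialization as crypto_serialization

def _generate_signing_key_pair():
    # The private key signs the function payload (via StoredFunction);
    # the public key PEM is returned for remote-side verification.
    private_key = ec.generate_private_key(ec.SECP256R1())
    public_key_pem = private_key.public_key().public_bytes(
        encoding=crypto_serialization.Encoding.PEM,
        format=crypto_serialization.PublicFormat.SubjectPublicKeyInfo,
    ).decode("utf-8")
    return private_key, public_key_pem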
Add _image_resolver module that resolves the SageMaker Spark
processing container image URI based on installed PySpark and Python
versions. Supports Spark 3.1/3.2/3.3/3.5 with appropriate Python
version mapping. Uses container_version=v1 as a floating tag.

---
X-AI-Prompt: add image resolver with container_version v1 for spark processing image
X-AI-Tool: kiro-cli
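
An illustrative sketch of the matrix-plus-resolver shape. The Python versions mapped to Spark 3.1/3.2/3.3 and the ECR URI format are assumptions; Spark 3.5 supporting 3.9 and 3.12 and the floating v1 tag come from the commit itself:

SPARK_IMAGE_SUPPORT_MATRIX = {
    "3.1": ["3.7"],          # assumed
    "3.2": ["3.8"],          # assumed
    "3.3": ["3.9"],          # assumed
    "3.5": ["3.9", "3.12"],  # per this PR
}

def _get_spark_image_uri(spark_version, python_version, account, region):
    supported = SPARK_IMAGE_SUPPORT_MATRIX.get(spark_version)
    if not supported or python_version not in supported:
        raise ValueError(
            f"Spark {spark_version} images do not support Python "
            f"{python_version}; supported: {supported}"
        )
    py_tag = "py" + python_version.replace(".", "")
    # URI format is an assumption; "v1" is the floating container tag.
    return (
        f"{account}.dkr.ecr.{region}.amazonaws.com/"
        f"sagemaker-spark-processing:{spark_version}-cpu-{py_tag}-v1"
    )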
…cheduler

Update feature_scheduler to use _get_spark_image_uri for dynamic
image resolution instead of _JobSettings._get_default_spark_image.
Thread public_key_pem from ConfigUploader through to ModelTrainer
environment as REMOTE_FUNCTION_SECRET_KEY. Allow user-provided
image_uri to take precedence over auto-resolved URI.

---
X-AI-Prompt: integrate image resolver and signing key into feature scheduler pipeline
X-AI-Tool: kiro-cli
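
A sketch of that wiring; everything except REMOTE_FUNCTION_SECRET_KEY and _get_spark_image_uri is a placeholder name:

# Hypothetical glue code in the scheduler.
environment = dict(user_environment or {})
environment["REMOTE_FUNCTION_SECRET_KEY"] = public_key_pem  # legacy name; holds a public key
# User-provided image_uri takes precedence over the auto-resolved one.
image_uri = user_image_uri or _get_spark_image_uri(spark_version, python_version, account, region)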
…Store JARs

Resolve Hadoop version dynamically based on installed PySpark version
instead of hardcoding 3.3.1. Move Feature Store JAR classpath setup
outside the non-training-job guard so spark.jars is always set,
fixing FeatureStoreManager class loading in training job mode.

---
X-AI-Prompt: fix spark factory hadoop version and jar classpath for spark 3.5
X-AI-Tool: kiro-cli
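
A hedged sketch of the dynamic resolution; the 3.5 entry assumes Spark 3.5.x ships Hadoop 3.3.4 client libraries, and 3.3.1 is the old hardcoded value kept as the fallback:

SPARK_TO_HADOOP_MAP = {
    "3.3": "3.3.1",  # previous hardcoded default
    "3.5": "3.3.4",  # assumed for Spark 3.5.x
}

def _get_hadoop_version():
    import pyspark

    major_minor = ".".join(pyspark.__version__.split(".")[:2])
    return SPARK_TO_HADOOP_MAP.get(major_minor, "3.3.1")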
Update _get_default_spark_image to accept Python 3.12 in addition to
3.9. Auto-detect Spark version from installed pyspark instead of
hardcoding 3.3, falling back to the default if pyspark is not
installed. Also resolve correct Python binary in Spark bootstrap
script to avoid PATH conflicts with system python3.

---
X-AI-Prompt: fix job.py to select correct spark image for py312 and detect pyspark version
X-AI-Tool: kiro-cli
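
A compact sketch of the detection-with-fallback logic; the function name and default are placeholders:

def _detect_spark_version(default="3.3"):
    try:
        import pyspark
    except ImportError:
        # pyspark not installed locally; keep the previous default.
        return default
    return ".".join(pyspark.__version__.split(".")[:2])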
…on 3.12

Update expected error message in remote function tests to reflect that
SageMaker Spark images now support Python versions 3.9 and 3.12.
…or deps

Pin pyspark==3.5.1 in both feature-processor and test optional
dependencies to ensure consistent Spark version across environments.
…ersions

SageMaker Spark image only supports Python 3.9 and 3.12. Add skipif
markers to three feature processor integ tests that fail on Python 3.10.
Inject sagemaker-feature-store-pyspark>=2,<3 via pre_execution_commands
in _get_remote_decorator_config_from_input so it gets installed on the
remote container automatically.

Update integ tests: add skipif for Python 3.10 Spark tests, remove
manual feature-store-pyspark install, use python3 instead of python3.12.
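
The skip markers presumably follow the standard pytest pattern; the test name below is a placeholder, while the supported-version tuple mirrors the commit:

import sys

import pytest

CURRENT_PY = f"{sys.version_info.major}.{sys.version_info.minor}"

@pytest.mark.skipif(
    CURRENT_PY not in ("3.9", "3.12"),
    reason="SageMaker Spark images only support Python 3.9 and 3.12",
)
def test_feature_processor_spark_integ():  # placeholder name
    ...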
… feature-store-pyspark

- Update test error messages to reflect Python 3.9 and 3.12 support
- Add pyspark 3.5.1 to test and feature-processor optional deps
- Skip Spark integ tests on unsupported Python versions (3.10)
- Auto-install sagemaker-feature-store-pyspark>=2,<3 via pre_execution_commands
  in to_pipeline and copy version-matched JAR to Spark classpath
- Use standard SageMaker Spark image resolution via SparkConfig
- Use python3 instead of python3.12 in integ test pre_execution_commands
… remote jobs

When spark_config is set on a remote job, _JobSettings now automatically
injects a pip install of sagemaker-feature-store-pyspark and copies the
Spark 3.5-matched JAR to /usr/lib/spark/jars/ via pre_execution_commands
(see the sketch below).

This makes the package work transparently when the SageMaker Spark image
does not pre-install sagemaker-feature-store-pyspark.

- Make feature_store_pyspark imports lazy in _spark_factory.py to avoid
  deserialization failures when the module is not yet installed
- Add sagemaker-feature-store-pyspark to integ test requirements.txt
- Remove duplicate injection from feature_scheduler.py (to_pipeline path)
  since _JobSettings now handles all Spark remote jobs
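
The injected commands plausibly take a shape like this; the pip fragments match the diff excerpt quoted in review below, while the JAR copy via classpath_jars() is an assumption:

FEATURE_STORE_BOOTSTRAP = [
    "python3 -m pip install --root-user-action=ignore "
    "'sagemaker-feature-store-pyspark>=2,<3'",
    # Copy the connector's version-matched JARs onto the Spark classpath.
    'python3 -c "import shutil, feature_store_pyspark; '
    "[shutil.copy(j, '/usr/lib/spark/jars/') "
    'for j in feature_store_pyspark.classpath_jars()]"',
]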
…e_execution_commands

_JobSettings now auto-injects feature-store-pyspark install and JAR copy
commands when spark_config is set, so update the test assertion to
expect these commands in the _prepare_and_upload_workspace call.
… tests

- Mock feature_store_pyspark.classpath_jars with @patch in test_spark_session_factory_configuration, test_spark_session_factory_configuration_on_training_job, test_spark_session_factory, test_spark_session_factory_with_iceberg_config, test_spark_session_factory_same_instance, and test_spark_configs_use_dynamic_hadoop_version (see the sketch below)
- Replace direct calls to feature_store_pyspark.classpath_jars() with mock_classpath_jars.return_value
- Update test_repack_model.py to use resolved path variable for consistency in _get_safe_members test
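
The mocking pattern is standard unittest.mock patching; because the import is now lazy, patching the source module directly works, though the exact target string in the PR's tests may differ:

from unittest.mock import patch

@patch("feature_store_pyspark.classpath_jars")
def test_spark_session_factory(mock_classpath_jars):
    mock_classpath_jars.return_value = ["/tmp/fake-feature-store.jar"]
    # Build the session factory and assert spark.jars contains the fake JAR.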
Add IcebergProperties to the feature_store __init__.py imports and
__all__ list so users can import it directly from the package instead
of reaching into the internal feature_group_manager module.
---
X-AI-Prompt: review staged change and commit if good
X-AI-Tool: kiro-cli
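
After this change the class is importable at package level; the exact package path is assumed from the commit text:

# Previously users had to reach into the internal feature_group_manager module.
from sagemaker.feature_store import IcebergProperties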
"pip install --root-user-action=ignore"
" 'sagemaker-feature-store-pyspark>=2,<3'"
)
copy_jar_cmd = (
Collaborator Author

PySpark is not installed where this command runs, so we read the RELEASE file under $SPARK_HOME (/usr/lib/spark/RELEASE), which contains the currently installed Spark version.
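
For reference, a hedged sketch of reading the version that way; in Apache Spark distributions the first line of RELEASE looks like "Spark 3.5.1 built for Hadoop 3.3.4", so the second field is the version:

# Shell fragment, shown as a Python string suitable for
# pre_execution_commands; the field position is an assumption.
read_spark_version = (
    "SPARK_VERSION=$(head -1 ${SPARK_HOME:-/usr/lib/spark}/RELEASE "
    "| awk '{print $2}')"
)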

@@ -92,7 +92,7 @@ def test_is_bad_link_unsafe():

def test_get_safe_members_all_safe():
Collaborator Author

This is not related to this PR, but it was failing in CI so I fixed it.


import attr
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import serialization as crypto_serialization
Contributor

Is this newly introduced library needed? If so, I see that sagemaker-mlops/pyproject.toml has not been changed to add cryptography as a dependency.

Collaborator Author

Yes, they are needed. cryptography is a dependency of sagemaker-core ("cryptography>=46.0.0"), which sagemaker-mlops depends on, but I will still add it to mlops explicitly in case it is ever removed from core in the future.

BassemHalim and others added 2 commits May 12, 2026 11:42
… explicit cryptography dep

Add comment explaining REMOTE_FUNCTION_SECRET_KEY is a legacy misnomer —
the value is an ECDSA public key for signature verification, not a secret.
Add cryptography>=46.0.0 as explicit dependency in sagemaker-mlops.

---
X-AI-Prompt: resolve CR comment about misleading REMOTE_FUNCTION_SECRET_KEY naming and missing cryptography dependency
X-AI-Tool: kiro
@mollyheamazon mollyheamazon merged commit 58ac9dc into aws:master May 13, 2026
38 of 53 checks passed