From 66512fe22994dda3f64ff648245a07b40bd112a2 Mon Sep 17 00:00:00 2001
From: Chanin Nantasenamat <141535473+sfc-gh-cnantasenamat@users.noreply.github.com>
Date: Sat, 4 Apr 2026 23:29:25 -0700
Subject: [PATCH 1/2] Add Gateway A/B testing section to fraud detection quickstart

Add new section demonstrating SPCS Gateways for A/B testing model versions
with traffic splitting. Covers creating a gateway to split traffic between
two deployed model services, testing the endpoint, and shifting traffic
with ALTER GATEWAY.

.... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code)

Co-Authored-By: Cortex Code
---
 ...l-with-natural-language-in-snowflake-ml.md | 134 +++++++++++++++++-
 1 file changed, 133 insertions(+), 1 deletion(-)

diff --git a/site/sfguides/src/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml.md b/site/sfguides/src/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml.md
index 76a96990a0..68b8101eaa 100644
--- a/site/sfguides/src/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml.md
+++ b/site/sfguides/src/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml.md
@@ -6,7 +6,7 @@ language: en
 environments: web
 status: Published
 feedback link: https://github.com/Snowflake-Labs/sfguides/issues
-tags: Getting Started, Data Science, Machine Learning, Snowflake ML, Model Registry, SPCS, Fraud Detection, Cortex Code
+tags: Getting Started, Data Science, Machine Learning, Snowflake ML, Model Registry, SPCS, Fraud Detection, Cortex Code, Gateway

 # Build a Real-Time Fraud Detection Model with Natural Language in Snowflake ML

@@ -19,6 +19,7
@@ tags: Getting Started, Data Science, Machine Learning, Snowflake ML, Model Regis
 - Train an XGBoost machine learning model for fraud detection
 - Deploy models for scalable inference with one-click deployment
 - Create REST API endpoints for real-time online inference
+- Set up A/B testing between model versions using Gateways for traffic splitting

 ### What You'll Build
 A complete fraud detection pipeline featuring:
@@ -26,6 +27,7 @@ A complete fraud detection pipeline featuring:
 - Trained XGBoost classification model
 - Live REST API endpoint running on Snowpark Container Services (SPCS)
 - Performance benchmarking with latency profiling
+- Gateway for A/B testing across model versions with traffic splitting

 ![diagram](assets/diagram.png)

@@ -562,6 +564,133 @@ It also breaks down the predictions by risk tier so you can verify the model beh
 • With connection pooling, throughput reaches ~15.7 req/s on a single thread. Parallel requests would scale linearly.
 ```
+
+## A/B Test with Gateways
+
+In production, you rarely swap a model overnight. Instead, you route a fraction of live traffic to the new version and compare results before committing. [SPCS Gateways](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/gateway) make this simple: a single stable hostname splits ingress requests across multiple service endpoints by percentage, so you can A/B test model versions without changing any client URLs.
+
+In this step we will deploy an improved V3 model as a second SPCS service, create a gateway that sends 80% of traffic to the proven V2 service and 20% to V3, then verify the split. Once the new version proves itself, we shift the gateway to 100% V3 with a single `ALTER GATEWAY` statement.
+
+### Prompt
+
+```
+Train an improved V3 fraud model (try a lower learning rate and more
+estimators), log it to the Model Registry, deploy it as a second SPCS service
+called FRAUD_INFERENCE_SERVICE_V3, then create a Gateway called
+FRAUD_AB_GATEWAY that sends 80% of traffic to the original
+FRAUD_INFERENCE_SERVICE and 20% to the new V3 service. Test the gateway
+endpoint.
+```
+
+### What Gets Generated
+
+Cortex Code trains the updated model, registers it, and deploys a second service:
+
+```
+MODEL REGISTRATION
+============================================================
+Registering model as ML.PROJECTS.FRAUD_XGBOOST_MODEL version V3...
+  learning_rate: 0.05 (was 0.1)
+  n_estimators: 500 (was 200)
+
+✓ SNOWFLAKE_SQL_EXECUTE Verify model V3 registration
+  SQL Result (2 rows × 10 cols)
+  ┌─────┬───────────────┬──────────────┬───────────────────┐
+  │ #   │ name          │ version_name │ min_num_arguments │
+  ├─────┼───────────────┼──────────────┼───────────────────┤
+  │ 1   │ PREDICT       │ V3           │ 7                 │
+  │ 2   │ PREDICT_PROBA │ V3           │ 7                 │
+  └─────┴───────────────┴──────────────┴───────────────────┘
+```
+
+Once the V3 service is running alongside the original:
+
+```
+✓ SNOWFLAKE_SQL_EXECUTE Check running services
+  SQL Result (2 rows × 4 cols)
+  ┌─────┬────────────────────────────────┬─────────┬─────────────┐
+  │ #   │ name                           │ status  │ schema_name │
+  ├─────┼────────────────────────────────┼─────────┼─────────────┤
+  │ 1   │ FRAUD_INFERENCE_SERVICE        │ RUNNING │ PROJECTS    │
+  │ 2   │ FRAUD_INFERENCE_SERVICE_V3     │ RUNNING │ PROJECTS    │
+  └─────┴────────────────────────────────┴─────────┴─────────────┘
+```
+
+Next, the gateway is created with an 80/20 traffic split:
+
+```
+✓ SNOWFLAKE_SQL_EXECUTE Create A/B testing gateway
+  CREATE OR REPLACE GATEWAY ML.PROJECTS.FRAUD_AB_GATEWAY
+    FROM SPECIFICATION $$
+    spec:
+      type: traffic_split
+      split_type: custom
+      targets:
+      - type: endpoint
+        value: ML.PROJECTS.FRAUD_INFERENCE_SERVICE!inference
+        weight: 80
+      - type: endpoint
+        value:
ML.PROJECTS.FRAUD_INFERENCE_SERVICE_V3!inference
+        weight: 20
+    $$;
+
+  SQL Result (1 rows × 1 cols)
+  ┌─────┬──────────────────────────────────────────────┐
+  │ #   │ status                                       │
+  ├─────┼──────────────────────────────────────────────┤
+  │ 1   │ FRAUD_AB_GATEWAY successfully created.       │
+  └─────┴──────────────────────────────────────────────┘
+```
+
+Cortex Code retrieves the stable gateway URL:
+
+```
+✓ SNOWFLAKE_SQL_EXECUTE Get gateway endpoint
+  DESC GATEWAY ML.PROJECTS.FRAUD_AB_GATEWAY
+    ->> SELECT "name", "ingress_url" FROM $1;
+
+  SQL Result (1 rows × 2 cols)
+  ┌─────┬──────────────────┬─────────────────────────────────────────────────┐
+  │ #   │ name             │ ingress_url                                     │
+  ├─────┼──────────────────┼─────────────────────────────────────────────────┤
+  │ 1   │ FRAUD_AB_GATEWAY │ ab-gw-ml-proj-aws-us-west-2.snowflakecomputing.app │
+  └─────┴──────────────────┴─────────────────────────────────────────────────┘
+```
+
+It then sends 100 test requests through the gateway and confirms the split:
+
+```
+GATEWAY A/B TEST RESULTS
+============================================================
+Endpoint: https://ab-gw-ml-proj-aws-us-west-2.snowflakecomputing.app/predict-proba
+Total requests: 100 | Errors: 0
+
+Traffic Split Observed
+┌──────────────────────────────────┬──────────┬──────────┐
+│ Service                          │ Expected │ Observed │
+├──────────────────────────────────┼──────────┼──────────┤
+│ FRAUD_INFERENCE_SERVICE (V2)     │ 80%      │ 78%      │
+├──────────────────────────────────┼──────────┼──────────┤
+│ FRAUD_INFERENCE_SERVICE_V3 (V3)  │ 20%      │ 22%      │
+└──────────────────────────────────┴──────────┴──────────┘
+
+Prediction Agreement: 96% (both versions agree on fraud/legit label)
+V3-only catches: 2 additional fraud cases flagged by V3 but missed by V2
+```
+
+### Shifting Traffic
+
+Once you are confident in V3, shift all traffic to the new version:
+
+```
+Shift the FRAUD_AB_GATEWAY to send 100% of traffic to
+FRAUD_INFERENCE_SERVICE_V3. Confirm the change took effect.
+```
+
+The gateway hostname stays the same, so no client changes are needed. You can also use this pattern for high availability by splitting traffic across services running on different compute pools.
+
+> Note: Gateway routing automatically fails over to healthy endpoints. If one service becomes unavailable, traffic is redirected to the remaining endpoints proportionally. See the [gateway failover documentation](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/gateway) for details.
+
 ## Debug and Recover from Errors

@@ -663,6 +792,7 @@ Congratulations! You've successfully built a complete real-time fraud detection
 - Train an XGBoost model optimized for imbalanced fraud detection
 - Deploy models to SPCS with automatic containerization
 - Create and test REST API endpoints for real-time inference
+- Use Gateways to A/B test model versions with traffic splitting behind a stable URL

 ### Related Resources

@@ -676,3 +806,5 @@ Technical Documentation:
 - [Cortex Code Documentation](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code) - Getting started with Cortex Code
 - [Snowpark Container Services](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/overview) - Deploy and manage containerized workloads
 - [Snowflake Model Registry](https://docs.snowflake.com/en/developer-guide/snowflake-ml/model-registry/overview) - Register, version, and deploy ML models
+- [SPCS Gateways](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/gateway) - Route ingress traffic to multiple service endpoints
+- [CREATE GATEWAY](https://docs.snowflake.com/en/sql-reference/sql/create-gateway) - SQL reference for creating and configuring gateways
\ No newline at end of file

From 8f4b339976080fcde620d06dbb7e01158ec7aa4e Mon Sep 17 00:00:00 2001
From: Chanin Nantasenamat <141535473+sfc-gh-cnantasenamat@users.noreply.github.com>
Date: Sun, 19 Apr 2026 20:59:28 -0700
Subject: [PATCH 2/2] Fix service and model version
 numbering (V1/V2 instead of V2/V3)

---
 ...l-with-natural-language-in-snowflake-ml.md | 46 +++++++++----------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/site/sfguides/src/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml.md b/site/sfguides/src/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml.md
index 68b8101eaa..d542e63346 100644
--- a/site/sfguides/src/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml.md
+++ b/site/sfguides/src/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml/build-real-time-fraud-detection-model-with-natural-language-in-snowflake-ml.md
@@ -433,15 +433,15 @@ This single prompt triggers a complete deployment pipeline. First, the model is
 ```
 MODEL REGISTRATION
 ============================================================
-Registering model as ML.PROJECTS.FRAUD_XGBOOST_MODEL version V2...
+Registering model as ML.PROJECTS.FRAUD_XGBOOST_MODEL version V1...
-✓ SNOWFLAKE_SQL_EXECUTE Verify model V2 registration
+✓ SNOWFLAKE_SQL_EXECUTE Verify model V1 registration
   SQL Result (2 rows × 10 cols)
   ┌─────┬───────────────┬──────────────┬───────────────────┐
   │ #   │ name          │ version_name │ min_num_arguments │
   ├─────┼───────────────┼──────────────┼───────────────────┤
-  │ 1   │ PREDICT       │ V2           │ 7                 │
-  │ 2   │ PREDICT_PROBA │ V2           │ 7                 │
+  │ 1   │ PREDICT       │ V1           │ 7                 │
+  │ 2   │ PREDICT_PROBA │ V1           │ 7                 │
   └─────┴───────────────┴──────────────┴───────────────────┘
 ```

@@ -453,7 +453,7 @@ Then deployed to SPCS (this can take a few minutes as it builds the container):
   ┌─────┬─────────────────────────┬─────────┬───────────────┬─────────────┐
   │ #   │ name                    │ status  │ database_name │ schema_name │
   ├─────┼─────────────────────────┼─────────┼───────────────┼─────────────┤
-  │ 1   │ FRAUD_INFERENCE_SERVICE │ RUNNING │ ML            │ PROJECTS    │
+  │ 1   │ FRAUD_INFERENCE_SERVICE_V1 │ RUNNING │ ML            │ PROJECTS    │
   │ 2   │ MODEL_BUILD_4A237CD4    │ DONE    │ ML            │ PROJECTS    │
   └─────┴─────────────────────────┴─────────┴───────────────┴─────────────┘

@@ -569,16 +569,16 @@ It also breaks down the predictions by risk tier so you can verify the model beh

 In production, you rarely swap a model overnight. Instead, you route a fraction of live traffic to the new version and compare results before committing. [SPCS Gateways](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/gateway) make this simple: a single stable hostname splits ingress requests across multiple service endpoints by percentage, so you can A/B test model versions without changing any client URLs.

-In this step we will deploy an improved V3 model as a second SPCS service, create a gateway that sends 80% of traffic to the proven V2 service and 20% to V3, then verify the split. Once the new version proves itself, we shift the gateway to 100% V3 with a single `ALTER GATEWAY` statement.
+In this step we will deploy an improved V2 model as a second SPCS service, create a gateway that sends 80% of traffic to the proven V1 service and 20% to V2, then verify the split. Once the new version proves itself, we shift the gateway to 100% V2 with a single `ALTER GATEWAY` statement.

 ### Prompt

 ```
-Train an improved V3 fraud model (try a lower learning rate and more
+Train an improved V2 fraud model (try a lower learning rate and more
 estimators), log it to the Model Registry, deploy it as a second SPCS service
-called FRAUD_INFERENCE_SERVICE_V3, then create a Gateway called
+called FRAUD_INFERENCE_SERVICE_V2, then create a Gateway called
 FRAUD_AB_GATEWAY that sends 80% of traffic to the original
-FRAUD_INFERENCE_SERVICE and 20% to the new V3 service. Test the gateway
+FRAUD_INFERENCE_SERVICE_V1 and 20% to the new V2 service. Test the gateway
 endpoint.
 ```

@@ -589,21 +589,21 @@ Cortex Code trains the updated model, registers it, and deploys a second service
 ```
 MODEL REGISTRATION
 ============================================================
-Registering model as ML.PROJECTS.FRAUD_XGBOOST_MODEL version V3...
+Registering model as ML.PROJECTS.FRAUD_XGBOOST_MODEL version V2...
   learning_rate: 0.05 (was 0.1)
   n_estimators: 500 (was 200)

-✓ SNOWFLAKE_SQL_EXECUTE Verify model V3 registration
+✓ SNOWFLAKE_SQL_EXECUTE Verify model V2 registration
   SQL Result (2 rows × 10 cols)
   ┌─────┬───────────────┬──────────────┬───────────────────┐
   │ #   │ name          │ version_name │ min_num_arguments │
   ├─────┼───────────────┼──────────────┼───────────────────┤
-  │ 1   │ PREDICT       │ V3           │ 7                 │
-  │ 2   │ PREDICT_PROBA │ V3           │ 7                 │
+  │ 1   │ PREDICT       │ V2           │ 7                 │
+  │ 2   │ PREDICT_PROBA │ V2           │ 7                 │
   └─────┴───────────────┴──────────────┴───────────────────┘
 ```

-Once the V3 service is running alongside the original:
+Once the V2 service is running alongside the original:

 ```
 ✓ SNOWFLAKE_SQL_EXECUTE Check running services
@@ -611,8 +611,8 @@ Once the V3 service is running alongside the original:
   ┌─────┬────────────────────────────────┬─────────┬─────────────┐
   │ #   │ name                           │ status  │ schema_name │
   ├─────┼────────────────────────────────┼─────────┼─────────────┤
-  │ 1   │ FRAUD_INFERENCE_SERVICE        │ RUNNING │ PROJECTS    │
-  │ 2   │ FRAUD_INFERENCE_SERVICE_V3     │ RUNNING │ PROJECTS    │
+  │ 1   │ FRAUD_INFERENCE_SERVICE_V1     │ RUNNING │ PROJECTS    │
+  │ 2   │ FRAUD_INFERENCE_SERVICE_V2     │ RUNNING │ PROJECTS    │
   └─────┴────────────────────────────────┴─────────┴─────────────┘
 ```

@@ -627,10 +627,10 @@ Next, the gateway is created with an 80/20 traffic split:
       split_type: custom
       targets:
       - type: endpoint
-        value: ML.PROJECTS.FRAUD_INFERENCE_SERVICE!inference
+        value: ML.PROJECTS.FRAUD_INFERENCE_SERVICE_V1!inference
         weight: 80
       - type: endpoint
-        value: ML.PROJECTS.FRAUD_INFERENCE_SERVICE_V3!inference
+        value: ML.PROJECTS.FRAUD_INFERENCE_SERVICE_V2!inference
         weight: 20
     $$;

@@ -669,22 +669,22 @@ Traffic Split Observed
 ┌──────────────────────────────────┬──────────┬──────────┐
 │ Service                          │ Expected │ Observed │
 ├──────────────────────────────────┼──────────┼──────────┤
-│ FRAUD_INFERENCE_SERVICE (V2)     │ 80%      │ 78%      │
+│ FRAUD_INFERENCE_SERVICE_V1       │ 80%      │ 78%      │
 ├──────────────────────────────────┼──────────┼──────────┤
-│
FRAUD_INFERENCE_SERVICE_V3 (V3)  │ 20%      │ 22%      │
+│ FRAUD_INFERENCE_SERVICE_V2       │ 20%      │ 22%      │
 └──────────────────────────────────┴──────────┴──────────┘

 Prediction Agreement: 96% (both versions agree on fraud/legit label)
-V3-only catches: 2 additional fraud cases flagged by V3 but missed by V2
+V2-only catches: 2 additional fraud cases flagged by V2 but missed by V1
 ```

 ### Shifting Traffic

-Once you are confident in V3, shift all traffic to the new version:
+Once you are confident in V2, shift all traffic to the new version:

 ```
 Shift the FRAUD_AB_GATEWAY to send 100% of traffic to
-FRAUD_INFERENCE_SERVICE_V3. Confirm the change took effect.
+FRAUD_INFERENCE_SERVICE_V2. Confirm the change took effect.
 ```

 The gateway hostname stays the same, so no client changes are needed. You can also use this pattern for high availability by splitting traffic across services running on different compute pools.
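
The gateway test in this patch reports a 78% / 22% observed split over 100 requests against the configured 80 / 20 weights. Below is a minimal sketch of how one might check that such a deviation is ordinary sampling noise. It assumes each request is routed independently at random according to the weights, which is an assumption made here for illustration; the patch does not describe the gateway's actual routing algorithm. The function name `split_within_noise` and the 3-sigma threshold are hypothetical choices, not part of the quickstart.

```python
import math


def split_within_noise(observed_hits, total, expected_share, z_max=3.0):
    """Compare an observed traffic share with the configured weight.

    Assumes independent random routing per request (an illustrative
    assumption), so the hit count is binomially distributed.
    Returns (z, ok), where z is the deviation in standard errors.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0.0 < expected_share < 1.0:
        raise ValueError("expected_share must be strictly between 0 and 1")

    # Standard error of a binomial proportion: sqrt(p * (1 - p) / n)
    std_err = math.sqrt(expected_share * (1.0 - expected_share) / total)
    observed_share = observed_hits / total
    z = (observed_share - expected_share) / std_err
    return z, abs(z) <= z_max


# Figures from the test output: 22 of 100 requests reached the 20% target.
z, ok = split_within_noise(observed_hits=22, total=100, expected_share=0.20)
print(f"z = {z:.2f}, within noise: {ok}")
```

With only 100 requests the standard error on a 20% weight is 4 percentage points, so a 2-point gap is half a standard error. A larger sample narrows that band if a tighter check on the split is needed.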