- Overview
- Prerequisites
- Install Required Packages
- Quick Start: Keras Mixed Precision
- Deploying with TensorFlow Serving (bfloat16 Auto Mixed Precision)
- Client Inference (REST)
- Optional: Graph Freezing for Additional Performance
- Key Validation Steps
ResNet50 (Residual Network with 50 layers) is a convolutional neural network pretrained on ImageNet for image classification. This example shows how to run ResNet50 v1.5 (Keras built-in) for 1000-class image classification on Intel Xeon processors with AMX acceleration using bfloat16 mixed precision.
- Intel Xeon 4th Gen (or newer) with AMX bfloat16 support
- Docker (for TensorFlow Serving deployment)
- Python environment with pip
- Internet access to download model weights
Pinned versions are shown below for reproducibility.
pip install tensorflow==2.21.0
To reuse the standard pretrained float32 ResNet50 model while executing layers in bfloat16 on AMX, enable the mixed_bfloat16 policy before creating or loading the model. This keeps model weights in float32 for stability while executing math (matmul, convolution, batch norm) in bfloat16 on AMX-capable Intel Xeon processors. The same approach enables auto mixed precision for any Keras model.
import numpy as np
import tensorflow as tf
import keras
# 1. Enable AMX via bfloat16 Mixed Precision
# Set this BEFORE loading the model so all layers use bfloat16 compute with float32 weights.
keras.mixed_precision.set_global_policy("mixed_bfloat16")
# 2. Load pretrained ResNet50 (ImageNet weights, 1000 classes)
model = keras.applications.ResNet50(weights="imagenet")
# 3. Create a dummy input image (224x224x3, batch of 1)
# In production, use keras.utils.load_img() and keras.applications.resnet50.preprocess_input().
dummy_image = np.random.rand(1, 224, 224, 3).astype(np.float32)
preprocessed = keras.applications.resnet50.preprocess_input(dummy_image)
# 4. Run Inference
predictions = model(preprocessed, training=False)
# Keras ResNet50 ends in a softmax layer by default, so the outputs are class probabilities, not logits.
probs = tf.cast(predictions, tf.float32) # ensure float32 for downstream usage
top5 = tf.math.top_k(probs, k=5)
print("Top-5 class indices:", top5.indices.numpy())
print("Top-5 probabilities:", top5.values.numpy())
Notes:
- Set ONEDNN_VERBOSE=1 to confirm AMX usage (look for brg_matmul ... amx).
- Revert to full float32 by removing the policy or setting the global policy to float32.
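
To make the split between float32 weights and bfloat16 compute more concrete, the following standalone sketch rounds Python floats to bfloat16 precision using only the standard library. It illustrates the number format only and is not how TensorFlow or oneDNN perform the conversion internally. The helper name `to_bfloat16` is made up for this example, and it assumes finite inputs.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    Hypothetical helper for illustration; not part of TensorFlow or Keras.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of float32: 1 sign bit, 8 exponent bits,
    # and 7 mantissa bits. Adding this bias before dropping the low 16 bits
    # rounds to nearest, with ties going to the even neighbor.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]
```

For instance, `to_bfloat16(3.14159)` gives `3.140625`. bfloat16 keeps the exponent range of float32 but only two to three significant decimal digits, which is the reason for keeping the weights themselves in float32.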
Note: We don't need to explicitly enable bfloat16 mixed precision with Keras while exporting the model, because the --mixed_precision=bfloat16 flag passed when starting the inference server handles that automatically (see Start the Server (Enable bfloat16) below).
Create export_resnet50.py:
import numpy as np
import tensorflow as tf
import keras
model = keras.applications.ResNet50(weights="imagenet")
# Export the model in float32 format.
output_model_path = "/tmp/resnet50/1"
model.export(output_model_path)
print("Exported to:", output_model_path)
Run:
python export_resnet50.py
Pull the official TensorFlow Serving CPU image:
docker pull tensorflow/serving
Reference setup guide: https://github.com/tensorflow/serving?tab=readme-ov-file#set-up
TensorFlow Serving on CPU currently supports bfloat16 mixed precision; fp16 is not yet enabled for CPU.
docker run -t --rm \
-p 8501:8501 \
-v /tmp/resnet50:/models/resnet50 \
-e MODEL_NAME=resnet50 \
-e ONEDNN_VERBOSE=1 \
tensorflow/serving --mixed_precision=bfloat16
Sample log indicators:
- auto_mixed_precision_onednn_bfloat16 graph optimizer
- brg_matmul with amx and src_bf16/wei_bf16
I0000 00:00:0000000000.000000 905 auto_mixed_precision.cc:2335] Running auto_mixed_precision_onednn_bfloat16 graph optimizer
I0000 00:00:0000000000.000000 905 auto_mixed_precision.cc:2263] Converted N/M nodes to bfloat16 precision using K cast(s) to bfloat16 (excluding Const and Variable casts)
Troubleshooting 403:
- Ensure the URL model name matches MODEL_NAME.
- Check container logs: docker logs <id>.
- Disable proxies: export no_proxy=localhost,127.0.0.1.
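
When a request fails, it can help to confirm that the server has actually loaded the model before debugging the client. The sketch below queries the TensorFlow Serving model status endpoint (`GET /v1/models/<name>`) with the standard library and bypasses any environment proxies, mirroring the `no_proxy` advice above. The function names are made up for this example, and the host and port defaults simply match the `docker run` command in this guide.

```python
import json
import urllib.request


def model_status_url(model_name: str, host: str = "127.0.0.1", port: int = 8501) -> str:
    """Build the REST model-status URL for a served model."""
    return f"http://{host}:{port}/v1/models/{model_name}"


def available_versions(status: dict) -> list[str]:
    """Return the versions reported in state AVAILABLE by a status response."""
    return [
        str(entry.get("version"))
        for entry in status.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    ]


def fetch_model_status(model_name: str, host: str = "127.0.0.1",
                       port: int = 8501, timeout: float = 5.0) -> dict:
    """Fetch the status JSON, ignoring proxy settings from the environment."""
    # An empty ProxyHandler mapping disables proxy auto-detection.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(model_status_url(model_name, host, port), timeout=timeout) as resp:
        return json.load(resp)
```

With the container running, `available_versions(fetch_model_status("resnet50"))` should list version `1` if the export under `/tmp/resnet50/1` was picked up; an empty list or an HTTP error suggests a mismatch between the URL model name and `MODEL_NAME`.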
Install:
pip install requests==2.33.1 numpy==2.4.4
Create infer_resnet50.py:
import json
import numpy as np
import requests
# Create a dummy input image (224x224x3, batch of 1)
# In production, load a real image and convert to list.
dummy_image = np.random.rand(1, 224, 224, 3).astype(np.float32)
payload = {
    "instances": dummy_image.tolist()
}
resp = requests.post(
    "http://127.0.0.1:8501/v1/models/resnet50:predict",
    data=json.dumps(payload),
    headers={"content-type": "application/json"},
    proxies={"http": None, "https": None}
)
if resp.status_code == 200:
    preds = np.array(resp.json()["predictions"])
    # Indices of the five highest scores, best first
    top5_indices = np.argsort(preds[0])[-5:][::-1]
    # The exported model ends in a softmax layer, so these are probabilities, not logits.
    top5_probs = preds[0][top5_indices]
    print("Inference successful!")
    print("Top-5 class indices:", top5_indices)
    print("Top-5 probabilities:", top5_probs)
else:
    print("Error:", resp.status_code, resp.text)
Run:
python infer_resnet50.py
Expected Logs on the Server
I0000 00:00:0000000000.000000 3797 auto_mixed_precision.cc:2335] Running auto_mixed_precision_onednn_bfloat16 graph optimizer
I0000 00:00:0000000000.000000 3797 auto_mixed_precision.cc:2263] Converted N/M nodes to bfloat16 precision using K cast(s) to bfloat16 (excluding Const and Variable casts)
Expected Logs on the Client
Top-5 class indices and probabilities for ImageNet classification (random input gives arbitrary results).
Inference successful!
Top-5 class indices: [916 530 851 644 664]
Top-5 probabilities: [0.05151367 0.046875 0.04541016 0.04272461 0.03881836]
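
The client output above can also be checked programmatically. The following is a minimal sketch, using only the standard library, that validates the shape of a `:predict` response and extracts the top-k classes. The function names `check_prediction` and `top_k` are made up for this example, and the 1000-class default reflects the ImageNet model used in this guide.

```python
def check_prediction(response_json: dict, num_classes: int = 1000) -> list:
    """Validate a :predict response and return the scores of the first instance."""
    predictions = response_json.get("predictions")
    if not predictions:
        raise ValueError("response has no 'predictions' entries")
    scores = predictions[0]
    if len(scores) != num_classes:
        raise ValueError(f"expected {num_classes} scores, got {len(scores)}")
    return scores


def top_k(scores: list, k: int = 5) -> list:
    """Return (class_index, score) pairs for the k highest scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]
```

In `infer_resnet50.py`, this could be applied as `top_k(check_prediction(resp.json()))` in place of the NumPy-based ranking, which avoids the NumPy dependency on a thin client.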
Freeze variables to constants for a lean inference graph (removes variable-loading overhead).
Script (public reference): https://raw.githubusercontent.com/oneapi-src/oneAPI-samples/master/AI-and-Analytics/Features-and-Functionality/IntelTensorFlow_InferenceOptimization/scripts/freeze_optimize_v2.py
Example:
python freeze_optimize_v2.py \
--input_saved_model_dir=/tmp/resnet50/1 \
--output_saved_model_dir=/tmp/resnet50_frozen/1
Run this after exporting the SavedModel (server side).
- Functional: REST returns JSON with 1000 class scores (softmax probabilities)
- Precision: Logs show auto_mixed_precision_onednn_bfloat16
- AMX: ONEDNN_VERBOSE lines include amx and bf16 datatypes
- Rollback: Remove the --mixed_precision flag on TF Serving; delete the policy in the Keras path
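
The Precision and AMX checks in the list above can be automated with a small log scanner. This is a sketch under assumptions: it treats any line containing `onednn_verbose` (or `dnnl_verbose`, used by older oneDNN builds) as a oneDNN verbose line and then looks for the `amx` and `bf16` substrings named in this guide. The function name and the result keys are made up for this example.

```python
def scan_serving_logs(lines) -> dict:
    """Report which bfloat16/AMX indicators appear in an iterable of log lines."""
    found = {"amp_optimizer": False, "amx_kernel": False, "bf16_datatype": False}
    for line in lines:
        # Graph optimizer message printed by TensorFlow when auto mixed precision runs.
        if "auto_mixed_precision_onednn_bfloat16" in line:
            found["amp_optimizer"] = True
        # Assumption: oneDNN verbose output is identifiable by these markers.
        if "onednn_verbose" in line or "dnnl_verbose" in line:
            if "amx" in line:
                found["amx_kernel"] = True
            if "bf16" in line:
                found["bf16_datatype"] = True
    return found
```

One way to use it is to pass in the lines of `docker logs <id>` output, for example by iterating over `sys.stdin` in a short script; all three values should be `True` after at least one inference request has been served.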
This example enabled bfloat16 mixed precision for ResNet50 on Xeon with minimal code changes, deployed the model via TensorFlow Serving, verified AMX acceleration, and optionally optimized the model by freezing the graph.