
Commit 9441533

SujeethJinesh authored and copybara-github committed
Create Dockerfile for building the custom Colocated Python server image.
PiperOrigin-RevId: 745233667
1 parent 8386856 commit 9441533

3 files changed

Lines changed: 105 additions & 15 deletions


pathwaysutils/sidecar/README.md

Lines changed: 68 additions & 15 deletions
Original file line number | Diff line number | Diff line change
@@ -104,6 +104,8 @@ out = create_and_save_plot(dummy_device_array)
104104

105105
For more advanced usage (such as data loading), check out [MaxText's RemoteIterator class](https://github.com/AI-Hypercomputer/maxtext/blob/391a5a788d85cae8942334b042fdabdbd549af51/MaxText/multihost_dataloading.py#L175).
106106

107+
See Installation and Usage for instructions on how to use MaxText out of the box with this feature.
108+
107109
### Verification
108110

109111
To verify files were created, SSH into one of the TPU workers using the following command and check that the file was created.
@@ -120,15 +122,20 @@ Follow these steps to set up, build, and deploy your application with the Coloca
120122

121123
**Prerequisites**
122124

123-
Ensure [Docker](https://docs.docker.com/engine/install/) is installed on your system along with [gcloud](https://cloud.google.com/sdk/docs/install). Ensure you are authenticated into gcloud.
125+
Ensure [Docker](https://docs.docker.com/engine/install/) is installed on your system along with [gcloud](https://cloud.google.com/sdk/docs/install). Ensure you are authenticated into gcloud and Docker is configured for your region. For Google Artifact Registry, you typically run a command like this (replace `REGION` with the region of your repository, e.g., `us-east5`):
126+
127+
```bash
128+
gcloud auth login
129+
gcloud auth configure-docker REGION-docker.pkg.dev
130+
```
124131

125132
**1. Clone the Repository**
126133

127134
Get the necessary code and scripts.
128135

129136
```bash
130137
git clone https://github.com/AI-Hypercomputer/pathways-utils.git
131-
cd pathways-utils
138+
cd pathways-utils/sidecar/python
132139
```
133140

134141
**2. Prepare Sidecar Dependencies**
@@ -143,7 +150,6 @@ jax>=0.5.1
143150
tensorflow-datasets
144151
tiktoken
145152
grain-nightly>=0.0.10
146-
sentencepiece==0.1.97
147153
```
148154

149155
**3. Build the Colocated Python Sidecar Image and upload it to Artifact Registry**
@@ -152,38 +158,85 @@ Use the provided Dockerfile to create the sidecar image. This image will contain
152158

153159
```bash
154160
export PROJECT_ID=<your_project_id>
155-
export IMAGE_LOCATION=us-docker.pkg.dev/${PROJECT_ID}/colocated-python:latest
161+
export LOCAL_IMAGE_NAME=my-colocated-python-server
162+
export JAX_VERSION=0.5.3
163+
164+
docker build --build-arg JAX_VERSION=${JAX_VERSION} -t ${LOCAL_IMAGE_NAME} .
165+
```
156166

157-
docker build -t ${IMAGE_LOCATION} .
167+
Now you can upload the image to Google Artifact Registry.
168+
169+
```bash
170+
export REGION=us # Your Region
171+
export ARTIFACT_REGISTRY_REPO=YOUR_ARTIFACT_REGISTRY_REPO
172+
export EXPORTED_IMAGE_LOCATION=${REGION}-docker.pkg.dev/${PROJECT_ID}/${ARTIFACT_REGISTRY_REPO}/my-colocated-python:latest
173+
174+
docker tag ${LOCAL_IMAGE_NAME} ${EXPORTED_IMAGE_LOCATION}
175+
docker push ${EXPORTED_IMAGE_LOCATION}
176+
177+
# Delete the local image as it's no longer needed.
178+
docker image rm ${LOCAL_IMAGE_NAME}
158179
```
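The tag and push commands above assemble the Artifact Registry path from the region, project, repository, image name, and tag. As a minimal sketch of that composition (the helper name `artifact_registry_image` and the example values are hypothetical and not part of pathways-utils), the same path could be built and sanity-checked in Python before calling `docker tag`:

```python
def artifact_registry_image(region: str, project_id: str, repo: str,
                            image: str = "my-colocated-python",
                            tag: str = "latest") -> str:
    """Compose REGION-docker.pkg.dev/PROJECT_ID/REPO/IMAGE:TAG, mirroring the README."""
    fields = (("region", region), ("project_id", project_id),
              ("repo", repo), ("image", image), ("tag", tag))
    for label, value in fields:
        # An empty or whitespace-containing piece yields a path docker rejects.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{label} must be a non-empty string without whitespace")
    return f"{region}-docker.pkg.dev/{project_id}/{repo}/{image}:{tag}"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(artifact_registry_image("us", "my-project", "my-repo"))
```

Failing early on an unset variable avoids pushing to a malformed path such as `-docker.pkg.dev//...`, which is what the shell commands would produce if `REGION` or `PROJECT_ID` were empty.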
159180

160181
**4. Update Deployment Configuration**
161182

183+
***Simple Example***
184+
162185
Modify your Kubernetes deployment YAML file to use your Colocated Python sidecar image. This assumes you are using the [pathways-job](https://github.com/google/pathways-job) API.
163186

164-
For example.
187+
For example, if using 2 v4-16 TPUs, use the following YAML. This example is modified from [pathways-job](https://github.com/google/pathways-job/blob/main/config/samples/colocated_python_example_pathwaysjob.yaml).
165188

166189
```yaml
167-
...
190+
apiVersion: pathways-job.pathways.domain/v1
191+
kind: PathwaysJob
192+
metadata:
193+
  name: pathways-colocated
168194
spec:
169195
  maxRestarts: 0
170196
  customComponents:
171197
  - componentType: colocated_python_sidecar
172-
    image: us-docker.pkg.dev/<your_project_id>/colocated-python:latest
173-
...
198+
    image: <location of your colocated python sidecar server image>
199+
  workers:
200+
  - type: ct4p-hightpu-4t
201+
    topology: 2x2x2
202+
    numSlices: 2
203+
  pathwaysDir: "gs://<test-bucket>/tmp" #This bucket needs to be created in advance.
204+
  controller:
205+
    # Pod template for training, default mode.
206+
    deploymentMode: default
207+
    mainContainerName: main
208+
    template: # UserPodTemplate
209+
      spec:
210+
        containers:
211+
        - name: main
212+
          env:
213+
          - name: XCLOUD_ENVIRONMENT
214+
            value: GCP
215+
          - name: JAX_PLATFORMS
216+
            value: proxy
217+
          - name: JAX_BACKEND_TARGET
218+
            value: grpc://pathways-colocatedpython-trial-pathways-head-0-0.pathways-colocatedpython-trial:29000
219+
          image: python:3.13
220+
          imagePullPolicy: Always
221+
          command:
222+
          - /bin/sh
223+
          - -c
224+
          - |
225+
            pip install --upgrade pip
226+
            pip install -U --pre jax jaxlib -f https://storage.googleapis.com/jax-releases/jax_nightly_releases.html
227+
            pip install pathwaysutils
228+
            python -c "import jax; import pathwaysutils; print(\"Number of JAX devices is\", len(jax.devices()))"
174229
```
175230
176-
For a full sample Yaml, please refer to [pathways-job](https://github.com/google/pathways-job/blob/main/config/samples/colocated_python_example_pathwaysjob.yaml).
177-
178-
**5. (Optional) Turn on Data Loading Optimization in MaxText**
231+
***MaxText Reference Example***
179232
180233
If using MaxText, set the following flag to turn on the data loading optimization that uses the Colocated Python feature.
181234
182235
```python
183-
colocated_python_data_input = True
236+
colocated_python_data_input=True
184237
```
185238

186-
**6. Deploy the Application**
239+
**5. Deploy the Application**
187240

188241
Apply the updated deployment configuration to your Kubernetes cluster:
189242

@@ -197,5 +250,5 @@ This will create the necessary pods with your application, pathways head, and th
197250

198251
**User Dependency Conflicts**
199252

200-
Colocated Python relies on specific internal dependencies, including JAX. Refer to the provided `server_requirements.txt` for the complete list of required dependencies. Using a different dependency version than the one provided in `server_requirements.txt` will cause the remote Python image build to fail.
253+
Colocated Python relies on specific internal dependencies, including JAX. Refer to the provided `server_requirements.txt` for the complete list of required dependencies. Using a different dependency version than the one provided in `server_requirements.txt` will cause the Colocated Python image build to fail.
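As a rough pre-build sanity check for the conflict described above, exact `==` pins in a user requirements file can be compared against those in `server_requirements.txt`. This is only a sketch under stated assumptions: the function names are hypothetical, the pins in the usage example are made up for illustration, and range specifiers such as `>=` are ignored. Real dependency resolution is still left to `pip`.

```python
import re

# Matches lines of the form "name==version"; comments and other specifiers are skipped.
_PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def parse_pins(requirements_text: str) -> dict:
    """Return {normalized_package_name: version} for exact pins only."""
    pins = {}
    for line in requirements_text.splitlines():
        match = _PIN.match(line)
        if match:
            # Normalize names so that "Foo_Bar" and "foo-bar" compare equal.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins


def conflicting_pins(user_text: str, server_text: str) -> dict:
    """Return {package: (user_version, server_version)} where exact pins differ."""
    user, server = parse_pins(user_text), parse_pins(server_text)
    return {name: (user[name], server[name])
            for name in user.keys() & server.keys()
            if user[name] != server[name]}


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    print(conflicting_pins("jax==0.5.1\ntiktoken\n", "jax==0.5.3\nnumpy==2.0.0\n"))
```

An empty result does not guarantee a successful build, since conflicts can also come from version ranges and transitive dependencies.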
201254

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
# Dockerfile for building the colocated python start wheel using a specific JAX base image
2+
# Build using --build-arg JAX_VERSION=<version>, where <version> must be one of [0.5.1, 0.5.2, 0.5.3]
3+
# Example: `docker build --build-arg JAX_VERSION=0.5.1 -t my-colocated-python-server .`
4+
5+
# --- Build Argument ---
6+
# Defines the build argument for the JAX version.
7+
# This MUST be declared before the first FROM instruction to be used within it.
8+
ARG JAX_VERSION
9+
10+
# --- Base Image ---
11+
# Use the specified JAX base image from Google Artifact Registry.
12+
# The JAX_VERSION argument is substituted here.
13+
FROM us-docker.pkg.dev/cloud-tpu-v2-images/pathways/colocated_python_server:jax-$JAX_VERSION
14+
15+
# --- Application Setup ---
16+
# Set working directory
17+
WORKDIR /app
18+
19+
# Copy the user's requirements file containing the dependencies to install
20+
COPY requirements.txt .
21+
22+
# Install user dependencies, then run `pip check` so the build fails if they are incompatible
23+
RUN . venv/bin/activate && \
24+
pip install --no-cache-dir -r requirements.txt && \
25+
pip check
26+
27+
# --- Runtime Configuration ---
28+
# Set the default port (optional, might be set by base image)
29+
ENV PORT=50051
30+
31+
# Command to run the application. Colocated Python is already installed in the base image.
32+
CMD ["/bin/sh", "-c", ". venv/bin/activate && venv/bin/python3 main.py --port=50051"]
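The Dockerfile header comment restricts `JAX_VERSION` to 0.5.1, 0.5.2, or 0.5.3, but `docker build` itself only fails once the base image tag cannot be pulled. A minimal sketch of validating the version up front and assembling the build command follows. The helper names are hypothetical, and the version list simply copies the header comment as of this commit, so it may go stale.

```python
# Versions listed in the Dockerfile header comment at this commit (may change later).
SUPPORTED_JAX_VERSIONS = ("0.5.1", "0.5.2", "0.5.3")
BASE_IMAGE_REPO = (
    "us-docker.pkg.dev/cloud-tpu-v2-images/pathways/colocated_python_server"
)


def base_image_for(jax_version: str) -> str:
    """Return the base image reference the Dockerfile's FROM line resolves to."""
    if jax_version not in SUPPORTED_JAX_VERSIONS:
        raise ValueError(
            f"JAX_VERSION must be one of {SUPPORTED_JAX_VERSIONS}, got {jax_version!r}"
        )
    return f"{BASE_IMAGE_REPO}:jax-{jax_version}"


def docker_build_command(jax_version: str, local_image_name: str,
                         context: str = ".") -> list:
    """Build the argument list for the `docker build` call shown in the README."""
    base_image_for(jax_version)  # Raises early on an unsupported version.
    return ["docker", "build",
            "--build-arg", f"JAX_VERSION={jax_version}",
            "-t", local_image_name, context]


if __name__ == "__main__":
    print(" ".join(docker_build_command("0.5.3", "my-colocated-python-server")))
```

The returned list can be passed to `subprocess.run(..., check=True)` from the directory containing the Dockerfile and `requirements.txt`.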
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
# Your requirements go here, for example:
2+
jax>=0.5.1
3+
tensorflow-datasets
4+
tiktoken
5+
grain-nightly>=0.0.1
