Commit a8be563

Merge pull request #3961 from AI-Hypercomputer:darisoy-fix-permission-confusion
PiperOrigin-RevId: 919270502
2 parents fb79a9e + 1e9e29a commit a8be563

2 files changed

Lines changed: 10 additions & 2 deletions

docs/tutorials/posttraining/rl_on_multi_host.md

Lines changed: 5 additions & 1 deletion
@@ -56,9 +56,13 @@ rely on the vLLM library.
 Before starting, ensure you have:

 - Access to a Google Cloud Project with TPU quotas.
+- **IAM Roles** required:
+- **Kubernetes Engine Developer** (`roles/container.developer`) to submit and manage workloads on GKE.
+- **Artifact Registry Writer** (`roles/artifactregistry.writer`) to upload Docker images.
+- **Storage Object Admin** (`roles/storage.objectAdmin`) on your GCS bucket to read/write checkpoints and logs.
 - A Hugging Face account with an access token for downloading models.
-- Permissions for Google Artifact Registry (Artifact Registry Writer role).
 - Prerequisites for XPK installed (follow [official documentation](https://github.com/AI-Hypercomputer/xpk/blob/main/docs/installation.md#1-prerequisites)).
+- **Important:** Modern GKE clusters require the GKE auth plugin. If you encounter `gke-gcloud-auth-plugin not found` when running `kubectl` commands, you must install it locally (e.g., `sudo apt-get install google-cloud-sdk-gke-gcloud-auth-plugin` for `apt` installations, or `gcloud components install gke-gcloud-auth-plugin` for standalone archive installations).
 - A Pathways-ready GKE cluster (see [create GKE cluster](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/create-gke-cluster)).
 - **Docker** installed and configured for sudoless use. Follow the steps to [configure sudoless Docker](https://docs.docker.com/engine/install/linux-postinstall/).

docs/tutorials/posttraining/sft_on_multi_host.md

Lines changed: 5 additions & 1 deletion
@@ -29,9 +29,13 @@ Let's get started!
 Before starting, ensure you have:

 - Access to a Google Cloud Project with TPU quotas.
+- **IAM Roles** required:
+- **Kubernetes Engine Developer** (`roles/container.developer`) to submit and manage workloads on GKE.
+- **Artifact Registry Writer** (`roles/artifactregistry.writer`) to upload Docker images.
+- **Storage Object Admin** (`roles/storage.objectAdmin`) on your GCS bucket to read/write checkpoints and logs.
 - A Hugging Face account with an access token for downloading models.
-- Permissions for Google Artifact Registry (Artifact Registry Writer role).
 - Prerequisites for XPK installed (follow [official documentation](https://github.com/AI-Hypercomputer/xpk/blob/main/docs/installation.md#1-prerequisites)).
+- **Important:** Modern GKE clusters require the GKE auth plugin. If you encounter `gke-gcloud-auth-plugin not found` when running `kubectl` commands, you must install it locally (e.g., `sudo apt-get install google-cloud-sdk-gke-gcloud-auth-plugin` for `apt` installations, or `gcloud components install gke-gcloud-auth-plugin` for standalone archive installations).
 - A Pathways-ready GKE cluster (see [create GKE cluster](https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/create-gke-cluster)).
 - **Docker** installed and configured for sudoless use. Follow the steps to [configure sudoless Docker](https://docs.docker.com/engine/install/linux-postinstall/).
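
Note for readers applying the new prerequisites: the three IAM roles listed in both tutorials could be granted roughly as in the sketch below. It is not part of the commit. `PROJECT_ID`, `MEMBER`, and `BUCKET` are hypothetical placeholders, and binding Artifact Registry Writer at the project level is an assumption (it may instead be scoped to a single repository). The script prints the commands by default and only runs them when `DRY_RUN=0` is set. It also reports whether `gke-gcloud-auth-plugin` is on the `PATH`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical placeholders: replace with your own values.
PROJECT_ID="${PROJECT_ID:-my-project}"
MEMBER="${MEMBER:-user:you@example.com}"
BUCKET="${BUCKET:-my-bucket}"
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

grant_roles() {
  # Project-level bindings (assumed scope) for GKE workloads and image uploads.
  for role in roles/container.developer roles/artifactregistry.writer; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="$MEMBER" --role="$role"
  done
  # Bucket-level binding, matching "on your GCS bucket" in the tutorial text.
  run gcloud storage buckets add-iam-policy-binding "gs://$BUCKET" \
    --member="$MEMBER" --role=roles/storage.objectAdmin
}

check_auth_plugin() {
  # Reports whether the GKE auth plugin binary is discoverable on PATH.
  if command -v gke-gcloud-auth-plugin >/dev/null 2>&1; then
    echo "gke-gcloud-auth-plugin: found"
  else
    echo "gke-gcloud-auth-plugin: not found"
  fi
}

grant_roles
check_auth_plugin
```

Review the printed commands first. Re-running with `DRY_RUN=0` and real values applies the bindings, provided the caller is allowed to modify IAM policy on the project and the bucket.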
