| 1 | +# Vertex AI SDK User guide |
| 2 | + |
| 3 | + |
| 4 | +## Introduction |
| 5 | +The Vertex AI SDK adds a usability layer on top of the [Vertex SDK](https://cloud.google.com/python/docs/reference/aiplatform/latest) with the goal of substantially improving traditional data-to-model workflows. It lets users shift from thinking about the mechanics of calling Vertex services to thinking in terms of building models, and to interleave the [Vertex SDK](https://cloud.google.com/python/docs/reference/aiplatform/latest) seamlessly throughout their workflow. |
| 6 | + |
| 7 | + |
| 8 | +## Setup |
| 9 | +The Vertex AI SDK is currently available in preview under the `vertexai` package. Install `google-cloud-aiplatform[preview]` with pip to enable its full functionality. |
| 10 | + |
| 11 | +We also recommend installing the package in a [virtualenv](https://virtualenv.pypa.io/en/latest/), a tool that creates isolated Python environments and helps you manage dependencies, versions, and, indirectly, permissions. |
| 12 | + |
| 13 | +### Mac/Linux |
| 14 | + |
| 15 | +```shell |
| 16 | +pip install virtualenv |
| 17 | +virtualenv <your-env> |
| 18 | +source <your-env>/bin/activate |
| 19 | +<your-env>/bin/pip install "google-cloud-aiplatform[preview]" |
| 20 | +``` |
| 21 | + |
| 22 | +### Windows |
| 23 | + |
| 24 | +```shell |
| 25 | +pip install virtualenv |
| 26 | +virtualenv <your-env> |
| 27 | +<your-env>\Scripts\activate |
| 28 | +<your-env>\Scripts\pip.exe install "google-cloud-aiplatform[preview]" |
| 29 | +``` |
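
After activating the environment, it can be useful to confirm that Python is really running inside it before installing anything. The following is a minimal sketch that uses only the standard library; the helper name `in_virtual_environment` is illustrative and is not part of the Vertex AI SDK.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a virtualenv or venv."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # Otherwise sys.prefix points at the environment, while sys.base_prefix
    # still points at the interpreter the environment was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtual_environment())
```

If this prints `False` after activation, the shell is probably still using the system interpreter.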
| 30 | + |
| 31 | + |
| 32 | +## Remote training |
| 33 | +With the remote training feature in the Vertex AI SDK, you can write your machine learning code as usual and, with a few small changes, have it executed automatically on [Vertex AI CustomJob](https://cloud.google.com/vertex-ai/docs/training/create-custom-job). This feature allows you to seamlessly access Vertex resources while reducing the time needed to learn how to interact with Vertex services. |
| 34 | + |
| 35 | +### Supported frameworks |
| 36 | +1. scikit-learn |
| 37 | +2. tensorflow |
| 38 | +3. pytorch |
| 39 | +4. lightning |
| 40 | + |
| 41 | +### User journey |
| 42 | +```py |
| 43 | +import vertexai |
| 44 | +from sklearn.linear_model import LogisticRegression |
| 45 | + |
| 46 | +# Wrap classes to enable Vertex remote execution |
| 47 | +LogisticRegression = vertexai.preview.remote(LogisticRegression) |
| 48 | + |
| 49 | +# Init vertexai and switch to remote mode for training |
| 50 | +vertexai.init(project="my-project", location="my-location") |
| 51 | +vertexai.preview.init(remote=True) |
| 52 | + |
| 53 | +model = LogisticRegression() |
| 54 | + |
| 55 | +# Model will be trained on Vertex |
| 56 | +model.fit(X, y) |
| 57 | +``` |
| 58 | + |
| 59 | +### (Optional) Remote job configuration |
| 60 | +Vertex configures the remote job based on the model you use. You can also customize this configuration, for example the display name, staging bucket, or machine type. |
| 61 | + |
| 62 | +```py |
| 63 | +# Set the config before training the model |
| 64 | +model.fit.vertex.remote_config.display_name = "my-sklearn-training" |
| 65 | +model.fit.vertex.remote_config.staging_bucket = "gs://my-bucket" |
| 66 | +model.fit.vertex.remote_config.machine_type = "n1-highmem-64" |
| 67 | + |
| 68 | +# Model will be trained on Vertex |
| 69 | +model.fit(X, y) |
| 70 | +``` |
| 71 | + |
| 72 | +[Here](https://github.com/googleapis/python-aiplatform/blob/main/vertexai/preview/_workflow/shared/configs.py#L22-L104) is the full list of supported configurations. |
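
When several settings need to change together, it may be convenient to keep them in one dictionary and apply them in a loop. The sketch below uses a `SimpleNamespace` as a stand-in for `model.fit.vertex.remote_config` so that it runs without a Vertex project. `apply_remote_config` is a hypothetical helper rather than an SDK function, and it assumes the config object exposes plain writable attributes, as the assignments above suggest.

```python
from types import SimpleNamespace


def apply_remote_config(remote_config, overrides):
    """Copy each override onto the config object and return it."""
    for name, value in overrides.items():
        setattr(remote_config, name, value)
    return remote_config


# Stand-in for `model.fit.vertex.remote_config` (assumption: plain attributes).
remote_config = SimpleNamespace()

# The same three settings used in the example above.
overrides = {
    "display_name": "my-sklearn-training",
    "staging_bucket": "gs://my-bucket",
    "machine_type": "n1-highmem-64",
}

apply_remote_config(remote_config, overrides)
print(remote_config.machine_type)
```

With a real model, the stand-in would be replaced by `model.fit.vertex.remote_config`, applied before calling `model.fit(X, y)`.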
| 73 | + |
| 74 | + |
| 75 | +## Remote GPU training |
| 76 | +This feature builds on remote training. It lets you train supported models remotely on GPUs, even if your local device has no GPU resources. See the [configuration source](https://github.com/googleapis/python-aiplatform/blob/main/vertexai/preview/_workflow/shared/configs.py#L63-L73) for more information. |
| 77 | + |
| 78 | +### Supported frameworks |
| 79 | +1. tensorflow |
| 80 | +2. pytorch |
| 81 | + |
| 82 | +### User journey |
| 83 | +```py |
| 84 | +import vertexai |
| 85 | +from tensorflow import keras |
| 86 | + |
| 87 | +# Wrap classes to enable Vertex remote execution |
| 88 | +keras.Sequential = vertexai.preview.remote(keras.Sequential) |
| 89 | + |
| 90 | +# Init vertexai and switch to remote mode for training |
| 91 | +vertexai.init(project="my-project", location="my-location") |
| 92 | +vertexai.preview.init(remote=True) |
| 93 | + |
| 94 | +# Instantiate model |
| 95 | +model = keras.Sequential( |
| 96 | +    [keras.layers.Dense(5, input_shape=(4,)), keras.layers.Softmax()] |
| 97 | +) |
| 98 | +model.compile(optimizer="adam", loss="mean_squared_error") |
| 99 | + |
| 100 | +# Set `enable_cuda` to True in remote config |
| 101 | +model.fit.vertex.remote_config.enable_cuda = True |
| 102 | + |
| 103 | +# Model will be trained on Vertex with GPU |
| 104 | +model.fit(dataset, epochs=10) |
| 105 | +``` |
| 106 | + |
| 107 | + |
| 108 | +## Remote distributed training |
| 109 | +This feature extends remote training by letting you train supported models remotely on multi-worker CPU or GPU machines, regardless of the resources available on your local device. See the [configuration source](https://github.com/googleapis/python-aiplatform/blob/main/vertexai/preview/_workflow/shared/configs.py#L74-L86) for more information. |
| 110 | + |
| 111 | +### Supported frameworks |
| 112 | +1. tensorflow |
| 113 | +2. pytorch |
| 114 | + |
| 115 | +### User journey |
| 116 | +```py |
| 117 | +import torch |
| 118 | +import vertexai |
| 119 | +from vertexai.preview import VertexModel |
| 120 | +from vertexai.preview.developer import remote_specs |
| 121 | + |
| 122 | +# Define the custom model with `VertexModel` to enable remote execution |
| 123 | +class TorchLogisticRegression(VertexModel, torch.nn.Module): |
| 124 | + |
| 125 | +    def __init__(self, input_size: int, output_size: int): |
| 126 | +        torch.nn.Module.__init__(self) |
| 127 | +        VertexModel.__init__(self) |
| 128 | +        self.linear = torch.nn.Linear(input_size, output_size) |
| 129 | +        self.softmax = torch.nn.Softmax(dim=1) |
| 130 | + |
| 131 | +    def forward(self, x): |
| 132 | +        return self.softmax(self.linear(x)) |
| 133 | + |
| 134 | +    # Add this train decorator to allow remote training |
| 135 | +    @vertexai.preview.developer.mark.train() |
| 136 | +    def train(self, dataloader, num_epochs, lr): |
| 137 | +        # Add this line to enable distributed training for pytorch |
| 138 | +        self = remote_specs.setup_pytorch_distributed_training(self) |
| 139 | +        criterion = torch.nn.CrossEntropyLoss() |
| 140 | +        optimizer = torch.optim.SGD(self.parameters(), lr=lr) |
| 141 | + |
| 142 | +        for t in range(num_epochs): |
| 143 | +            for idx, batch in enumerate(dataloader): |
| 144 | +                device = next(self.parameters()).device |
| 145 | +                x, y = batch[0].to(device), batch[1].to(device) |
| 146 | +                optimizer.zero_grad() |
| 147 | +                pred = self(x) |
| 148 | +                loss = criterion(pred, y) |
| 149 | +                loss.backward() |
| 150 | +                optimizer.step() |
| 151 | + |
| 152 | + |
| 153 | +# Init vertexai and switch to remote mode for training |
| 154 | +vertexai.init(project="my-project", location="my-location") |
| 155 | +vertexai.preview.init(remote=True) |
| 156 | + |
| 157 | +# Instantiate model |
| 158 | +model = TorchLogisticRegression(4, 3) |
| 159 | + |
| 160 | +# Set `enable_distributed` to True in remote config |
| 161 | +model.train.vertex.remote_config.enable_distributed = True |
| 162 | + |
| 163 | +# Model will be trained on Vertex in distributed mode |
| 164 | +model.train(dataloader, num_epochs=100, lr=0.05) |
| 165 | +``` |
| 166 | + |
| 167 | +## Remote training with Autologging |
| 168 | +The [autologging feature](https://cloud.google.com/vertex-ai/docs/experiments/autolog-data) is available in remote training. Metrics (summary metrics, time series metrics, etc.) and parameters from your remote training runs can be logged automatically to [Vertex Experiments](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments). This lets you track, compare, and analyze training runs with different setups. |
| 169 | + |
| 170 | +### User journey |
| 171 | +```py |
| 172 | +# Init with experiment and autolog |
| 173 | +vertexai.init( |
| 174 | +    project="my-project", |
| 175 | +    location="my-location", |
| 176 | +    experiment="my-exp", |
| 177 | +) |
| 178 | +vertexai.preview.init(remote=True) |
| 179 | +... |
| 180 | +# Set a service account in the remote config; this is required for autologging |
| 181 | +model.fit.vertex.remote_config.service_account = "GCE" |
| 182 | + |
| 183 | +# Model will be trained on Vertex with data automatically logged |
| 184 | +model.fit(X, y) |
| 185 | +``` |
| 186 | + |
| 187 | + |
| 188 | +## Uptraining |
| 189 | +Once you finish training a model, you can register it in the [Vertex Model Registry](https://cloud.google.com/vertex-ai/docs/model-registry/introduction). You can then pull the pre-trained model from the registry and uptrain it with new training data. |
| 190 | + |
| 191 | +### Supported frameworks |
| 192 | +1. scikit-learn |
| 193 | +2. tensorflow |
| 194 | +3. pytorch |
| 195 | + |
| 196 | +### User journey |
| 197 | + |
| 198 | +#### Register to Model Registry |
| 199 | +After a model is trained, you can register it with the `register()` method. It returns an [aiplatform.Model](https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/models.py#L2610) object. |
| 200 | + |
| 201 | +```py |
| 202 | +# Model could be trained locally or remotely in previous steps |
| 203 | +registered_model = vertexai.preview.register(model) |
| 204 | +``` |
| 205 | + |
| 206 | +#### Pull a pre-trained model from Model Registry |
| 207 | +You can then use the `from_pretrained()` method with the resource name of an [aiplatform.Model](https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/models.py#L2610) or an [aiplatform.CustomJob](https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/jobs.py#L1171) to retrieve the model object. |
| 208 | + |
| 209 | +```py |
| 210 | +# You can get the model resource name through the SDK or the Google Cloud console. |
| 211 | +pulled_model = vertexai.preview.from_pretrained( |
| 212 | +    model_name=registered_model.resource_name |
| 213 | +) |
| 214 | +``` |
| 215 | + |
| 216 | +```py |
| 217 | +# You can get the custom job resource name through the Google Cloud console. |
| 218 | +pulled_model = vertexai.preview.from_pretrained( |
| 219 | +    custom_job_name="1234567890123456789" |
| 220 | +) |
| 221 | +``` |
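
The examples above pass either a model resource name or a custom job identifier. Vertex AI resource names follow the pattern `projects/{project}/locations/{location}/{collection}/{id}`. As a rough illustration, the sketch below assembles such names with plain string formatting. The helper `vertex_resource_name` is hypothetical, and this guide does not state whether `from_pretrained()` expects a bare ID, a full resource name, or accepts both, so check the SDK before relying on either form.

```python
def vertex_resource_name(
    project: str, location: str, collection: str, resource_id: str
) -> str:
    """Build a fully qualified Vertex AI resource name."""
    # `collection` is the resource type segment, e.g. "models" or "customJobs".
    return f"projects/{project}/locations/{location}/{collection}/{resource_id}"


# Placeholder project, location, and ID taken from the examples in this guide.
job_name = vertex_resource_name(
    "my-project", "my-location", "customJobs", "1234567890123456789"
)
print(job_name)
```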
| 222 | + |
| 223 | + |
| 224 | +#### Uptraining |
| 225 | +You can proceed with uptraining (remotely or locally) using new input training data. |
| 226 | + |
| 227 | +```py |
| 228 | +pulled_model.fit(X_retrain, y_retrain) |
| 229 | +``` |
| 230 | + |
| 231 | + |
| 232 | +## Remote training with Built-in Models |
| 233 | +In addition to custom machine learning code, we also support remote training with built-in algorithms through pre-built containers. You can run training jobs on your data without writing any code for a training application. |
| 234 | + |
| 235 | +### Supported frameworks |
| 236 | +1. [TabNetTrainer](https://github.com/googleapis/python-aiplatform/blob/main/vertexai/preview/tabular_models/tabnet_trainer.py#L52) |
| 237 | + |
| 238 | +### User journey |
| 239 | +```py |
| 240 | +import vertexai |
| 241 | +from vertexai.preview.tabular_models import TabNetTrainer |
| 242 | + |
| 243 | +# Init vertexai and switch to remote mode for training |
| 244 | +vertexai.init(project="my-project", location="my-location") |
| 245 | +vertexai.preview.init(remote=True) |
| 246 | + |
| 247 | +trainer = TabNetTrainer( |
| 248 | +    model_type="classification", |
| 249 | +    target_column="target", |
| 250 | +    learning_rate=0.01, |
| 251 | +    max_train_secs=1800, |
| 252 | +) |
| 253 | + |
| 254 | +trainer.fit(training_data, validation_data) |
| 255 | +``` |
| 256 | + |
| 257 | + |
| 258 | +## Vizier Hyperparameter Tuning |
| 259 | +The Vertex AI SDK supports local and remote hyperparameter tuning using [Vertex AI Vizier](https://cloud.google.com/vertex-ai/docs/vizier/overview). Local tuning creates trials locally and runs training locally, while remote tuning creates trials locally and runs training remotely on [Vertex AI CustomJob](https://cloud.google.com/vertex-ai/docs/training/create-custom-job), with each CustomJob corresponding to one trial. |
| 260 | + |
| 261 | +### Supported frameworks |
| 262 | +1. scikit-learn |
| 263 | +2. tensorflow |
| 264 | +3. pytorch |
| 265 | +4. lightning |
| 266 | + |
| 267 | +### User journey |
| 268 | +```py |
| 269 | +import vertexai |
| 270 | +from vertexai.preview.hyperparameter_tuning import VizierHyperparameterTuner |
| 271 | +# Init vertexai and switch to remote mode for tuning |
| 272 | +vertexai.init(project="my-project", location="my-location") |
| 273 | +vertexai.preview.init(remote=True) |
| 274 | + |
| 275 | + |
| 276 | +def get_model_func(C: float): |
| 277 | +    from sklearn.linear_model import LogisticRegression |
| 278 | + |
| 279 | +    # Wrap the class to train models on Vertex |
| 280 | +    LogisticRegression = vertexai.preview.remote(LogisticRegression) |
| 281 | +    # Instantiate model. C will be tuned. |
| 282 | +    model = LogisticRegression(C=C) |
| 283 | + |
| 284 | +    return model |
| 285 | + |
| 286 | + |
| 287 | +tuner = VizierHyperparameterTuner( |
| 288 | +    get_model_func=get_model_func, |
| 289 | +    max_trial_count=4, |
| 290 | +    parallel_trial_count=2, |
| 291 | +    hparam_space=[ |
| 292 | +        { |
| 293 | +            "parameter_id": "C", |
| 294 | +            "discrete_value_spec": {"values": [0.1, 0.5, 1.0]}, |
| 295 | +        } |
| 296 | +    ], |
| 297 | +    metric_id="custom", |
| 298 | +    metric_goal="MAXIMIZE", |
| 299 | +) |
| 300 | + |
| 301 | +# Tune model using Vizier. Tuning runs locally while trials run on Vertex. |
| 302 | +tuner.fit(X_train, y_train) |
| 303 | + |
| 304 | +# Get the best model after tuning is done. |
| 305 | +best_model = tuner.get_best_models()[0] |
| 306 | +``` |
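
To get a feel for how `max_trial_count` and `parallel_trial_count` interact, the arithmetic can be sketched in plain Python. This assumes trials are launched in batches of at most `parallel_trial_count` until `max_trial_count` is reached, which is a simplification of how Vizier actually schedules work. `trial_batches` is an illustrative helper and is not part of the SDK.

```python
from typing import List


def trial_batches(max_trial_count: int, parallel_trial_count: int) -> List[int]:
    """Return the size of each batch of trials under the batching assumption."""
    if max_trial_count < 1 or parallel_trial_count < 1:
        raise ValueError("both counts must be positive")
    batches = []
    remaining = max_trial_count
    while remaining > 0:
        # Launch as many trials as allowed, capped by what is still left.
        size = min(parallel_trial_count, remaining)
        batches.append(size)
        remaining -= size
    return batches


# The settings from the tuner above: 4 trials, 2 at a time.
print(trial_batches(4, 2))  # [2, 2]
```

Under this assumption, raising `parallel_trial_count` shortens wall-clock time but leaves the total number of trials, and so the number of CustomJobs in remote mode, unchanged.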
| 307 | + |
| 308 | + |
| 309 | +## Sample Notebooks |
| 310 | +The notebooks below showcase different uses of the Vertex AI SDK. |
| 311 | + |
| 312 | +- [remote_training_sklearn.ipynb](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/vertex_ai_sdk/remote_training_sklearn.ipynb) |
| 313 | +  - Remote training |
| 314 | +  - Uptraining |
| 315 | + |
| 316 | +- [remote_training_pytorch.ipynb](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/vertex_ai_sdk/remote_training_pytorch.ipynb) |
| 317 | +  - Remote training |
| 318 | +  - Remote GPU training |
| 319 | +  - Uptraining |
| 320 | + |
| 321 | +- [remote_training_tensorflow_with_autologging.ipynb](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/vertex_ai_sdk/remote_training_tensorflow_with_autologging.ipynb) |
| 322 | +  - Remote training |
| 323 | +  - Remote GPU training |
| 324 | +  - Remote training with Autologging |
| 325 | +  - Uptraining |
| 326 | + |
| 327 | +- [remote_training_lightning.ipynb](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/vertex_ai_sdk/remote_training_lightning.ipynb) |
| 328 | +  - Remote training |
| 329 | + |
| 330 | +- [remote_hyperparameter_tuning.ipynb](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/vertex_ai_sdk/remote_hyperparameter_tuning.ipynb) |
| 331 | +  - Vizier Hyperparameter Tuning |