In Google Cloud exams, hardware selection questions are often binary: the answer is usually either GPUs or TPUs, decided by the workload traits listed below.

GPUs

- Models with a significant number of custom PyTorch/JAX operations that must run at least partially on CPUs
- Models with TensorFlow ops that are not available on Cloud TPU (see the list of available TensorFlow ops)
- Medium-to-large models with larger effective batch sizes
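
The term "effective batch size" is not defined in these notes. A minimal sketch, assuming the common convention that it is the number of examples contributing to one optimizer update (per-device batch × number of devices × gradient accumulation steps). The function name, parameter names, and example numbers are hypothetical.

```python
def effective_batch_size(per_device_batch: int,
                         num_devices: int,
                         grad_accum_steps: int = 1) -> int:
    """Examples that contribute to a single optimizer update (assumed definition)."""
    if min(per_device_batch, num_devices, grad_accum_steps) < 1:
        raise ValueError("all arguments must be positive integers")
    return per_device_batch * num_devices * grad_accum_steps


# Hypothetical setups, not taken from the Cloud TPU documentation.
small_setup = effective_batch_size(per_device_batch=32, num_devices=1, grad_accum_steps=4)
large_setup = effective_batch_size(per_device_batch=128, num_devices=32)
print(small_setup, large_setup)
```

Under this convention a single device can still reach a moderate effective batch size through gradient accumulation, while very large values usually come from spreading the batch across many devices.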

TPUs

- Models dominated by matrix computations
- Models with no custom PyTorch/JAX operations inside the main training loop
- Models that train for weeks or months
- Large models with large effective batch sizes
- Models with ultra-large embeddings common in advanced ranking and recommendation workloads
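
For revision, the GPU and TPU checklists above can be read as a small decision rule. The sketch below is one possible encoding, not an official Google tool: the `Workload` class, its field names, and the rule ordering (blockers first, then TPU signals, then a GPU default) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical field names; each mirrors one bullet from the notes.
    matrix_dominated: bool = False
    custom_ops_in_training_loop: bool = False
    needs_tpu_unsupported_tf_ops: bool = False
    trains_for_weeks_or_months: bool = False
    large_model_large_batch: bool = False
    ultra_large_embeddings: bool = False


def suggest_accelerator(w: Workload) -> str:
    # Blockers first: custom ops or TPU-unsupported TensorFlow ops point to GPUs.
    if w.custom_ops_in_training_loop or w.needs_tpu_unsupported_tf_ops:
        return "GPU"
    # Otherwise, any TPU-friendly trait tips the choice toward TPUs.
    tpu_signals = [
        w.matrix_dominated,
        w.trains_for_weeks_or_months,
        w.large_model_large_batch,
        w.ultra_large_embeddings,
    ]
    if any(tpu_signals):
        return "TPU"
    # Default when nothing argues for a TPU (an assumption of this sketch).
    return "GPU"


recsys = Workload(matrix_dominated=True, ultra_large_embeddings=True)
custom = Workload(matrix_dominated=True, custom_ops_in_training_loop=True)
print(suggest_accelerator(recsys))
print(suggest_accelerator(custom))
```

Checking the blockers before the TPU signals reflects how exam questions tend to be phrased: a single mention of custom operations in the training loop usually outweighs an otherwise TPU-friendly description.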

**Cloud TPUs are not suited to the following workloads:**

- Linear algebra programs that require frequent branching or contain many element-wise algebra operations
- Workloads that require high-precision arithmetic
- Neural network workloads that contain custom operations in the main training loop
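
One way to see why high-precision arithmetic is a poor fit: TPU matrix units commonly work with the bfloat16 format, which keeps float32's exponent range but only 7 explicit mantissa bits. The sketch below approximates bfloat16 using only the Python standard library by keeping the top 16 bits of a float32 encoding. It truncates rather than rounds to nearest, so it illustrates the format's resolution and is not an exact model of TPU hardware; the helper name is hypothetical.

```python
import struct


def to_bfloat16_truncated(x: float) -> float:
    """Approximate bfloat16 by keeping the top 16 bits of the float32 encoding."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


# 2**-10 is below bfloat16 resolution near 1.0, so the increment is lost.
print(to_bfloat16_truncated(1.0 + 2.0 ** -10))
# 2**-7 is the smallest increment above 1.0 that survives.
print(to_bfloat16_truncated(1.0 + 2.0 ** -7))
```

Neural network training generally tolerates this loss of resolution, whereas workloads such as scientific simulation that need many significant digits generally do not.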

src: https://docs.cloud.google.com/tpu/docs/intro-to-tpu#when_to_use_tpus