Commit 964dc1d

docs: add distributed XGBoost on Kubernetes tutorial (#12080)

Add comprehensive documentation for distributed XGBoost training on Kubernetes via Kubeflow Trainer. The tutorial covers:

- Overview of the distributed architecture and Collective protocol
- Environment variables (DMLC_*) injected by the XGBoost runtime plugin
- Worker count calculation for CPU and GPU training
- Prerequisites and installation verification
- ClusterTrainingRuntime configuration
- Example training code using Python SDK and kubectl YAML
- Best practices: QuantileDMatrix, early stopping, checkpointing, logging, data partitioning, and rank-specific logic
- Common issues and edge cases

Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>
1 parent 03c196f commit 964dc1d

1 file changed

Lines changed: 891 additions & 16 deletions
