EKS AI/ML intro section (#1241)

dnutels · web-flow · commit 98472fd947d9 · 2026-05-08T18:15:48.000-05:00
Authored-by: Dmitry Nutels <dmitryn@amazon.com>
diff --git a/latest/ug/ml/machine-learning-on-eks.adoc b/latest/ug/ml/machine-learning-on-eks.adoc
@@ -10,64 +10,59 @@ include::../attributes.txt[]
 Complete guide for running Machine Learning applications on Amazon EKS. This includes everything from provisioning infrastructure to choosing and deploying Machine Learning workloads on Amazon EKS.
 --
 
-Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes platform that empowers organizations to deploy, manage, and scale AI and machine learning (ML) workloads with unparalleled flexibility and control. Built on the open source Kubernetes ecosystem, EKS lets you harness your existing Kubernetes expertise, while integrating seamlessly with open source tools and {aws} services.
+Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes platform that empowers organizations to deploy, manage, and scale AI and machine learning (ML) workloads with unparalleled flexibility and control. Because EKS is built on the open source Kubernetes ecosystem, you can apply your existing Kubernetes expertise while integrating seamlessly with open source tools and {aws} services.
 
-Whether you’re training large-scale models, running real-time online inference, or deploying generative AI applications, EKS delivers the performance, scalability, and cost efficiency your AI/ML projects demand.
+Whether you're training large-scale models, running real-time online inference, or deploying generative AI applications, EKS delivers the performance, scalability, and cost efficiency your AI/ML projects demand.
 
+## Why use Amazon EKS for AI/ML
 
-## Why Choose EKS for AI/ML?
+Amazon EKS provides the control, integrations, performance, and scalability needed for AI/ML projects. Built on the open source Kubernetes ecosystem and integrated with {aws} services, Amazon EKS helps you use existing Kubernetes expertise while orchestrating complex workloads. For teams new to AI/ML deployments, existing Kubernetes skills transfer without steep learning curves.
 
-EKS is a managed Kubernetes platform that helps you deploy and manage complex AI/ML workloads.
-Built on the open source Kubernetes ecosystem, it integrates with {aws} services, providing the control and scalability needed for advanced projects.
-For teams new to AI/ML deployments, existing Kubernetes skills transfer directly, allowing efficient orchestration of multiple workloads.
+Amazon EKS supports everything from operating system customizations to compute scaling, and promotes technological flexibility that preserves choice for future infrastructure decisions. The platform provides the performance and tuning options that AI/ML workloads require, including the following features:
 
-EKS supports everything from operating system customizations to compute scaling, and its open source foundation promotes technological flexibility, preserving choice for future infrastructure decisions.
-The platform provides the performance and tuning options AI/ML workloads require, supporting features such as:
-
-* Full cluster control to fine-tune costs and configurations without hidden abstractions
-* Sub-second latency for real-time inference workloads in production
-* Advanced customizations like multi-instance GPUs, multi-cloud strategies, and OS-level tuning
-* Ability to centralize workloads using EKS as a unified orchestrator across AI/ML pipelines
+- **Full cluster control**: Fine-tune costs and configurations without hidden abstractions.
+- **Sub-second latency**: Run real-time inference workloads in production.
+- **Advanced customizations**: Configure multi-instance GPUs, network tuning, and operating system-level tuning.
+- **Unified orchestration**: Orchestrate across AI/ML pipelines and on-premises, edge, and cloud environments.
+- **Cost optimization**: Use auto scaling, native GPU scheduling, and diverse GPU and accelerator instance types.
 
 ## Key use cases
 
-Amazon EKS provides a robust platform for a wide range of AI/ML workloads, supporting various technologies and deployment patterns:
+Amazon EKS supports a wide range of AI/ML workloads, including the following common use cases:
 
-* **Real-time (online) inference:** EKS powers immediate predictions on incoming data, such as fraud detection, with sub-second latency using tools like https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-torchserve.html[TorchServe],
-https://aws.amazon.com/blogs/containers/quora-3x-faster-machine-learning-25-lower-costs-with-nvidia-triton-on-amazon-eks/[Triton Inference Server], and https://kserve.github.io/website/0.8/get_started/first_isvc/[KServe] on Amazon EC2 https://aws.amazon.com/ec2/instance-types/inf1/[Inf1]
-and https://aws.amazon.com/ec2/instance-types/inf2/[Inf2] instances.
-These workloads benefit from dynamic scaling with https://karpenter.sh/[Karpenter] and https://keda.sh/[KEDA], while leveraging https://aws.amazon.com/efs/[Amazon EFS] for model sharding across pods.
-https://docs.aws.amazon.com/AmazonECR/latest/userguide/pull-through-cache-creating-rule.html[Amazon ECR Pull Through Cache (PTC)] accelerates model updates,
-and https://aws.amazon.com/bottlerocket/[Bottlerocket] data volumes with https://docs.aws.amazon.com/ebs/latest/userguide/what-is-ebs.html[Amazon EBS]-optimized volumes ensure fast data access.
+- **Inference**: Self-host models on Amazon EKS for use cases that require low-latency response times.
+- **Batch inference**: Process large datasets efficiently through scheduled jobs.
+- **Model training**: Train complex models on large datasets over extended periods of time.
+- **Model fine-tuning**: Enhance open source models with proprietary domain knowledge.
+- **Retrieval augmented generation (RAG) pipelines**: Integrate retrieval and generation processes.
+- **Agentic AI**: Deploy agents with models hosted on Amazon Bedrock, third parties, or on Amazon EKS.
 
-* **General model training:** Organizations leverage EKS to train complex models on large datasets over extended periods using the https://www.kubeflow.org/docs/components/trainer/[Kubeflow Training Operator],
-https://docs.ray.io/en/latest/serve/index.html[Ray Serve], and https://pytorch.org/docs/stable/distributed.elastic.html[Torch Distributed Elastic] on https://aws.amazon.com/ec2/instance-types/p4/[Amazon EC2 P4d] and https://aws.amazon.com/ec2/instance-types/trn1/[Amazon EC2 Trn1] instances.
-These workloads are supported by batch scheduling with tools like https://volcano.sh/en/#home_slider[Volcano],
-https://yunikorn.apache.org/[Yunikorn],
-and https://kueue.sigs.k8s.io/[Kueue].
-https://aws.amazon.com/efs/[Amazon EFS] enables sharing of model checkpoints, and https://aws.amazon.com/s3/[Amazon S3] handles model import/export with lifecycle policies for version management.
+## Case studies
 
-* **Retrieval augmented generation (RAG) pipelines:** EKS manages customer support chatbots and similar applications by integrating retrieval and generation processes. These workloads often use tools like https://argoproj.github.io/workflows/[Argo Workflows] and https://www.kubeflow.org/[Kubeflow] for orchestration, vector databases like https://www.pinecone.io/blog/serverless/[Pinecone], https://weaviate.io/[Weaviate], or https://aws.amazon.com/opensearch-service/[Amazon OpenSearch], and expose applications to users via the 
-<<aws-load-balancer-controller,Application Load Balancer Controller (LBC)>>. https://docs.nvidia.com/nim/index.html[NVIDIA NIM] optimizes GPU utilization, while <<prometheus,Prometheus>> and https://aws.amazon.com/grafana/[Grafana] monitor resource usage.
+Customers select Amazon EKS for various reasons, such as optimizing GPU usage or running inference workloads with sub-second latency, as demonstrated in the following case studies. For a list of all case studies for Amazon EKS, see [{aws} Customer Success Stories](https://aws.amazon.com/solutions/case-studies/).
 
-* **Generative AI model deployment:** Companies deploy real-time content creation services on EKS, such as text or image generation, using https://docs.ray.io/en/latest/serve/index.html[Ray Serve], https://github.com/vllm-project/vllm[vLLM], and https://aws.amazon.com/blogs/containers/quora-3x-faster-machine-learning-25-lower-costs-with-nvidia-triton-on-amazon-eks/[Triton Inference Server] on Amazon https://aws.amazon.com/ec2/instance-types/g5/[EC2 G5] and https://aws.amazon.com/ai/machine-learning/inferentia/[Inferentia] accelerators. These deployments optimize performance and memory utilization for large-scale models. https://jupyter.org/hub[JupyterHub] enables iterative development, https://www.gradio.app/[Gradio] provides simple web interfaces, and the <<s3-csi,S3 Mountpoint CSI Driver>> allows mounting S3 buckets as file systems for accessing large model files.
+- [Unitary](https://aws.amazon.com/solutions/case-studies/unitary-eks-case-study/?did=cr_card&trk=cr_card) processes 26 million videos daily using AI for content moderation. The company requires high-throughput, low-latency inference and achieved an 80% reduction in container boot times, which ensures fast response to scaling events as traffic fluctuates.
+- [Synthesia](https://aws.amazon.com/solutions/case-studies/synthesia-case-study/?did=cr_card&trk=cr_card) offers generative AI video creation as a service for customers to create realistic videos from text prompts. The company achieved a 30x improvement in ML model training throughput.
+- [Ada Support](https://aws.amazon.com/solutions/case-studies/ada-support-eks-case-study/), an AI-powered customer service automation company, achieved a 15% reduction in compute costs alongside a 30% increase in compute efficiency.
+- [Snorkel AI](https://aws.amazon.com/blogs/startups/how-snorkel-ai-achieved-over-40-cost-savings-by-scaling-machine-learning-workloads-using-amazon-eks/) equips enterprises to build and adapt foundation models and large language models. The company achieved over 40% cost savings by implementing intelligent scaling mechanisms for GPU resources.
+- [Artera](https://aws.amazon.com/solutions/case-studies/artera-case-study/) uses Amazon Elastic File System (EFS) and Amazon EKS to train ML models that personalize cancer treatment using high-resolution biopsy images.
+- [Anthropic](https://aws.amazon.com/blogs/containers/amazon-eks-enables-ultra-scale-ai-ml-workloads-with-support-for-100k-nodes-per-cluster/) runs its flagship Claude family of foundation models on Amazon EKS and operates some of the largest EKS clusters in production, consisting of {aws} Trainium (trn2) instances and NVIDIA GPUs for AI workloads alongside {aws} Graviton processors for CPU-intensive data processing.
 
-* **Batch (offline) inference:** Organizations process large datasets efficiently through scheduled jobs with https://docs.aws.amazon.com/batch/latest/userguide/what-is-batch.html[{aws} Batch] or https://volcano.sh/en/docs/schduler_introduction/[Volcano]. These workloads often use https://aws.amazon.com/ec2/instance-types/inf1/[Inf1] and https://aws.amazon.com/ec2/instance-types/inf2/[Inf2] EC2 instances for {aws} https://aws.amazon.com/ai/machine-learning/inferentia/[Inferentia] chips, Amazon EC2 https://aws.amazon.com/ec2/instance-types/g4/[G4dn] instances for NVIDIA T4 GPUs, or https://aws.amazon.com/ec2/instance-types/c5/[c5] and https://aws.amazon.com/ec2/instance-types/c6i[c6i] CPU instances, maximizing resource utilization during off-peak hours for analytics tasks. The https://aws.amazon.com/ai/machine-learning/neuron/[{aws} Neuron SDK] and NVIDIA GPU drivers optimize performance, while MIG/TS enables GPU sharing. Storage solutions include https://aws.amazon.com/s3/[Amazon S3] and Amazon https://aws.amazon.com/efs/[EFS] and https://aws.amazon.com/fsx/lustre/[FSx for Lustre], with CSI drivers for various storage classes. Model management leverages tools like https://www.kubeflow.org/docs/components/pipelines/[Kubeflow Pipelines], https://argoproj.github.io/workflows/[Argo Workflows], and https://docs.ray.io/en/latest/cluster/getting-started.html[Ray Cluster], while monitoring is handled by <<prometheus,Prometheus>>, https://aws.amazon.com/grafana/[Grafana] and custom model monitoring tools.
+## Guide structure
 
-## Case studies
+This guide includes a series of hands-on walkthroughs you can follow step by step to deploy and manage AI/ML workloads on Amazon EKS. Each walkthrough provides instructions and configurations you can implement directly in your environment.
 
-Customers choose Amazon EKS for various reasons, such as optimizing GPU usage or running real-time inference workloads with sub-second latency, as demonstrated in the following case studies. For a list of all case studies for Amazon EKS, see https://aws.amazon.com/solutions/case-studies/browse-customer-success-stories/?refid=cr_card&customer-references-cards.sort-by=item.additionalFields.sortDate&customer-references-cards.sort-order=desc&awsf.customer-references-location=*all&awsf.customer-references-industry=*all&awsf.customer-references-use-case=*all&awsf.language=language%23english&awsf.customer-references-segment=*all&awsf.content-type=*all&awsf.customer-references-product=product%23eks&awsm.page-customer-references-cards=1[{aws} Customer Success Stories].
+Alongside the instructions, the guide provides the background and foundational concepts for each topic. It also links to relevant {aws} documentation and resources for deeper technical details.
 
-* https://aws.amazon.com/solutions/case-studies/unitary-eks-case-study/?did=cr_card&trk=cr_card[Unitary] processes 26 million videos daily using AI for content moderation, requiring high-throughput, low-latency inference and have achieved an 80% reduction in container boot times, ensuring fast response to scaling events as traffic fluctuates.
-* https://aws.amazon.com/solutions/case-studies/miro-eks-case-study/[Miro], the visual collaboration platform supporting 70 million users worldwide, reported an 80% reduction in compute costs compared to their previous self-managed Kubernetes clusters. 
-* https://aws.amazon.com/solutions/case-studies/synthesia-case-study/?did=cr_card&trk=cr_card[Synthesia], which offers generative AI video creation as a service for customers to create realistic videos from text prompts, achieved a 30x improvement in ML model training throughput.
-* https://aws.amazon.com/solutions/case-studies/harri-eks-case-study/?did=cr_card&trk=cr_card[Harri], providing HR technology for the hospitality industry, achieved 90% faster scaling in response to spikes in demand and reduced its compute costs by 30% by migrating to https://aws.amazon.com/ec2/graviton/[{aws} Graviton processors].
-* https://aws.amazon.com/solutions/case-studies/ada-support-eks-case-study/[Ada Support], an AI-powered customer service automation company, achieved a 15% reduction in compute costs alongside a 30% increase in compute efficiency. 
-* https://aws.amazon.com/blogs/startups/how-snorkel-ai-achieved-over-40-cost-savings-by-scaling-machine-learning-workloads-using-amazon-eks/[Snorkel AI], which equips enterprises to build and adapt foundation models and large language models, achieved over 40% cost savings by implementing intelligent scaling mechanisms for their GPU resources. 
+## Start using AI/ML on Amazon EKS
 
-== Start using Machine Learning on EKS
+To begin planning for and using AI/ML platforms and workloads on Amazon EKS, create an Amazon EKS cluster (link needed), including the required Kubernetes components, in your {aws} account.
+Once your environment is up and running, you can continue to the next steps:
 
-To begin planning for and using Machine Learning platforms and workloads on EKS on the {aws} cloud, proceed to the <<ml-resources>> section.
+- [Real-time inference](https://docs.aws.amazon.com/eks/latest/userguide/ml-realtime-inference.html): Use Amazon EKS to deploy, configure, and start using an inference application with a large language model (LLM).
+- [Cluster configuration](https://docs.aws.amazon.com/eks/latest/userguide/ml-cluster-configuration.html): Configure Amazon EKS clusters optimized for AI/ML workloads.
+- [Capacity management](https://docs.aws.amazon.com/eks/latest/userguide/ml-compute-management.html): Manage and optimize compute resources for machine learning workloads on Amazon EKS.
+- [Device management](https://docs.aws.amazon.com/eks/latest/userguide/device-management.html): Manage specialized hardware devices using Dynamic Resource Allocation (DRA) and device plugins.
 
 include::ml-realtime-inference.adoc[leveloffset=+1]
 
@@ -80,4 +75,3 @@ include::device-management.adoc[leveloffset=+1]
 include::ml-recipes.adoc[leveloffset=+1]
 
 include::ml-resources.adoc[leveloffset=+1]
-