21 changes: 20 additions & 1 deletion ai-quick-actions/model-deployment-tips.md
@@ -40,7 +40,11 @@ form to quickly deploy the model:

![Deploy Model](web_assets/deploy-model.png)

### Compute Shape
### Infrastructure

AQUA supports two types of infrastructure resources for deploying a single model: Compute Shape and Compute Target (Managed Compute Cluster). When deploying a model, you can either specify a compute shape or choose a compute target.

#### Compute Shape

The compute shape selection is critical; the list of available shapes is filtered to those suitable for the
chosen model.
@@ -55,6 +59,21 @@ For a full list of shapes and their definitions see the [compute shape docs](htt
The relationship between model parameter size and GPU memory is roughly 2x the parameter count in GB; for example, a model with 7B parameters needs a minimum of 14 GB for inference. At runtime, this
memory holds both the weights and the concurrent contexts for user requests.
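The 2x rule of thumb above can be sketched as a quick sizing check. This is illustrative only; actual memory use also depends on the serving runtime, context length, and concurrency.

```python
def min_gpu_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Rough minimum GPU memory (GB) to hold model weights for inference.

    With 16-bit (2-byte) weights, this is ~2x the parameter count in billions.
    KV caches and concurrent request contexts add overhead on top of this.
    """
    return params_billion * bytes_per_param

# A 7B-parameter model at 16-bit precision needs at least ~14 GB of GPU memory.
print(min_gpu_memory_gb(7))   # 14.0
print(min_gpu_memory_gb(70))  # 140.0
```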

#### Compute Target

A Data Science Compute Target manages the underlying compute, networking, and Kubernetes infrastructure, ensuring security and compliance. Each Compute Target is supported by dedicated compute capacity within a multi-tenant Kubernetes environment.

To deploy a model on a managed compute cluster, you must first create a Compute Target and reference it in AQUA when creating the model deployment. When selecting Compute Target as the infrastructure, you must also specify the resource configuration, including the number of GPUs, OCPUs, and memory (in GB).

```bash
--compute_target_details '{"compute_target_id": "ocid1.datasciencecomputetargetint.oc1.iad.<ocid>", "gpu_count": 2, "ocpus": 15, "memory_in_gbs": 240}'
```
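The JSON payload passed to `--compute_target_details` can also be assembled programmatically. The helper below is a hypothetical sketch, not part of the AQUA CLI; the key names mirror the snippet above and the OCID is a placeholder.

```python
import json


def compute_target_details(compute_target_id: str, gpu_count: int,
                           ocpus: int, memory_in_gbs: int) -> str:
    """Serialize the resource configuration for --compute_target_details."""
    return json.dumps({
        "compute_target_id": compute_target_id,
        "gpu_count": gpu_count,
        "ocpus": ocpus,
        "memory_in_gbs": memory_in_gbs,
    })


# Placeholder OCID; substitute your Compute Target's OCID.
payload = compute_target_details(
    "ocid1.datasciencecomputetargetint.oc1.iad.<ocid>",
    gpu_count=2, ocpus=15, memory_in_gbs=240)
print(payload)
```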

For more details regarding compute target creation and required policy, refer to [Data Science Compute Target](<url_placeholder>).

**Note:** Currently, AQUA supports deploying only service-managed models on a Compute Target.


#### Quantization Support

To deploy large language models efficiently on CPU-based compute shapes, AQUA provides quantization support. Quantization reduces the precision of model weights (e.g., from 16-bit to 4-bit), significantly lowering memory and compute requirements while maintaining good accuracy. This enables faster and more cost-effective model inference without requiring a GPU. Learn more about [how to configure and deploy models with quantization](quantization-tips.md).
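The memory savings from quantization follow directly from the bits-per-parameter arithmetic. The sketch below is illustrative (weights only; activations and runtime overhead are excluded).

```python
def quantized_weight_memory_gb(params_billion: float, bits: int) -> float:
    """Approximate weight memory (GB) for a model quantized to `bits` per parameter."""
    return params_billion * bits / 8

# A 7B-parameter model: ~14 GB at 16-bit weights vs ~3.5 GB at 4-bit weights,
# a 4x reduction that can bring the model within reach of CPU shapes.
print(quantized_weight_memory_gb(7, 16))  # 14.0
print(quantized_weight_memory_gb(7, 4))   # 3.5
```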
2 changes: 2 additions & 0 deletions ai-quick-actions/policies/README.md
@@ -98,6 +98,8 @@ After the stack is created and its Stack details page opens, click Plan from the

Allow dynamic-group aqua-dynamic-group to inspect compartments in tenancy

Allow dynamic-group aqua-dynamic-group to manage data-science-compute-targets in compartment <your-compartment-name>

Allow dynamic-group aqua-dynamic-group to manage object-family in compartment <your-compartment-name> where any {target.bucket.name='<your-bucket-name>'}

Allow dynamic-group <dynamic-group-name> to read repos in compartment <your-compartment-name> where any {request.operation='ReadDockerRepositoryMetadata',request.operation='ReadDockerRepositoryManifest',request.operation='PullDockerLayer'}
2 changes: 2 additions & 0 deletions ai-quick-actions/policies/terraform/iam.tf
@@ -48,6 +48,7 @@ locals {
"Allow dynamic-group id ${oci_identity_dynamic_group.aqua-dynamic-group[0].id} to manage data-science-modelversionsets in ${local.compartment_policy_string}",
"Allow dynamic-group id ${oci_identity_dynamic_group.aqua-dynamic-group[0].id} to read buckets in ${local.compartment_policy_string}",
"Allow dynamic-group id ${oci_identity_dynamic_group.aqua-dynamic-group[0].id} to read objectstorage-namespaces in ${local.compartment_policy_string}",
"Allow dynamic-group id ${oci_identity_dynamic_group.aqua-dynamic-group[0].id} to manage data-science-compute-targets in ${local.compartment_policy_string}",
"Allow dynamic-group id ${oci_identity_dynamic_group.aqua-dynamic-group[0].id} to inspect compartments in tenancy",
"Allow dynamic-group id ${oci_identity_dynamic_group.aqua-dynamic-group[0].id} to read repos in ${local.compartment_policy_string} where any {request.operation='ReadDockerRepositoryMetadata',request.operation='ReadDockerRepositoryManifest',request.operation='PullDockerLayer'}"
]:[]
@@ -66,6 +67,7 @@ locals {
"Allow dynamic-group id ${oci_identity_dynamic_group.aqua-dynamic-group[0].id} to manage data-science-modelversionsets in ${local.compartment_policy_string}",
"Allow dynamic-group id ${oci_identity_dynamic_group.aqua-dynamic-group[0].id} to read buckets in ${local.compartment_policy_string}",
"Allow dynamic-group id ${oci_identity_dynamic_group.aqua-dynamic-group[0].id} to read objectstorage-namespaces in ${local.compartment_policy_string}",
"Allow dynamic-group id ${oci_identity_dynamic_group.aqua-dynamic-group[0].id} to manage data-science-compute-targets in ${local.compartment_policy_string}",
"Allow dynamic-group id ${oci_identity_dynamic_group.aqua-dynamic-group[0].id} to inspect compartments in ${local.compartment_policy_string}",
"Allow dynamic-group id ${oci_identity_dynamic_group.aqua-dynamic-group[0].id} to read repos in ${local.compartment_policy_string} where any {request.operation='ReadDockerRepositoryMetadata',request.operation='ReadDockerRepositoryManifest',request.operation='PullDockerLayer'}"
]:[]