A practical guide for configuring Cline to call OCI Generative AI models through OCI's OpenAI-compatible API.
Cline is an AI coding assistant that works inside your IDE and can help with day-to-day development tasks such as explaining unfamiliar code, generating new files, refactoring existing logic, writing tests, debugging errors, and summarizing repository structure. By connecting Cline to OCI Generative AI, developers can use OCI-hosted models directly from their coding environment while keeping model access, deployment choices, and enterprise controls within Oracle Cloud Infrastructure.
This setup is useful when teams want AI-assisted development workflows that can use either on-demand OCI Generative AI models for quick setup or Dedicated AI Cluster (DAC)-hosted models for production-grade isolation, performance, and customization.
## Overview
This tutorial walks through configuring Cline to use OCI Generative AI models through the OCI OpenAI-compatible API.
The process involves:
1. Selecting an OCI Generative AI model
2. Choosing whether to use the model on demand or from a Dedicated AI Cluster (DAC)
3. Understanding the OCI OpenAI-compatible API endpoint URL
4. Creating an OCI Generative AI API key
5. Configuring Cline with the OCI OpenAI-compatible base URL, API key, and model ID
6. Testing the setup with an example prompt
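The values gathered in these steps feed into a standard OpenAI-style chat completions request, which is what Cline sends once the base URL, API key, and model ID are configured. A minimal sketch of the request body (the model name and prompt below are illustrative assumptions, not values prescribed by this guide):

```python
import json

# Illustrative on-demand OCI model name; substitute your own choice.
MODEL_ID = "openai.gpt-oss-120b"

# Body of an OpenAI-style chat completions request, as Cline would
# send it through the OCI OpenAI-compatible API.
payload = {
    "model": MODEL_ID,
    "messages": [
        {"role": "user", "content": "Explain what this repository does."}
    ],
}

print(json.dumps(payload, indent=2))
```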
OCI provides OpenAI-compatible APIs for model inference, including Chat Completions and Responses. This tutorial uses the OCI Generative AI OpenAI-compatible API documented here:
Example regions include:
- `uk-london-1`
- `eu-frankfurt-1`
The region is used in the OpenAI-compatible base URL.
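As a concrete illustration, the region slots into the inference hostname. A minimal sketch, assuming the `inference.generativeai.<region>.oci.oraclecloud.com` hostname pattern used by OCI Generative AI inference endpoints (confirm the full OpenAI-compatible path against the OCI documentation before use):

```python
# Build the region-specific OCI Generative AI inference hostname.
# The hostname pattern is an assumption based on OCI's documented
# inference endpoints; verify the complete OpenAI-compatible base URL
# in the OCI Generative AI documentation.
def inference_host(region: str) -> str:
    return f"https://inference.generativeai.{region}.oci.oraclecloud.com"

print(inference_host("uk-london-1"))
# -> https://inference.generativeai.uk-london-1.oci.oraclecloud.com
```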
Select the OCI Generative AI model that you want Cline to use. OCI Generative AI models can be used either on demand or from a Dedicated AI Cluster (DAC). The Cline setup is different for each option, so choose the deployment path first.
#### On-Demand Models
On-demand models are shared, OCI-hosted models that are ready to call directly through the OpenAI-compatible API. This is the simplest setup for testing, prototyping, and lighter usage patterns.
Example OCI Model Names:
```text
xai.grok-code-fast-1
openai.gpt-oss-120b
google.gemini-2.5-flash
```
For on-demand models, the Cline **Model ID** is the OCI Model Name.
#### Dedicated AI Cluster (DAC)-Hosted Models
DAC-hosted models run on dedicated infrastructure in your tenancy. Use a DAC-hosted model when you need production-grade control over model hosting and inference. DACs provide several advantages:
- **Flexibility:** Import supported Hugging Face-format models from Hugging Face or Object Storage, test imported models with shorter commitments, choose fine-tuned or quantized versions, and right-size based on visible hardware specifications.
- **Isolation:** Run workloads on dedicated GPU resources inside your tenancy, which helps protect sensitive data, avoids shared-resource contention, and supports regulated workloads.
- **Predictable latency:** Dedicated infrastructure can provide more stable time-to-first-token and inference response times than shared model endpoints, especially for scaling production applications.
- **Fine-tuning support:** Host fine-tuned models alongside base models, run multiple fine-tuned models on a single cluster, and control model lifecycle and upgrade cadence.
- **Cost efficiency at scale:** For inference-heavy workloads, DACs can reduce effective price per token by keeping dedicated resources highly utilized and hosting multiple models on one cluster.
- **Deployment near data:** Deploy in supported OCI regions, including regulated regions where available, to support data residency, lower latency, and simpler security reviews.
- **Simplified management:** OCI manages the infrastructure while you manage model deployment, scaling, fine-tuning, and application integration.
Before configuring Cline for a DAC-hosted model, make sure the model endpoint is already created and active.
1. Open the OCI Console
2. Navigate to **Analytics & AI -> Generative AI**
3. Go to **Endpoints**
4. Confirm that the endpoint for your DAC-hosted model is **Active**
5. Make a note of the endpoint region and the DAC endpoint OCID
For DAC-hosted models, the Cline **Model ID** is the DAC endpoint OCID.
```text
ocid1.generativeaiendpoint.<region>..<unique_id>
```
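Since the same Cline **Model ID** field holds either an on-demand model name or a DAC endpoint OCID, a small check can help catch configuration mix-ups. A sketch using a hypothetical helper (not part of Cline or OCI), assuming the OCID prefix shown above:

```python
# Hypothetical helper: decide whether a Cline "Model ID" value is a
# DAC endpoint OCID or an on-demand OCI model name, based on the
# "ocid1.generativeaiendpoint." prefix of DAC endpoint OCIDs.
def is_dac_endpoint(model_id: str) -> bool:
    return model_id.startswith("ocid1.generativeaiendpoint.")

print(is_dac_endpoint("ocid1.generativeaiendpoint.eu-frankfurt-1..example"))  # True
print(is_dac_endpoint("openai.gpt-oss-120b"))  # False
```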
### 3. Understand the OCI OpenAI-Compatible API Endpoint URL
Cline uses a different OCI OpenAI-compatible endpoint format depending on whether the model is on demand or DAC-hosted.
For on-demand models, configure the base URL **without** `/chat/completions`: