(completions)
Implements the OpenAI API's v1 completions endpoint.
It processes completion requests by routing them through the chat completions endpoint.
Returns a Response containing either:
- A streaming SSE connection for real-time completions
- A single JSON response for non-streaming completions
Returns an error status code if:
- The request processing fails
- The streaming/non-streaming handlers encounter errors
- The underlying inference service returns an error
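
In the streaming case the response arrives as server-sent events (SSE). The SDK's event stream parses these for you. The sketch below only illustrates what consuming an SSE body involves. It assumes OpenAI-style framing, where each event is a single `data: <json>` line and the stream ends with `data: [DONE]`. That framing is an assumption rather than documented behavior of this endpoint, and `iter_sse_chunks` is a hypothetical helper that is not part of `atoma_sdk`.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE `data:` lines (hypothetical helper).

    Assumes one JSON object per `data:` line and a terminating `data: [DONE]`
    sentinel (OpenAI-style framing). Blank lines, `:` comments and other SSE
    fields are skipped. Multi-line `data:` fields are not joined (simplification).
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank event separator, comment, or non-data field
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # assumed end-of-stream sentinel
        yield json.loads(payload)


sample = [
    ": keep-alive comment",
    'data: {"choices": [{"text": "Hel"}]}',
    "",
    'data: {"choices": [{"text": "lo"}]}',
    "",
    "data: [DONE]",
]
text = "".join(chunk["choices"][0]["text"] for chunk in iter_sse_chunks(sample))
print(text)  # Hello
```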

```python
from atoma_sdk import AtomaSDK
import os

with AtomaSDK(
    bearer_auth=os.getenv("ATOMASDK_BEARER_AUTH", ""),
) as as_client:

    res = as_client.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",
        prompt=["<value>", "<value>"],
        frequency_penalty=0,
        logit_bias={"1234567890": 0.5, "1234567891": -0.5},
        logprobs=1,
        n=1,
        presence_penalty=0,
        seed=123,
        stop=["stop", "halt"],
        stream=False,
        suffix="\n",
        temperature=0.7,
        top_p=1,
        user="user-1234",
    )

    # Handle response
    print(res)
```

| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| `model` | str | ✔️ | ID of the model to use. | meta-llama/Llama-3.3-70B-Instruct |
| `prompt` | models.CompletionsPrompt | ✔️ | N/A | |
| `best_of` | OptionalNullable[int] | ➖ | N/A | 1 |
| `echo` | OptionalNullable[bool] | ➖ | N/A | false |
| `frequency_penalty` | OptionalNullable[float] | ➖ | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far. | 0 |
| `logit_bias` | Dict[str, float] | ➖ | Modifies the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to a bias value from -100 to 100. The bias is added to the logits generated by the model before sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase the likelihood of selection, while values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `{"1234567890": 0.5, "1234567891": -0.5}` |
| `logprobs` | OptionalNullable[int] | ➖ | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. | 1 |
| `max_tokens` | OptionalNullable[int] | ➖ | The maximum number of tokens to generate in the completion. | 4096 |
| `n` | OptionalNullable[int] | ➖ | How many completion choices to generate for each prompt. | 1 |
| `presence_penalty` | OptionalNullable[float] | ➖ | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far. | 0 |
| `seed` | OptionalNullable[int] | ➖ | If specified, the system makes a best effort to sample deterministically. | 123 |
| `stop` | List[str] | ➖ | Up to 4 sequences where the API will stop generating further tokens. | `["stop", "halt"]` |
| `stream` | OptionalNullable[bool] | ➖ | Whether to stream back partial progress. | false |
| `stream_options` | OptionalNullable[models.StreamOptions] | ➖ | N/A | |
| `suffix` | OptionalNullable[str] | ➖ | The suffix that comes after a completion of inserted text. | `"\n"` |
| `temperature` | OptionalNullable[float] | ➖ | The sampling temperature to use, between 0 and 2. | 0.7 |
| `top_p` | OptionalNullable[float] | ➖ | An alternative to sampling with temperature (nucleus sampling). | 1 |
| `user` | OptionalNullable[str] | ➖ | A unique identifier representing your end user. | user-1234 |
| `retries` | Optional[utils.RetryConfig] | ➖ | Configuration to override the default retry behavior of the client. | |

| Error Type | Status Code | Content Type |
|---|---|---|
| models.APIError | 4XX, 5XX | */* |
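
The `logit_bias` row states that the bias is added to the model's logits before sampling, and the two penalty rows describe pushing down tokens that already appeared. The sketch below works through that arithmetic for a single decoding step over a toy three-token vocabulary. The penalty formula (`logit - frequency_penalty * count - presence_penalty * (count > 0)`) is the one OpenAI documents for its own API; whether the Atoma inference backend applies exactly this formula is an assumption. `adjust_logits` and `softmax` are hypothetical helpers, not part of `atoma_sdk`.

```python
import math
from collections import Counter
from typing import Dict, List


def adjust_logits(
    logits: Dict[int, float],
    generated: List[int],
    logit_bias: Dict[str, float],
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> Dict[int, float]:
    """Apply OpenAI-style penalties and logit_bias to one step's logits (sketch)."""
    counts = Counter(generated)
    adjusted = {}
    for token_id, logit in logits.items():
        count = counts.get(token_id, 0)
        logit -= frequency_penalty * count                      # grows with each repetition
        logit -= presence_penalty * (1.0 if count > 0 else 0.0)  # flat, once the token has appeared
        logit += logit_bias.get(str(token_id), 0.0)              # bias keys are token IDs as strings
        adjusted[token_id] = logit
    return adjusted


def softmax(logits: Dict[int, float]) -> Dict[int, float]:
    """Convert logits to probabilities (max-subtracted for numerical stability)."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


logits = {1: 2.0, 2: 2.0, 3: 2.0}
adjusted = adjust_logits(
    logits,
    generated=[1, 1, 2],          # token 1 seen twice, token 2 once
    logit_bias={"3": -100.0},     # effectively bans token 3
    frequency_penalty=0.5,
    presence_penalty=0.5,
)
print(adjusted)  # {1: 0.5, 2: 1.0, 3: -98.0}
probs = softmax(adjusted)
```

A bias of -100 leaves token 3 with a probability on the order of 1e-43, which is why the table describes such values as a ban in practice.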

To receive the completion as a stream of server-sent events instead, call `completions.stream`:

```python
from atoma_sdk import AtomaSDK
import os

with AtomaSDK(
    bearer_auth=os.getenv("ATOMASDK_BEARER_AUTH", ""),
) as as_client:

    res = as_client.completions.stream(
        model="meta-llama/Llama-3.3-70B-Instruct",
        prompt="<value>",
        frequency_penalty=0,
        logit_bias={"1234567890": 0.5, "1234567891": -0.5},
        logprobs=1,
        n=1,
        presence_penalty=0,
        seed=123,
        stop=["stop", "halt"],
        suffix="\n",
        temperature=0.7,
        top_p=1,
        user="user-1234",
    )

    with res as event_stream:
        for event in event_stream:
            # Handle each event as it arrives
            print(event, flush=True)
```

| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| `model` | str | ✔️ | ID of the model to use. | meta-llama/Llama-3.3-70B-Instruct |
| `prompt` | models.CompletionsPrompt | ✔️ | N/A | |
| `best_of` | OptionalNullable[int] | ➖ | N/A | 1 |
| `echo` | OptionalNullable[bool] | ➖ | N/A | false |
| `frequency_penalty` | OptionalNullable[float] | ➖ | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far. | 0 |
| `logit_bias` | Dict[str, float] | ➖ | Modifies the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to a bias value from -100 to 100. The bias is added to the logits generated by the model before sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase the likelihood of selection, while values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `{"1234567890": 0.5, "1234567891": -0.5}` |
| `logprobs` | OptionalNullable[int] | ➖ | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. | 1 |
| `max_tokens` | OptionalNullable[int] | ➖ | The maximum number of tokens to generate in the completion. | 4096 |
| `n` | OptionalNullable[int] | ➖ | How many completion choices to generate for each prompt. | 1 |
| `presence_penalty` | OptionalNullable[float] | ➖ | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far. | 0 |
| `seed` | OptionalNullable[int] | ➖ | If specified, the system makes a best effort to sample deterministically. | 123 |
| `stop` | List[str] | ➖ | Up to 4 sequences where the API will stop generating further tokens. | `["stop", "halt"]` |
| `stream` | Optional[bool] | ➖ | Whether to stream back partial progress. Must be `true` for this request type. | |
| `stream_options` | OptionalNullable[models.StreamOptions] | ➖ | N/A | |
| `suffix` | OptionalNullable[str] | ➖ | The suffix that comes after a completion of inserted text. | `"\n"` |
| `temperature` | OptionalNullable[float] | ➖ | The sampling temperature to use, between 0 and 2. | 0.7 |
| `top_p` | OptionalNullable[float] | ➖ | An alternative to sampling with temperature (nucleus sampling). | 1 |
| `user` | OptionalNullable[str] | ➖ | A unique identifier representing your end user. | user-1234 |
| `retries` | Optional[utils.RetryConfig] | ➖ | Configuration to override the default retry behavior of the client. | |

| Error Type | Status Code | Content Type |
|---|---|---|
| models.APIError | 4XX, 5XX | */* |
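
The `temperature` and `top_p` rows describe two sampling controls without spelling out how they act. The sketch below shows the common textbook interpretation for one decoding step: divide the logits by the temperature, then keep only the smallest set of most likely tokens whose cumulative probability reaches `top_p` (nucleus sampling) and draw from that set. This is an illustration under that assumption, not the inference service's actual sampler. Treating `temperature=0` as greedy decoding is also an assumption, and `sample_token` is a hypothetical helper that is not part of `atoma_sdk`.

```python
import math
import random
from typing import Dict, Optional


def sample_token(
    logits: Dict[int, float],
    temperature: float = 1.0,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick one token ID using temperature scaling and top-p filtering (sketch)."""
    rng = rng or random.Random()
    if temperature == 0:
        return max(logits, key=logits.get)  # assumed: zero temperature means greedy

    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = {t: v / temperature for t, v in logits.items()}
    m = max(scaled.values())
    exps = {t: math.exp(v - m) for t, v in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((e / total, t) for t, e in exps.items()), reverse=True)

    # Keep the smallest prefix of most-likely tokens whose mass reaches top_p.
    kept, cumulative = [], 0.0
    for p, t in ranked:
        kept.append((p, t))
        cumulative += p
        if cumulative >= top_p:
            break

    # Draw from the kept tokens, renormalized to their combined mass.
    r = rng.random() * sum(p for p, _ in kept)
    for p, t in kept:
        r -= p
        if r <= 0:
            return t
    return kept[-1][1]


rng = random.Random(123)
print(sample_token({1: 5.0, 2: 1.0, 3: 0.0}, temperature=0.7, top_p=0.5, rng=rng))  # 1
```

With these logits, token 1 alone carries more than half of the probability mass, so `top_p=0.5` filters out every other token and the draw is deterministic. With `top_p=1`, as in the examples above, no token is filtered out.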