Completions

(completions)

Overview

OpenAI's API completions v1 endpoint

Available Operations

create

This function processes completion requests by using the chat completions endpoint.

Returns

Returns a Response containing either:

  • A streaming SSE connection for real-time completions
  • A single JSON response for non-streaming completions
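The SDK normally parses the streaming connection for you, but it can help to see what an SSE body looks like on the wire. The sketch below is a minimal illustration and not the SDK's implementation. It assumes OpenAI-style framing, where each event is a single `data: <json>` line and the stream ends with `data: [DONE]`. The helper name `iter_sse_json` and the sample payload shape are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line until a `[DONE]` sentinel."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators, comments (": keep-alive"), and non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            break
        yield json.loads(payload)


# Hypothetical wire data; the real chunk schema may differ.
sample = [
    'data: {"choices": [{"text": "Hel"}]}',
    "",
    ": keep-alive",
    'data: {"choices": [{"text": "lo"}]}',
    "",
    "data: [DONE]",
]
chunks = list(iter_sse_json(sample))
text = "".join(c["choices"][0]["text"] for c in chunks)
print(text)
```

This sketch does not handle events whose `data:` field spans several lines, which the SSE format permits.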

Errors

Returns an error status code if:

  • The request processing fails
  • The streaming/non-streaming handlers encounter errors
  • The underlying inference service returns an error
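Since all three failure modes surface as an error status code, a caller usually needs to decide whether a failure is worth retrying. The following is a sketch of one common convention, not behavior defined by this SDK: the function name `classify_status` and the choice of which codes count as transient are assumptions.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a coarse handling strategy (illustrative only)."""
    if status_code in (408, 429) or 500 <= status_code <= 599:
        # Transient: request timeout, rate limiting, or a server-side failure.
        return "retry"
    if 400 <= status_code <= 499:
        # Caller error: bad parameters, failed auth, unknown model, and so on.
        return "fix_request"
    return "ok"


for code in (200, 400, 429, 503):
    print(code, classify_status(code))
```

In practice the status code would be read from the raised error object. Check the SDK's error model for the exact attribute name before relying on it.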

Example Usage

from atoma_sdk import AtomaSDK
import os


with AtomaSDK(
    bearer_auth=os.getenv("ATOMASDK_BEARER_AUTH", ""),
) as as_client:

    res = as_client.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",
        prompt=["<value>", "<value>"],
        frequency_penalty=0,
        logit_bias={"1234567890": 0.5, "1234567891": -0.5},
        logprobs=1,
        n=1,
        presence_penalty=0,
        seed=123,
        stop=["stop", "halt"],  # up to 4 stop sequences
        stream=False,
        suffix="\n",
        temperature=0.7,
        top_p=1,
        user="user-1234",
    )

    # Handle response
    print(res)

Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| model | str | ✔️ | ID of the model to use | meta-llama/Llama-3.3-70B-Instruct |
| prompt | models.CompletionsPrompt | ✔️ | N/A | |
| best_of | OptionalNullable[int] | | N/A | 1 |
| echo | OptionalNullable[bool] | | N/A | false |
| frequency_penalty | OptionalNullable[float] | | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far. | 0 |
| logit_bias | Dict[str, float] | | Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | {"1234567890": 0.5, "1234567891": -0.5} |
| logprobs | OptionalNullable[int] | | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. | 1 |
| max_tokens | OptionalNullable[int] | | The maximum number of tokens to generate in the completion | 4096 |
| n | OptionalNullable[int] | | How many completion choices to generate for each prompt | 1 |
| presence_penalty | OptionalNullable[float] | | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far. | 0 |
| seed | OptionalNullable[int] | | If specified, our system will make a best effort to sample deterministically | 123 |
| stop | List[str] | | Up to 4 sequences where the API will stop generating further tokens | ["stop", "halt"] |
| stream | OptionalNullable[bool] | | Whether to stream back partial progress | false |
| stream_options | OptionalNullable[models.StreamOptions] | | N/A | |
| suffix | OptionalNullable[str] | | The suffix that comes after a completion of inserted text. | "\n" |
| temperature | OptionalNullable[float] | | What sampling temperature to use, between 0 and 2 | 0.7 |
| top_p | OptionalNullable[float] | | An alternative to sampling with temperature | 1 |
| user | OptionalNullable[str] | | A unique identifier representing your end-user | user-1234 |
| retries | Optional[utils.RetryConfig] | | Configuration to override the default retry behavior of the client. | |
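The `logit_bias` description says the bias is added to the model's logits before sampling. The toy sketch below shows what that means numerically for a four-token vocabulary. It is an illustration of the arithmetic only: the helper names `apply_logit_bias` and `softmax` are hypothetical, and the actual inference backend may apply the bias differently.

```python
import math
from typing import Dict, List


def apply_logit_bias(logits: List[float], logit_bias: Dict[str, float]) -> List[float]:
    """Return a copy of `logits` with each bias added at its token ID."""
    biased = list(logits)
    for token_id, bias in logit_bias.items():
        biased[int(token_id)] += bias  # keys are token IDs encoded as strings
    return biased


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 1.0, 1.0, 1.0]  # toy vocabulary, uniform before biasing
probs = softmax(apply_logit_bias(logits, {"0": -100, "2": 0.5}))
print([round(p, 3) for p in probs])
```

A bias of -100 drives token 0 to a probability that is effectively zero, while the small positive bias on token 2 only nudges it above its neighbours.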

Response

models.CompletionsResponse

Errors

| Error Type | Status Code | Content Type |
| --- | --- | --- |
| models.APIError | 4XX, 5XX | \*/\* |

stream

Example Usage

from atoma_sdk import AtomaSDK
import os


with AtomaSDK(
    bearer_auth=os.getenv("ATOMASDK_BEARER_AUTH", ""),
) as as_client:

    res = as_client.completions.stream(
        model="meta-llama/Llama-3.3-70B-Instruct",
        prompt="<value>",
        frequency_penalty=0,
        logit_bias={"1234567890": 0.5, "1234567891": -0.5},
        logprobs=1,
        n=1,
        presence_penalty=0,
        seed=123,
        stop=["stop", "halt"],  # up to 4 stop sequences
        suffix="\n",
        temperature=0.7,
        top_p=1,
        user="user-1234",
    )

    with res as event_stream:
        for event in event_stream:
            # handle event
            print(event, flush=True)

Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| model | str | ✔️ | ID of the model to use | meta-llama/Llama-3.3-70B-Instruct |
| prompt | models.CompletionsPrompt | ✔️ | N/A | |
| best_of | OptionalNullable[int] | | N/A | 1 |
| echo | OptionalNullable[bool] | | N/A | false |
| frequency_penalty | OptionalNullable[float] | | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far. | 0 |
| logit_bias | Dict[str, float] | | Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | {"1234567890": 0.5, "1234567891": -0.5} |
| logprobs | OptionalNullable[int] | | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. | 1 |
| max_tokens | OptionalNullable[int] | | The maximum number of tokens to generate in the completion | 4096 |
| n | OptionalNullable[int] | | How many completion choices to generate for each prompt | 1 |
| presence_penalty | OptionalNullable[float] | | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far. | 0 |
| seed | OptionalNullable[int] | | If specified, our system will make a best effort to sample deterministically | 123 |
| stop | List[str] | | Up to 4 sequences where the API will stop generating further tokens | ["stop", "halt"] |
| stream | Optional[bool] | | Whether to stream back partial progress. Must be true for this request type. | |
| stream_options | OptionalNullable[models.StreamOptions] | | N/A | |
| suffix | OptionalNullable[str] | | The suffix that comes after a completion of inserted text. | "\n" |
| temperature | OptionalNullable[float] | | What sampling temperature to use, between 0 and 2 | 0.7 |
| top_p | OptionalNullable[float] | | An alternative to sampling with temperature | 1 |
| user | OptionalNullable[str] | | A unique identifier representing your end-user | user-1234 |
| retries | Optional[utils.RetryConfig] | | Configuration to override the default retry behavior of the client. | |
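The difference between `frequency_penalty` and `presence_penalty` is easier to see with numbers. The sketch below follows the adjustment described in OpenAI's public documentation, where the frequency penalty scales with how often a token has already appeared and the presence penalty is a flat deduction for any token seen at least once. Whether this service's backend uses exactly this formula is an assumption, and `apply_penalties` is a hypothetical helper name.

```python
from collections import Counter
from typing import List


def apply_penalties(
    logits: List[float],
    generated_ids: List[int],
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> List[float]:
    """Return logits adjusted for tokens that already appear in the output."""
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        adjusted[token_id] -= count * frequency_penalty  # grows with each repeat
        adjusted[token_id] -= presence_penalty           # flat, once per seen token
    return adjusted


# Token 0 was generated twice, token 1 once, token 2 never.
adjusted = apply_penalties(
    [2.0, 2.0, 2.0],
    generated_ids=[0, 0, 1],
    frequency_penalty=0.5,
    presence_penalty=1.0,
)
print(adjusted)  # → [0.0, 0.5, 2.0]
```

Token 2 is untouched because it has not been generated yet, while token 0 is penalized more than token 1 only through the frequency term.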

Response

Union[eventstreaming.EventStream[models.CompletionsCreateStreamResponseBody], eventstreaming.EventStreamAsync[models.CompletionsCreateStreamResponseBody]]

Errors

| Error Type | Status Code | Content Type |
| --- | --- | --- |
| models.APIError | 4XX, 5XX | \*/\* |