Completions

(completions)

Overview

OpenAI's API completions v1 endpoint

Available Operations

create - Create completions
stream

create

This function processes completion requests by using the chat completions endpoint.

Returns

Returns a Response containing either:

A streaming SSE connection for real-time completions
A single JSON response for non-streaming completions
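
To make the two response shapes concrete, here is a minimal sketch of how a raw SSE body differs from a single JSON body. It assumes OpenAI-style framing (`data: <json>` lines ending with a `data: [DONE]` sentinel), which this document does not confirm. `parse_sse_lines` and both sample payloads are hypothetical. The SDK's `stream` method already performs this decoding, so this is only meant to show what happens on the wire.

```python
import json
from typing import Iterable, Iterator


def parse_sse_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line in an SSE body.

    Assumption: one JSON object per `data:` line and a `[DONE]` sentinel
    marking the end of the stream (OpenAI-style framing).
    """
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(":"):
            continue  # blank separator or SSE comment / keep-alive
        if not line.startswith("data:"):
            continue  # ignore other SSE fields such as `event:` or `id:`
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end of stream
        yield json.loads(payload)


# Hypothetical streaming body: several small events, then the sentinel.
sse_body = [
    'data: {"choices": [{"text": "Hel"}]}',
    "",
    'data: {"choices": [{"text": "lo"}]}',
    "",
    "data: [DONE]",
]
chunks = list(parse_sse_lines(sse_body))

# Hypothetical non-streaming body: one JSON document, parsed in one step.
json_body = '{"choices": [{"text": "Hello"}]}'
single = json.loads(json_body)

print(f"streaming: {len(chunks)} events, non-streaming: 1 object")
```

The streaming case yields many partial payloads over time, while the non-streaming case yields one complete object after generation finishes.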

Errors

Returns an error status code if:

The request processing fails
The streaming/non-streaming handlers encounter errors
The underlying inference service returns an error

Example Usage

from atoma_sdk import AtomaSDK
import os


with AtomaSDK(
    bearer_auth=os.getenv("ATOMASDK_BEARER_AUTH", ""),
) as as_client:

    res = as_client.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",
        prompt=["<value>", "<value>"],
        frequency_penalty=0,
        logit_bias={"1234567890": 0.5, "1234567891": -0.5},
        logprobs=1,
        n=1,
        presence_penalty=0,
        seed=123,
        stop=["stop", "halt"],  # up to 4 stop sequences
        stream=False,
        suffix="\n",
        temperature=0.7,
        top_p=1,
        user="user-1234",
    )

    # Handle response
    print(res)

Parameters

Parameter	Type	Required	Description	Example
`model`	str	✔️	ID of the model to use	meta-llama/Llama-3.3-70B-Instruct
`prompt`	models.CompletionsPrompt	✔️	N/A
`best_of`	OptionalNullable[int]	➖	N/A	1
`echo`	OptionalNullable[bool]	➖	N/A	false
`frequency_penalty`	OptionalNullable[float]	➖	Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far	0
`logit_bias`	Dict[str, float]	➖	Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.	{ "1234567890": 0.5, "1234567891": -0.5 }
`logprobs`	OptionalNullable[int]	➖	An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.	1
`max_tokens`	OptionalNullable[int]	➖	The maximum number of tokens to generate in the completion	4096
`n`	OptionalNullable[int]	➖	How many completion choices to generate for each prompt	1
`presence_penalty`	OptionalNullable[float]	➖	Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far	0
`seed`	OptionalNullable[int]	➖	If specified, our system will make a best effort to sample deterministically	123
`stop`	List[str]	➖	Up to 4 sequences where the API will stop generating further tokens	["stop", "halt"]
`stream`	OptionalNullable[bool]	➖	Whether to stream back partial progress	false
`stream_options`	OptionalNullable[models.StreamOptions]	➖	N/A
`suffix`	OptionalNullable[str]	➖	The suffix that comes after a completion of inserted text.	"\n"
`temperature`	OptionalNullable[float]	➖	What sampling temperature to use, between 0 and 2	0.7
`top_p`	OptionalNullable[float]	➖	An alternative to sampling with temperature	1
`user`	OptionalNullable[str]	➖	A unique identifier representing your end-user	user-1234
`retries`	Optional[utils.RetryConfig]	➖	Configuration to override the default retry behavior of the client.
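
The ranges stated in the table above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the SDK: `check_completion_params` is a hypothetical helper, and whether the service itself rejects out-of-range values is not stated in this document.

```python
from typing import Any, Dict, List


def check_completion_params(params: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means all checks pass."""
    problems: List[str] = []

    def in_range(name: str, low: float, high: float) -> None:
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}, got {value}")

    # Ranges taken from the parameter descriptions in the table.
    in_range("frequency_penalty", -2.0, 2.0)
    in_range("presence_penalty", -2.0, 2.0)
    in_range("temperature", 0, 2)
    in_range("logprobs", 0, 20)

    # Each bias value must lie between -100 and 100.
    for token_id, bias in (params.get("logit_bias") or {}).items():
        if not (-100 <= bias <= 100):
            problems.append(
                f"logit_bias[{token_id!r}] must be between -100 and 100, got {bias}"
            )

    # At most 4 stop sequences are accepted.
    stop = params.get("stop") or []
    if len(stop) > 4:
        problems.append(f"stop accepts up to 4 sequences, got {len(stop)}")

    return problems


ok = check_completion_params({"temperature": 0.7, "logprobs": 1, "stop": ["stop", "halt"]})
bad = check_completion_params({"temperature": 3, "logit_bias": {"1234567890": 150}})
print(ok)
print(bad)
```

Returning a list of problems instead of raising on the first one lets a caller report every invalid field in a single pass.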

Response

models.CompletionsResponse

Errors

Error Type	Status Code	Content Type
models.APIError	4XX, 5XX	*/*
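
The `4XX` and `5XX` entries in the table are status code classes. As a rough sketch of how a caller might branch on them, the helper below maps a numeric status code to its class. Both function names are hypothetical, and the retry rule (server errors and 429 only) is an assumption for illustration, not documented SDK behavior.

```python
from typing import Optional


def status_class(status_code: int) -> Optional[str]:
    """Map an HTTP status code to the class used in the errors table."""
    if 400 <= status_code <= 499:
        return "4XX"  # client error: the request itself needs fixing
    if 500 <= status_code <= 599:
        return "5XX"  # server error: the service failed to process it
    return None  # not an error status


def is_probably_retryable(status_code: int) -> bool:
    """Assumption, not SDK behavior: retry server errors and 429 only."""
    return status_class(status_code) == "5XX" or status_code == 429


for code in (200, 401, 429, 503):
    print(code, status_class(code), is_probably_retryable(code))
```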

stream

Example Usage

from atoma_sdk import AtomaSDK
import os


with AtomaSDK(
    bearer_auth=os.getenv("ATOMASDK_BEARER_AUTH", ""),
) as as_client:

    res = as_client.completions.stream(
        model="meta-llama/Llama-3.3-70B-Instruct",
        prompt="<value>",
        frequency_penalty=0,
        logit_bias={"1234567890": 0.5, "1234567891": -0.5},
        logprobs=1,
        n=1,
        presence_penalty=0,
        seed=123,
        stop=["stop", "halt"],  # up to 4 stop sequences
        suffix="\n",
        temperature=0.7,
        top_p=1,
        user="user-1234",
    )

    with res as event_stream:
        for event in event_stream:
            # handle event
            print(event, flush=True)
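
If you want the full text rather than individual events, the fragments can be accumulated as they arrive. The sketch below rests on assumptions: it uses plain dicts shaped like OpenAI-style completion chunks (`choices[].index`, `choices[].text`) in place of the SDK's `models.CompletionsCreateStreamResponseBody` objects, whose fields are not listed in this document, and `collect_text` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def collect_text(events: Iterable[dict]) -> Dict[int, str]:
    """Concatenate streamed text fragments per choice index."""
    parts: Dict[int, List[str]] = defaultdict(list)
    for event in events:
        for choice in event.get("choices", []):
            # Fragments for the same choice share an index (relevant when n > 1).
            parts[choice.get("index", 0)].append(choice.get("text", ""))
    return {index: "".join(fragments) for index, fragments in sorted(parts.items())}


# Hypothetical events, already decoded to plain dicts.
fake_events = [
    {"choices": [{"index": 0, "text": "Hello"}]},
    {"choices": [{"index": 0, "text": ", "}]},
    {"choices": [{"index": 0, "text": "world"}]},
]
completions = collect_text(fake_events)
print(completions)
```

Grouping by index keeps the fragments of separate choices apart when more than one completion is requested with `n`.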

Parameters

Parameter	Type	Required	Description	Example
`model`	str	✔️	ID of the model to use	meta-llama/Llama-3.3-70B-Instruct
`prompt`	models.CompletionsPrompt	✔️	N/A
`best_of`	OptionalNullable[int]	➖	N/A	1
`echo`	OptionalNullable[bool]	➖	N/A	false
`frequency_penalty`	OptionalNullable[float]	➖	Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far	0
`logit_bias`	Dict[str, float]	➖	Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.	{ "1234567890": 0.5, "1234567891": -0.5 }
`logprobs`	OptionalNullable[int]	➖	An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.	1
`max_tokens`	OptionalNullable[int]	➖	The maximum number of tokens to generate in the completion	4096
`n`	OptionalNullable[int]	➖	How many completion choices to generate for each prompt	1
`presence_penalty`	OptionalNullable[float]	➖	Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far	0
`seed`	OptionalNullable[int]	➖	If specified, our system will make a best effort to sample deterministically	123
`stop`	List[str]	➖	Up to 4 sequences where the API will stop generating further tokens	["stop", "halt"]
`stream`	Optional[bool]	➖	Whether to stream back partial progress. Must be true for this request type.
`stream_options`	OptionalNullable[models.StreamOptions]	➖	N/A
`suffix`	OptionalNullable[str]	➖	The suffix that comes after a completion of inserted text.	"\n"
`temperature`	OptionalNullable[float]	➖	What sampling temperature to use, between 0 and 2	0.7
`top_p`	OptionalNullable[float]	➖	An alternative to sampling with temperature	1
`user`	OptionalNullable[str]	➖	A unique identifier representing your end-user	user-1234
`retries`	Optional[utils.RetryConfig]	➖	Configuration to override the default retry behavior of the client.
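
The `logit_bias` row above says the bias is added to the model's logits before sampling. A toy calculation can show why small values nudge a token's probability while values near -100 effectively ban it. This is a sketch under stated assumptions: the three-token vocabulary and its logits are made up, and `softmax` and `apply_logit_bias` are illustrative helpers, not SDK functions.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert logits to probabilities (shifted by the max for stability)."""
    peak = max(logits.values())
    exps = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def apply_logit_bias(logits: Dict[str, float], bias: Dict[str, float]) -> Dict[str, float]:
    """Add each bias to the matching token's logit; other tokens are unchanged."""
    return {tok: value + bias.get(tok, 0.0) for tok, value in logits.items()}


# Made-up logits over a three-token vocabulary, keyed by token ID.
logits = {"1234567890": 1.0, "1234567891": 1.0, "42": 1.0}

# Small biases shift probability mass between tokens.
mild = softmax(apply_logit_bias(logits, {"1234567890": 0.5, "1234567891": -0.5}))
# A bias of -100 drives the token's probability to practically zero.
ban = softmax(apply_logit_bias(logits, {"42": -100}))

print({tok: round(p, 3) for tok, p in mild.items()})
print({tok: round(p, 3) for tok, p in ban.items()})
```

With equal starting logits, the token biased by +0.5 becomes the most likely and the one biased by -0.5 the least likely, while the unbiased token sits in between.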

Response

Union[eventstreaming.EventStream[models.CompletionsCreateStreamResponseBody], eventstreaming.EventStreamAsync[models.CompletionsCreateStreamResponseBody]]

Errors

Error Type	Status Code	Content Type
models.APIError	4XX, 5XX	*/*
