
Commit 3d701e6

fix: Update tiktoken package (#18)
1 parent 7f615c8 commit 3d701e6

10 files changed

Lines changed: 15360 additions & 3 deletions

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
 # Change Log
 
+## 0.2.7 (2025-09-18)
+
+- Update tiktoken to support the latest models.
+
 ## 0.2.6 (2025-08-15)
 
 - Fix issue with whitespace in the input parameters.

src/main.py

Lines changed: 6 additions & 1 deletion
@@ -121,7 +121,12 @@ async def create_files_from_dataset(
     data = [{key: get_nested_value(d, key) for key in actor_input.datasetFields} for d in data]
     data = [d for d in data if d]
 
-    if encoding := assistant and tiktoken.encoding_for_model(assistant.model) or None:
+    if assistant:
+        try:
+            encoding = tiktoken.encoding_for_model(assistant.model)
+        except KeyError:
+            encoding = tiktoken.get_encoding("o200k_base")
+            Actor.log.warning("Model %s not found. Using o200k_base encoding", assistant.model)
         data = await split_data_if_required(data, encoding)
     else:
         data = [data]
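
Review note: the fallback introduced in this hunk can be read as a small, separately testable pattern. The sketch below is an illustration only, not code from this commit. `resolve_encoding`, `fake_for_model`, and the `KNOWN` table are hypothetical names, and the lookup is a stand-in so the example runs without tiktoken installed. It assumes only what the diff itself shows: the model lookup raises `KeyError` for an unknown model name.

```python
from typing import Callable, TypeVar

E = TypeVar("E")


def resolve_encoding(
    model: str,
    for_model: Callable[[str], E],
    get_default: Callable[[], E],
) -> tuple[E, bool]:
    """Return (encoding, used_fallback) for `model`.

    `for_model` is expected to raise KeyError for unknown model names,
    mirroring the `except KeyError` branch in the hunk above.
    """
    try:
        return for_model(model), False
    except KeyError:
        # Unknown model: fall back to a default encoding instead of failing.
        return get_default(), True


# Hypothetical stand-in lookup table, used only to make the sketch runnable.
KNOWN = {"gpt-4o": "o200k_base", "gpt-3.5-turbo": "cl100k_base"}


def fake_for_model(model: str) -> str:
    return KNOWN[model]  # raises KeyError for names missing from the table


if __name__ == "__main__":
    print(resolve_encoding("gpt-4o", fake_for_model, lambda: "o200k_base"))
    print(resolve_encoding("some-future-model", fake_for_model, lambda: "o200k_base"))
```

With tiktoken available, the same helper could presumably be called as `resolve_encoding(assistant.model, tiktoken.encoding_for_model, lambda: tiktoken.get_encoding("o200k_base"))`, and the returned flag would decide whether to log the warning. Returning the flag keeps the logged encoding name and the encoding actually used in one place.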

src/utils.py

Lines changed: 1 addition & 1 deletion
@@ -125,7 +125,7 @@ def split_data_into_batches(data: list, max_tokens: int, encoding: tiktoken.core
 
     Example:
         >>> d = [{"name": "Alice"}, {"name": "Bob"}, {"name": "Carol"}]
-        >>> enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
+        >>> enc = tiktoken.encoding_for_model("gpt-5-mini")
         >>> batches = split_data_into_batches(d, 15, enc)
         >>> print(batches)
         [[{'name': 'Alice'}, {'name': 'Bob'}], [{'name': 'Carol'}]]
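
Review note: the docstring example above describes packing items into batches under a token budget. The body of `split_data_into_batches` is not part of this diff, so the following is only a minimal sketch of one way such greedy batching could work, not the repository's implementation. `batch_by_token_budget` and `rough_count` are hypothetical names, and the counter is a crude stand-in (about one token per four characters) rather than a real tokenizer.

```python
import json
from typing import Callable


def batch_by_token_budget(
    items: list[dict],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> list[list[dict]]:
    """Greedily pack items into batches whose summed token cost fits max_tokens."""
    batches: list[list[dict]] = []
    current: list[dict] = []
    used = 0
    for item in items:
        cost = count_tokens(json.dumps(item))
        # Close the current batch when adding this item would exceed the budget.
        if current and used + cost > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(item)
        used += cost
    if current:
        batches.append(current)
    return batches


def rough_count(text: str) -> int:
    # Hypothetical stand-in tokenizer: one token per 4 characters, rounded up.
    return -(-len(text) // 4)


if __name__ == "__main__":
    d = [{"name": "Alice"}, {"name": "Bob"}, {"name": "Carol"}]
    print(batch_by_token_budget(d, 10, rough_count))
```

Note that in this sketch a single item that costs more than the budget still gets a batch of its own rather than being split. With a real encoding, `count_tokens` could be something like `lambda s: len(enc.encode(s))`.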
