Commit 91a31c6

fallback if we can't figure out the encoding from the model name
1 parent c700173 commit 91a31c6

1 file changed

Lines changed: 4 additions & 1 deletion

backend/danswer/natural_language_processing/utils.py

@@ -44,7 +44,10 @@ def __init__(self, model_name: str):
         if not hasattr(self, "encoder"):
             import tiktoken
 
-            self.encoder = tiktoken.encoding_for_model(model_name)
+            try:
+                self.encoder = tiktoken.encoding_for_model(model_name)
+            except KeyError:
+                self.encoder = tiktoken.get_encoding("cl100k_base")
 
     def encode(self, string: str) -> list[int]:
         # this ignores special tokens that the model is trained on, see encode_ordinary for details
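
The fallback in this change can be sketched in isolation as below. This is a minimal illustration, not the code in utils.py: `resolve_encoder`, the two `fake_*` lookups, and the `_MODEL_TO_ENCODING` table are hypothetical names introduced here so the sketch runs without tiktoken installed. The only behavior taken from the diff is that an unknown model name raises `KeyError` and that `cl100k_base` is used as the default.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def resolve_encoder(
    model_name: str,
    encoding_for_model: Callable[[str], T],
    get_encoding: Callable[[str], T],
    fallback_encoding: str = "cl100k_base",
) -> T:
    """Return the encoder for model_name, or the fallback if the name is unknown."""
    try:
        return encoding_for_model(model_name)
    except KeyError:
        # Unknown model name: use a fixed default encoding instead of failing.
        return get_encoding(fallback_encoding)


# Stand-in lookups (hypothetical data) so the sketch runs without tiktoken.
_MODEL_TO_ENCODING = {"known-model": "encoding-a"}


def fake_encoding_for_model(model_name: str) -> str:
    # Raises KeyError for names missing from the table, like the real lookup in the diff.
    return _MODEL_TO_ENCODING[model_name]


def fake_get_encoding(encoding_name: str) -> str:
    return encoding_name


known = resolve_encoder("known-model", fake_encoding_for_model, fake_get_encoding)
unknown = resolve_encoder("some-local-model", fake_encoding_for_model, fake_get_encoding)
```

With tiktoken installed, the same helper would presumably be called as `resolve_encoder(name, tiktoken.encoding_for_model, tiktoken.get_encoding)`. Token counts from the fallback encoding are likely only approximate for models that use a different tokenizer.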