Commit ab3b935

Adding tokenizer_chat_template_path to valid inputs for config
1 parent 6c8a1d4 commit ab3b935

2 files changed

Lines changed: 6 additions & 0 deletions

src/maxtext/configs/base.yml

Lines changed: 2 additions & 0 deletions
@@ -645,6 +645,8 @@ tokenizer_path: ""
 tokenizer_type: "sentencepiece" # Currently supporting: "tiktoken", "sentencepiece", "huggingface"
 use_chat_template: false
 chat_template_path: "" # path to chat template json file
+chat_template: "" # Chat template to use with HF tokenizers. It should be a valid Jinja2-formatted template.
+tokenizer_chat_template_path: "" # Path to a chat template file to be loaded into the tokenizer if missing.
 tokenize_train_data: true # false if the dataset is pre-tokenized
 tokenize_eval_data: true # false if the dataset is pre-tokenized
 add_bos: true

src/maxtext/configs/types.py

Lines changed: 4 additions & 0 deletions
@@ -1081,6 +1081,10 @@ class Tokenizer(BaseModel):
       "",
       description="Chat template to use with HF tokenizers. It should be a valid Jinja2-formatted template.",
   )
+  tokenizer_chat_template_path: str = Field(
+      "",
+      description="Path to a chat template file to be loaded into the tokenizer if missing.",
+  )
   tokenize_train_data: bool = Field(True, description="If False, assumes the training dataset is pre-tokenized.")
   tokenize_eval_data: bool = Field(True, description="If False, assumes the evaluation dataset is pre-tokenized.")
   add_bos: bool = Field(True, description="Whether to add a beginning-of-sentence token.")
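
This commit only registers `tokenizer_chat_template_path` as a valid config key; the code that consumes it is not part of the diff. The sketch below illustrates one way the "loaded into the tokenizer if missing" behavior from the field description could work. The helper name `load_chat_template_if_missing` is hypothetical, the tokenizer is a stand-in object rather than a real Hugging Face tokenizer, and the file is assumed to contain raw Jinja2 template text, which the commit does not specify.

```python
from pathlib import Path
from types import SimpleNamespace


def load_chat_template_if_missing(tokenizer, template_path: str):
  """Hypothetical helper: fill tokenizer.chat_template from a file only when it is unset."""
  # An empty path mirrors the config default "" and means there is nothing to load.
  if not template_path:
    return tokenizer
  # Leave an existing template untouched ("if missing" in the field description).
  if getattr(tokenizer, "chat_template", None):
    return tokenizer
  # Assumption: the file holds the raw Jinja2 template text, not JSON.
  tokenizer.chat_template = Path(template_path).read_text(encoding="utf-8")
  return tokenizer


# Stand-in for a tokenizer; only the chat_template attribute matters here.
tok = SimpleNamespace(chat_template=None)
load_chat_template_if_missing(tok, "")  # default empty path: tokenizer is left unchanged
```

With a non-empty path, a tokenizer that already carries a template keeps it, so the file acts purely as a fallback.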
