Commit 2e88d22

ChenhanYu and claude committed
fix: support messages field in calibrate_draft_vocab and compute_hidden_states
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
1 parent dddfc6a commit 2e88d22

File tree

2 files changed: +4, -2 lines changed

examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py

Lines changed: 1 addition & 1 deletion

@@ -201,7 +201,7 @@ async def submit_generates():
         for entry in dataset:
             conversation_id = entry.get("conversation_id", entry.get("uuid"))

-            conversations = entry["conversations"]
+            conversations = entry.get("messages") or entry["conversations"]
             if not conversations or not isinstance(conversations, list):
                 num_invalid += 1
                 continue
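The one-line change above makes the script accept datasets that store turns under a "messages" key (the OpenAI-style chat format) while still handling the older "conversations" key. A minimal sketch of the fallback behavior, with hypothetical sample entries (`entry_new`, `entry_old`, and the helper `get_conversations` are illustrative, not part of the script):

```python
# Sketch of the fallback introduced by this commit: prefer "messages",
# fall back to "conversations" when "messages" is absent or empty.
def get_conversations(entry: dict):
    return entry.get("messages") or entry["conversations"]

# OpenAI-style record (new format) and ShareGPT-style record (old format).
entry_new = {"messages": [{"role": "user", "content": "hi"}]}
entry_old = {"conversations": [{"from": "human", "value": "hi"}]}

print(get_conversations(entry_new))  # -> [{"role": "user", "content": "hi"}]
print(get_conversations(entry_old))  # -> [{"from": "human", "value": "hi"}]
```

Note that because the fallback uses `or` rather than a key-presence check, a record with an explicitly empty `"messages": []` also falls through to `"conversations"`; the `isinstance`/emptiness check that follows in the script then catches records where neither field holds a valid list.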

examples/speculative_decoding/scripts/calibrate_draft_vocab.py

Lines changed: 3 additions & 1 deletion

@@ -49,7 +49,9 @@ def main():
     tokenizer = AutoTokenizer.from_pretrained(args.model)
     with open(args.data) as f:
         lines = islice(f, args.calibrate_size) if args.calibrate_size else f
-        conversations = [json.loads(line)["conversations"] for line in lines]
+        conversations = [
+            (d := json.loads(line)).get("messages") or d["conversations"] for line in lines
+        ]
     conversations = [item for sublist in conversations for item in sublist]

     d2t = calibrate_frequent_vocab(tokenizer, conversations, args.draft_vocab_size)
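This hunk applies the same "messages"-or-"conversations" fallback inside a list comprehension, using an assignment expression (`:=`, Python 3.8+) so each JSONL line is parsed only once. A self-contained sketch with hypothetical sample lines (the two-record `lines` list is illustrative, not from the repository):

```python
import json

# Each JSONL line is parsed once; the walrus operator binds the parsed
# dict to `d` so the fallback can reuse it without a second json.loads.
lines = [
    '{"messages": [{"role": "user", "content": "a"}]}',
    '{"conversations": [{"from": "human", "value": "b"}]}',
]
conversations = [
    (d := json.loads(line)).get("messages") or d["conversations"] for line in lines
]
# Flatten the per-record turn lists into one list of turns, as the
# calibration script does before counting token frequencies.
flat = [item for sublist in conversations for item in sublist]
print(flat)  # -> one turn dict per input record
```

Without the walrus operator, the fallback would need either `json.loads(line)` twice per line or an explicit loop; the comprehension keeps the script's original single-expression style.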
