Hi! Thank you for your great work!
We are trying to measure the latency of MagicPIG. We set the hyperparameters to K=10 and L=100 in an attempt to reduce latency.
However, we find that TTFT and TPOT seem high on InfiniteBench.
Our partial results are as follows:
            code_debug    code_run
TTFT (s)    630.37        140
TPOT (s)    1.246         0.84
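# Decode loop we use to measure TPOT (one timing sample per generated token).
for k in range(GEN_LEN):
    st = time.time()
    # Greedy decoding: pick the most likely next token.
    input_ids = logits.argmax(dim=-1)
    logits = llm.inference(input_ids=input_ids, position_ids=position_ids[:, PREFIX_LEN + k:PREFIX_LEN + k + 1])
    output.append(input_ids.item())
    en = time.time()
    total_decode_time.append(en - st)
    # Stop as soon as an end-of-sequence token is produced.
    if input_ids.item() in config["eos"]:
        break
# Average per-token decode time over all generated tokens.
TPOT = sum(total_decode_time) / len(total_decode_time)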
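For reference, below is a minimal, framework-agnostic sketch of how we understand the two metrics should be separated: TTFT covers the prefill up to the first token, and TPOT averages only the subsequent decode steps. The names `measure_latency`, `prefill`, `decode_step`, and `sync` are hypothetical and are not part of MagicPIG; on a GPU, `sync` would typically be `torch.cuda.synchronize`, so that asynchronous kernels are finished before each timestamp is taken.

```python
import time
from statistics import mean


def measure_latency(prefill, decode_step, max_new_tokens, eos_ids=(), sync=None):
    """Return (ttft_seconds, tpot_seconds, tokens).

    prefill()          -> first generated token (hypothetical callable)
    decode_step(token) -> next generated token (hypothetical callable)
    sync()             -> optional barrier, e.g. torch.cuda.synchronize on GPU
    """
    if sync is None:
        sync = lambda: None  # no-op barrier for CPU-only or dummy runs

    # TTFT: time from the start of prefill until the first token is available.
    sync()
    start = time.perf_counter()
    token = prefill()
    sync()
    ttft = time.perf_counter() - start

    # TPOT: mean time of each decode step after the first token.
    tokens = [token]
    step_times = []
    for _ in range(max_new_tokens - 1):
        if token in eos_ids:
            break
        sync()
        st = time.perf_counter()
        token = decode_step(token)
        sync()
        step_times.append(time.perf_counter() - st)
        tokens.append(token)

    tpot = mean(step_times) if step_times else float("nan")
    return ttft, tpot, tokens
```

With this split, prefill time never leaks into the TPOT average, and the per-step list `step_times` can be inspected for outliers instead of looking only at the mean.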
We would like to know the reason for the high latency and whether there is an error in our implementation.