Question about testing latency #3

Hi! Thank you for your great work!
We are trying to measure the latency of MagicPIG. We also set the hyperparameters to K=10 and L=100 to try to reduce latency.
However, both TTFT (time to first token) and TPOT (time per output token) appear high on InfiniteBench.
Our partial results are as follows:
```
           code_debug   code_run
TTFT (s)   630.37       140
TPOT (s)   1.246        0.84
```
```
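# Greedy decoding loop; each iteration is timed as one output token.
for k in range(GEN_LEN):
    st = time.time()
    # Pick the next token from the logits produced by the previous step.
    input_ids = logits.argmax(dim=-1)
    logits = llm.inference(
        input_ids=input_ids,
        position_ids=position_ids[:, PREFIX_LEN + k:PREFIX_LEN + k + 1],
    )
    output.append(input_ids.item())
    en = time.time()
    total_decode_time.append(en - st)
    # Stop once an end-of-sequence token is generated.
    if input_ids.item() in config["eos"]:
        break
# TPOT: mean wall-clock time per generated token, in seconds.
TPOT = sum(total_decode_time) / len(total_decode_time)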
```
We would like to understand the cause of the high latency and whether there is an error in our measurement code.
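
To make the measurement easier to check in isolation, here is a minimal, self-contained sketch of the same per-token timing pattern. It is not MagicPIG code: `measure_tpot`, `step_fn`, `sync_fn`, and `warmup` are hypothetical names introduced only for illustration, and `step_fn` stands in for the `llm.inference` call. The sketch uses `time.perf_counter()` instead of `time.time()`, and it accepts an optional synchronization hook (for example `torch.cuda.synchronize`, assuming part of the model runs on a GPU) so that asynchronous work is finished before the clock stops.

```python
import time
from statistics import mean


def measure_tpot(step_fn, max_steps, eos_ids=(), sync_fn=None, warmup=0):
    """Time each decode step and return (tokens, mean seconds per token).

    step_fn(k) -> int : hypothetical stand-in for one decode step.
    sync_fn           : optional callable that blocks until pending device
                        work is done (assumption: e.g. torch.cuda.synchronize).
    warmup            : number of leading steps excluded from the average.
    """
    tokens, step_times = [], []
    for k in range(max_steps):
        start = time.perf_counter()
        token = step_fn(k)
        if sync_fn is not None:
            sync_fn()  # wait for asynchronous work before stopping the clock
        step_times.append(time.perf_counter() - start)
        tokens.append(token)
        if token in eos_ids:
            break
    if not step_times:
        return tokens, float("nan")
    # Drop warm-up steps; fall back to all steps if nothing would remain.
    timed = step_times[warmup:] or step_times
    return tokens, mean(timed)
```

For example, `measure_tpot(lambda k: k, 5, eos_ids={3})` stops after the fourth step, because the dummy step returns its own index and `3` is treated as an end-of-sequence id. Excluding a few warm-up steps is a design choice intended to keep one-time setup costs out of the per-token average.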