update qwen-image text token length error#1042
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements a mechanism to propagate error types from the inference worker to the API, allowing for more specific HTTP status codes like 413 (Payload Too Large) when token limits are exceeded. Feedback focuses on improving efficiency by avoiding redundant tokenizer calls, refining the broad mapping of all ValueError exceptions to HTTP 413, and addressing potentially unreachable error handling logic in the task creation endpoints.
| token_lengths = [len(ids) for ids in self.tokenizer(txt, add_special_tokens=True)["input_ids"]] | ||
| max_token_len = max(token_lengths) | ||
| if max_token_len > self.tokenizer_max_length + drop_idx: | ||
| raise ValueError(f"Input text token length ({max_token_len - drop_idx}) exceeds ({self.tokenizer_max_length}). Please shorten the input text.") |
There was a problem hiding this comment.
The tokenizer is called twice for the same input text: once here to calculate lengths and again on the following line to generate the tensors. This is inefficient, especially for large batches or long prompts.\n\nConsider calling the tokenizer once with padding=True and return_tensors="pt", then checking the length of the resulting input_ids tensor before moving it to the device.
| if error_type == "ValueError": | ||
| raise HTTPException(status_code=413, detail=error_detail) |
There was a problem hiding this comment.
Mapping all ValueError exceptions to HTTP 413 (Payload Too Large) is too broad. A ValueError can be raised for various reasons other than input length (e.g., invalid parameter values or unsupported configurations). This might lead to misleading error codes for the client. It is recommended to use a more specific exception type for token length violations or check the error message to ensure the status code is appropriate.
| if getattr(e, "original_error_type", "") == "ValueError": | ||
| raise HTTPException(status_code=413, detail=str(e)) |
There was a problem hiding this comment.
This check for ValueError in the RuntimeError handler appears to be unreachable for token length errors. Token length validation happens during inference in the worker process, and the resulting error is propagated via the task status, not as a direct exception from task_manager.create_task. This pattern is repeated in other task creation endpoints in this file.
No description provided.