update qwen-image text token length error#1042

Merged
llmc-reviewer merged 1 commit into main from hhh
Apr 28, 2026
Conversation

@helloyongyang
Contributor

No description provided.

@llmc-reviewer llmc-reviewer merged commit d203934 into main Apr 28, 2026
2 checks passed
@llmc-reviewer llmc-reviewer deleted the hhh branch April 28, 2026 09:20
Contributor

@gemini-code-assist (Bot) left a comment

Code Review

This pull request implements a mechanism to propagate error types from the inference worker to the API, allowing for more specific HTTP status codes like 413 (Payload Too Large) when token limits are exceeded. Feedback focuses on improving efficiency by avoiding redundant tokenizer calls, refining the broad mapping of all ValueError exceptions to HTTP 413, and addressing potentially unreachable error handling logic in the task creation endpoints.

Comment on lines +221 to +224
token_lengths = [len(ids) for ids in self.tokenizer(txt, add_special_tokens=True)["input_ids"]]
max_token_len = max(token_lengths)
if max_token_len > self.tokenizer_max_length + drop_idx:
    raise ValueError(f"Input text token length ({max_token_len - drop_idx}) exceeds ({self.tokenizer_max_length}). Please shorten the input text.")
Contributor


medium

The tokenizer is called twice for the same input text: once here to calculate lengths and again on the following line to generate the tensors. This is inefficient, especially for large batches or long prompts.

Consider calling the tokenizer once with padding=True and return_tensors="pt", then checking the length of the resulting input_ids tensor before moving it to the device.
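A minimal sketch of the single-call pattern the reviewer suggests. The real code uses a Hugging Face tokenizer; here a stub stands in for it so the sketch is self-contained, and the `validate_and_tokenize` helper name is illustrative, not from the PR:

```python
def validate_and_tokenize(tokenizer, texts, max_length, drop_idx):
    """Tokenize once, then validate lengths from that same output.

    The returned encoding is reused downstream, so the tokenizer is
    never called a second time for the same input.
    """
    encoded = tokenizer(texts, add_special_tokens=True, padding=True)
    max_token_len = max(len(ids) for ids in encoded["input_ids"])
    if max_token_len > max_length + drop_idx:
        raise ValueError(
            f"Input text token length ({max_token_len - drop_idx}) exceeds "
            f"({max_length}). Please shorten the input text."
        )
    return encoded


def stub_tokenizer(texts, add_special_tokens=True, padding=True):
    """Illustrative stand-in: one token per whitespace-separated word,
    padded with 0s to the longest sequence (mimics padding=True)."""
    ids = [list(range(1, len(t.split()) + 1)) for t in texts]
    width = max(len(i) for i in ids)
    return {"input_ids": [i + [0] * (width - len(i)) for i in ids]}
```

With a real tokenizer the same idea applies: because padding=True pads every row to the longest sequence, the padded width already equals the maximum token length, so the check needs no separate tokenizer call.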

Comment on lines +39 to +40
if error_type == "ValueError":
    raise HTTPException(status_code=413, detail=error_detail)
Contributor


medium

Mapping all ValueError exceptions to HTTP 413 (Payload Too Large) is too broad. A ValueError can be raised for various reasons other than input length (e.g., invalid parameter values or unsupported configurations). This might lead to misleading error codes for the client. It is recommended to use a more specific exception type for token length violations or check the error message to ensure the status code is appropriate.
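One way to implement the reviewer's suggestion is a dedicated exception subclass for token-length violations, plus an explicit mapping from propagated error-type names to status codes. The names `TokenLengthExceededError` and `status_code_for` are illustrative, not from the PR:

```python
class TokenLengthExceededError(ValueError):
    """Raised only when input text exceeds the tokenizer's token limit.

    Subclassing ValueError keeps existing except-clauses working while
    letting the API distinguish this case from other ValueErrors.
    """


def status_code_for(error_type: str) -> int:
    """Map a worker-propagated error type name to an HTTP status code."""
    mapping = {
        "TokenLengthExceededError": 413,  # Payload Too Large: input too long
        "ValueError": 400,                # generic bad request, not 413
    }
    return mapping.get(error_type, 500)   # unknown failures -> server error
```

The worker would then raise TokenLengthExceededError instead of a bare ValueError, and the API side becomes `raise HTTPException(status_code=status_code_for(error_type), detail=error_detail)`, so only genuine length violations produce a 413.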

Comment on lines +115 to +116
if getattr(e, "original_error_type", "") == "ValueError":
    raise HTTPException(status_code=413, detail=str(e))
Contributor


medium

This check for ValueError in the RuntimeError handler appears to be unreachable for token length errors. Token length validation happens during inference in the worker process, and the resulting error is propagated via the task status, not as a direct exception from task_manager.create_task. This pattern is repeated in other task creation endpoints in this file.
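A sketch of where the check belongs per the reviewer's reasoning: since the worker records the failure in the task status rather than raising from create_task, the status-polling path is where the error type can actually be observed. The `task` dict shape and the `http_error_for_task` helper are assumptions for illustration:

```python
def http_error_for_task(task: dict):
    """Translate a failed task's recorded error into (status_code, detail).

    Returns None if the task has not failed. The error type travels in
    the task status written by the worker process, not as an exception
    raised from task creation, so the creation-time check never fires.
    """
    if task.get("status") != "failed":
        return None
    if task.get("error_type") == "ValueError":
        return (413, task.get("error_detail", ""))
    return (500, task.get("error_detail", ""))
```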
