Fix/s2v sekotalk only#1194
Conversation
…Talk - wan_audio_runner: add 1080p tier to fixed_min_side resize mode - wan_audio_runner: read resize_mode/fixed_area from input_info first, fallback to config - wan_audio_runner: re-mux with original audio after local inference to replace 16kHz VARecorder output - input_info: add fixed_area field to S2VInputInfo
There was a problem hiding this comment.
Code Review
This pull request introduces support for 1080p resolution in the image resizing logic, refactors image resizing and spatial size resolution into helper methods to reduce duplication, and adds the fixed_area field to S2VInputInfo. Feedback highlights a potential bug in _get_image_resize_kwargs where a truthy UNSET value could bypass the fallback configuration, missing fixed_area definitions in related input classes (SekoTalkInputs and RS2VInputInfo), and a redundant cleanup workaround in run_main that should instead be resolved by making the finally block robust against uninitialized models.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| def _get_image_resize_kwargs(self): | ||
| input_info = getattr(self, "input_info", None) | ||
| return { | ||
| "resize_mode": (getattr(input_info, "resize_mode", None) if input_info is not None else None) or self.config.get("resize_mode", "adaptive"), | ||
| "bucket_shape": self.config.get("bucket_shape", None), | ||
| "fixed_area": (getattr(input_info, "fixed_area", None) if input_info is not None else None) or self.config.get("fixed_area", None), | ||
| "fixed_shape": self.config.get("fixed_shape", None), | ||
| } |
There was a problem hiding this comment.
In _get_image_resize_kwargs, getattr(input_info, "resize_mode", None) can return UNSET (which is a truthy _UnsetType instance). Because UNSET is truthy, the expression getattr(...) or self.config.get(...) will evaluate to UNSET instead of falling back to the config default. This will subsequently cause an AssertionError in resize_image because UNSET is not a valid resize mode.
We should explicitly check if the attribute is UNSET or None before falling back.
| def _get_image_resize_kwargs(self): | |
| input_info = getattr(self, "input_info", None) | |
| return { | |
| "resize_mode": (getattr(input_info, "resize_mode", None) if input_info is not None else None) or self.config.get("resize_mode", "adaptive"), | |
| "bucket_shape": self.config.get("bucket_shape", None), | |
| "fixed_area": (getattr(input_info, "fixed_area", None) if input_info is not None else None) or self.config.get("fixed_area", None), | |
| "fixed_shape": self.config.get("fixed_shape", None), | |
| } | |
| def _get_image_resize_kwargs(self): | |
| input_info = getattr(self, "input_info", None) | |
| resize_mode = getattr(input_info, "resize_mode", None) if input_info is not None else None | |
| if resize_mode is None or resize_mode is UNSET: | |
| resize_mode = self.config.get("resize_mode", "adaptive") | |
| fixed_area = getattr(input_info, "fixed_area", None) if input_info is not None else None | |
| if fixed_area is None or fixed_area is UNSET: | |
| fixed_area = self.config.get("fixed_area", None) | |
| return { | |
| "resize_mode": resize_mode, | |
| "bucket_shape": self.config.get("bucket_shape", None), | |
| "fixed_area": fixed_area, | |
| "fixed_shape": self.config.get("fixed_shape", None), | |
| } |
| stream_config: dict = field(default_factory=dict) | ||
| # shape related | ||
| resize_mode: str = field(default_factory=str) | ||
| fixed_area: str = field(default_factory=str) |
There was a problem hiding this comment.
While fixed_area has been added to S2VInputInfo, it is missing from SekoTalkInputs (which is the actual class instantiated for s2v and rs2v tasks in init_input_info_from_args) and RS2VInputInfo. This omission means fixed_area cannot be passed or resolved correctly during runtime for these tasks.
Please add fixed_area: str | Any = UNSET to SekoTalkInputs and fixed_area: str = field(default_factory=str) to RS2VInputInfo to ensure compatibility.
| if self.va_controller.reader is None: | ||
| return super().run_main() | ||
| result = super().run_main() | ||
| # Stop VARecorder so ffmpeg finishes writing the file | ||
| if self.va_controller is not None: | ||
| self.va_controller.clear() | ||
| self.va_controller = None | ||
| return result |
There was a problem hiding this comment.
The manual cleanup of self.va_controller before returning result is redundant because the finally block at the end of run_main (line 793) is guaranteed to execute when the try block exits (including via a return statement).
However, the finally block currently contains a bug: if hasattr(self.model, "inputs"): will raise an AttributeError if self.model has not been initialized yet (which is the case when self.va_controller.reader is None and we return early). This crash in the finally block is likely why this manual cleanup was added as a workaround.
Instead of adding redundant cleanup here, please make the finally block robust (e.g., if getattr(self, "model", None) is not None and hasattr(self.model, "inputs"):) and keep this early return simple.
| if self.va_controller.reader is None: | |
| return super().run_main() | |
| result = super().run_main() | |
| # Stop VARecorder so ffmpeg finishes writing the file | |
| if self.va_controller is not None: | |
| self.va_controller.clear() | |
| self.va_controller = None | |
| return result | |
| if self.va_controller.reader is None: | |
| return super().run_main() |
support 1080p sekotalk