
Allow frame stacking with text context #15585

Merged
rfejgin merged 1 commit into NVIDIA-NeMo:main from rfejgin:magpietts_framestacking_textcontext_assert on Apr 7, 2026

rfejgin (Collaborator) commented Apr 6, 2026

This change removes the restriction on combining frame stacking and text context.

We still need to be careful that the context's sequence length after stacking is long enough to accommodate the text context, which does not get stacked. So a check was added to verify that the context sequence length is at least as long as what we'd get with the standard 5-second context and no stacking, a configuration that is known to (almost always) fit existing text contexts.

The code still errors out if the context length would be too short to fit commonly used text contexts.

Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
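The check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: the function names `stacked_context_length` and `check_context_fits_text` are hypothetical, stacking is assumed to group consecutive codec frames into one decoder step (rounding a partial final group up), and 21 FPS is used only because it is the frame rate mentioned in the review below.

```python
import math

# Assumption: the "standard" baseline is a 5-second context with no stacking.
BASELINE_CONTEXT_SECONDS = 5.0


def stacked_context_length(context_seconds: float, frame_rate_hz: float, stacking_factor: int) -> int:
    """Return how many decoder steps the audio context occupies after stacking.

    Hypothetical helper: assumes `stacking_factor` consecutive frames are merged
    into one step and a partial final group still takes a full step.
    """
    if stacking_factor < 1:
        raise ValueError("stacking_factor must be >= 1")
    num_frames = round(context_seconds * frame_rate_hz)
    return math.ceil(num_frames / stacking_factor)


def check_context_fits_text(context_seconds: float, frame_rate_hz: float, stacking_factor: int) -> int:
    """Raise if the stacked context is shorter than the unstacked 5-second baseline.

    The text context is not stacked, so it needs as many positions as the
    baseline provides. Returns the stacked context length on success.
    """
    required = stacked_context_length(BASELINE_CONTEXT_SECONDS, frame_rate_hz, stacking_factor=1)
    actual = stacked_context_length(context_seconds, frame_rate_hz, stacking_factor)
    if actual < required:
        approx_min_seconds = BASELINE_CONTEXT_SECONDS * stacking_factor
        raise ValueError(
            f"Context of {context_seconds}s with stacking factor {stacking_factor} yields "
            f"{actual} steps; need at least {required} (roughly >= {approx_min_seconds}s of context)."
        )
    return actual
```

At 21 FPS the baseline is 5 × 21 = 105 steps, so with a stacking factor of 2 a 10-second context passes (105 steps), while a 5-second context yields only 53 steps and is rejected. In other words, under these assumptions the context duration has to grow in proportion to the stacking factor.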
github-actions bot added the TTS label Apr 6, 2026
rfejgin added Run CICD and removed TTS labels Apr 6, 2026
rfejgin marked this pull request as ready for review April 6, 2026 21:20
rfejgin requested review from blisc and paarthneekhara April 6, 2026 21:20
paarthneekhara (Collaborator) left a comment

LGTM.
Not for this PR, but we should probably take in a parameter specifying the max text context length so that we don't accidentally give more than ~105 tokens of text context. We can use that value for verification (instead of 5 seconds at 21 FPS).
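A minimal sketch of what the suggested parameter might look like, assuming a hypothetical `max_text_context_tokens` field on a hypothetical `TextContextConfig` class (neither exists in NeMo as far as this thread shows). It reuses the same assumed stacking arithmetic (partial final group rounds up) and defaults the cap to 105, i.e. 5 s × 21 FPS.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TextContextConfig:
    """Hypothetical config illustrating the review suggestion; not an existing NeMo class."""

    context_seconds: float
    frame_rate_hz: float
    stacking_factor: int = 1
    # Explicit cap on text-context tokens; 105 mirrors 5 s at 21 FPS.
    max_text_context_tokens: int = 105

    def __post_init__(self) -> None:
        if self.stacking_factor < 1:
            raise ValueError("stacking_factor must be >= 1")

    def stacked_context_steps(self) -> int:
        # Assumed rounding: a partial final stack still occupies one step.
        num_frames = round(self.context_seconds * self.frame_rate_hz)
        return math.ceil(num_frames / self.stacking_factor)

    def validate(self) -> None:
        """Check that the stacked audio context has room for the largest allowed text context."""
        steps = self.stacked_context_steps()
        if steps < self.max_text_context_tokens:
            raise ValueError(
                f"Stacked context has {steps} steps but max_text_context_tokens is "
                f"{self.max_text_context_tokens}."
            )


def check_text_context(tokens: Sequence[int], cfg: TextContextConfig) -> None:
    """Reject a text context longer than the configured maximum."""
    if len(tokens) > cfg.max_text_context_tokens:
        raise ValueError(
            f"Text context has {len(tokens)} tokens; limit is {cfg.max_text_context_tokens}."
        )
```

The design point is that the same number bounds both sides: the stacked context must provide at least `max_text_context_tokens` positions, and no text context may exceed it. This replaces the indirect "5 seconds at 21 FPS" proxy with an explicit limit.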

rfejgin (Collaborator, Author) commented Apr 7, 2026

> LGTM. Not for this PR, but we should probably take in a parameter specifying the max text context length so that we don't accidentally give more than ~105 tokens of text context. We can use that value for verification (instead of 5 seconds at 21 FPS).

I agree. I will create a JIRA issue to track that.

rfejgin merged commit cae54b3 into NVIDIA-NeMo:main Apr 7, 2026
131 checks passed
3 participants