Feature/video transcriptions#589
Conversation
Decode mono 16k audio from the editor video, run Whisper via Transformers.js, and insert linked text annotations with timing and layout helpers. Adds Vite resolve shims for Node-only imports used by the model stack, optional leading-silence trim for the caption buffer, timeline gap reconciliation for auto-caption regions, and editor i18n. Raises Select content z-index so caption controls stay usable over the video surface. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds in-browser Whisper transcription types and a robust transcribeMono16kToSegments entrypoint (slice handling, dedupe, retries, Node-version masking), plus small UI tweaks: Sonner Toaster theme/class merging and removal of a Toaster prop. ChangesAuto-captions Feature
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1c431fe473
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
someone beat me to it lol - i was going to work on this soon. can you add more detailed description of the implementation - is this using whisper? also glad that you added a recording of it. I'll take a more closer look later in the week when I have some downtime. But thanks for working on this! |
There was a problem hiding this comment.
Actionable comments posted: 11
🧹 Nitpick comments (4)
src/components/video-editor/VideoEditor.tsx (1)
1022-1040: 💤 Low valuenit: reconcile runs for every annotation span change, including blur/non-caption ones
handleAnnotationSpanChangeis also wired up as the blur span handler (line 2253), and even regular text/figure annotations flow through here. for those,reconcileAutoCaptionTimelineGapsis presumably a no-op (only auto-caption siblings produce gaps), but it still walks the whole array on every span tweak — and span changes can fire pretty frequently during drag.cheap shortcut: only reconcile when the edited region is actually
annotationSource === "auto-caption"(or when the array contains any). totally optional, won't move any numbers on small projects, just leaving it here.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/components/video-editor/VideoEditor.tsx` around lines 1022 - 1040, handleAnnotationSpanChange currently always calls reconcileAutoCaptionTimelineGaps after updating annotationRegions, causing unnecessary full-array work on every drag/blur; change it to only run reconciliation when relevant: inside the pushState update, determine whether the edited region (matched by id) has annotationSource === "auto-caption" (or short-circuit if any region in prev.annotationRegions already has annotationSource === "auto-caption"), and only call reconcileAutoCaptionTimelineGaps(next) in that case; otherwise return { annotationRegions: next } directly. Refer to handleAnnotationSpanChange, pushState, annotationRegions, reconcileAutoCaptionTimelineGaps and the region.id / annotationSource fields to locate the change.src/lib/captioning/leadingSilence.ts (1)
66-83: 💤 Low valuenit: could combine the two maps into one pass
lines 71-81 do two separate maps when you could do both operations in one:
return regions .map((r) => ({ ...r, startMs: Math.max(0, r.startMs - trimMs), endMs: Math.max(0, r.endMs - trimMs), })) .filter((r) => r.endMs > r.startMs);current version is totally fine tho, just a tiny micro-optimization that probably doesn't matter.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/captioning/leadingSilence.ts` around lines 66 - 83, The two successive .map passes in shiftTrimRegionsMsForCaptionBuffer can be combined into a single pass: in the mapping for each TrimRegion (refer to function shiftTrimRegionsMsForCaptionBuffer and parameter trimMs) compute startMs and endMs as Math.max(0, r.startMs - trimMs) and Math.max(0, r.endMs - trimMs) respectively, return the updated object, then keep the existing .filter(r => r.endMs > r.startMs); this removes the extra iteration while preserving behavior.src/lib/captioning/extractMono16k.ts (1)
87-103: 💤 Low valuedurationSec after truncation might be confusing
when
truncatedis true, you're returning the originaldurationSecbut thesamplesarray is shorter. this could trip up downstream code that assumessamples.length / 16000 ≈ durationSec.maybe rename to
originalDurationSecor return the actual truncated duration? or add a comment explaining why original is preserved.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/captioning/extractMono16k.ts` around lines 87 - 103, The truncateAndResampleTo16k function currently returns the original durationSec even when audio was truncated, which can mislead callers; change the return value to reflect the actual post-truncation duration (e.g., compute durationSec = samples.length / 16000 or set to Math.min(originalDurationSec, MAX_CAPTION_AUDIO_SEC) after resampling) and update any callers expecting the old semantics (or alternatively rename the original parameter to originalDurationSec if you prefer preserving both values) — locate this logic in truncateAndResampleTo16k and adjust the returned durationSec to match the truncated/resampled samples.src/lib/captioning/extractMono16kWebDemuxer.ts (1)
45-74: 💤 Low valueframes are consumed (closed) by this function
line 65 closes each frame after processing. if the caller still needs these frames, this will break. might want to add a comment or rename to something like
mergeAndConsumeDecodedAudioto make it explicit.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/captioning/extractMono16kWebDemuxer.ts` around lines 45 - 74, The function mergeDecodedAudioToMonoLinear currently calls frame.close() and thus consumes frames; change its API to make consumption explicit by adding a boolean parameter (e.g., consumeFrames = false) to mergeDecodedAudioToMonoLinear and only call frame.close() when consumeFrames is true, or alternatively rename to mergeAndConsumeDecodedAudio if you want a consuming variant; update any callers to pass the flag or use the renamed function and add a short comment noting the behavior; reference mergeDecodedAudioToMonoLinear, the frame.close() call, and audioDataFrameToMono when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/components/video-editor/VideoEditor.tsx`:
- Around line 1835-1852: The retry path transcribes the full `samples` buffer
but the post-processing always adds `trimSec` to `segmentsRaw`, causing retryed
segments to be double-shifted; change the logic around the
`transcribeMono16kToSegments` call(s) so you track which buffer produced
`segmentsRaw` (e.g., a boolean like `usedTrimmed = true/false` when calling with
`speechSamples` vs `samples`) and only add `trimSec` to segment timestamps when
`usedTrimmed` is true; update the block that builds `segments` (the mapping that
adjusts `startSec`/`endSec`) to consult that flag instead of unconditionally
checking `trimSec`, leaving the retry results unshifted when they came from the
untouched `samples`.
In `@src/i18n/locales/ar/editor.json`:
- Around line 45-62: The autoCaptions block currently contains English strings;
replace each value under "autoCaptions" (keys: button, dialogTitle,
dialogDescription, minWords, maxWords, wordsCount, generate, dialogCancel,
generating, loadingModel, busy, done, noneHeard, noAudio, failed, truncated)
with Arabic translations—for example use: button/dialogTitle: "التسميات
التوضيحية التلقائية", dialogDescription: "اختر تقريبا كم عدد الكلمات التي تظهر
في كل تسمية توضيحية. يتم توزيع التوقيت عبر الكلمات في تلك العبارة.", minWords:
"الحد الأدنى من الكلمات لكل تسمية", maxWords: "الحد الأقصى من الكلمات لكل
تسمية", wordsCount: "{{count}} كلمة", generate: "توليد", dialogCancel: "إلغاء",
generating: "جارٍ توليد التسميات من الصوت…", loadingModel: "جارٍ تحميل نموذج
الكلام (سيتم تنزيل ~75 ميغابايت عند الاستخدام الأول)…", busy: "توليد التسميات
قيد التنفيذ بالفعل.", done: "تمت إضافة {{count}} تسمية.", noneHeard: "لم يتم
الكشف عن أي كلام.", noAudio: "لا يحتوي هذا الفيديو على صوت صالح للنسخ.", failed:
"تعذّر توليد التسميات.", truncated: "تم نسخ الدقائق الأولى فقط: {{minutes}}
دقيقة."; keep the keys unchanged so localization lookup (autoCaptions) continues
to work.
In `@src/i18n/locales/es/editor.json`:
- Around line 45-62: Replace the English strings under the "autoCaptions" object
with Spanish translations for every key (button, dialogTitle, dialogDescription,
minWords, maxWords, wordsCount, generate, dialogCancel, generating,
loadingModel, busy, done, noneHeard, noAudio, failed, truncated); update the
values so UI shows Spanish text (e.g., "Auto captions" -> "Subtítulos
automáticos", "Generate" -> "Generar", "Cancel" -> "Cancelar", status/error
messages and the templated strings like "{{count}} words" and "Only the first
{{minutes}} minutes were transcribed." should be translated preserving the
placeholders like {{count}} and {{minutes}). Ensure punctuation and ellipses
match original intent (e.g., "Generating captions from audio…" keeps ellipsis)
and verify plural/template grammar in wordsCount/done/truncated.
In `@src/i18n/locales/fr/editor.json`:
- Around line 45-62: Translate every English string inside the "autoCaptions"
object to French while preserving all JSON keys and interpolation placeholders
(e.g., {{count}}, {{minutes}}), punctuation, ellipses and parentheses; update
values for "button", "dialogTitle", "dialogDescription", "minWords", "maxWords",
"wordsCount", "generate", "dialogCancel", "generating", "loadingModel" (keep the
"~75 MB" note and parentheses), "busy", "done", "noneHeard", "noAudio",
"failed", and "truncated" so the UI displays fully French text for the
autoCaptions section.
In `@src/i18n/locales/ja-JP/editor.json`:
- Around line 45-62: The autoCaptions block in the ja-JP editor locale is still
English; translate each value for keys under "autoCaptions" (e.g., "button",
"dialogTitle", "dialogDescription", "minWords", "maxWords", "wordsCount",
"generate", "dialogCancel", "generating", "loadingModel", "busy", "done",
"noneHeard", "noAudio", "failed", "truncated") into natural Japanese and replace
the English strings in the "autoCaptions" object in
src/i18n/locales/ja-JP/editor.json so the caption-generation UI displays
Japanese text for Japanese users.
In `@src/i18n/locales/ko-KR/editor.json`:
- Around line 45-62: The autoCaptions translation block currently contains
English strings; update each key under "autoCaptions" (button, dialogTitle,
dialogDescription, minWords, maxWords, wordsCount, generate, dialogCancel,
generating, loadingModel, busy, done, noneHeard, noAudio, failed, truncated)
with proper Korean translations preserving interpolation tokens like {{count}}
and {{minutes}} and ellipses; ensure punctuation and spacing match locale
conventions and that message semantics (e.g., informational vs. error) are
preserved.
In `@src/i18n/locales/tr/editor.json`:
- Around line 45-62: Replace the English strings under the autoCaptions object
with Turkish translations so the UI is fully localized; update the values for
keys button, dialogTitle, dialogDescription, minWords, maxWords, wordsCount,
generate, dialogCancel, generating, loadingModel, busy, done, noneHeard,
noAudio, failed, and truncated by providing appropriate Turkish text (e.g.,
"Otomatik altyazılar" for titles/buttons, Turkish sentences for descriptions and
status messages, and localized plural form for wordsCount like "{{count}}
kelime" and minutes in truncated like "{{minutes}} dakika"). Ensure the existing
key names remain unchanged and preserve interpolation tokens like {{count}} and
{{minutes}} exactly as they appear.
In `@src/i18n/locales/zh-CN/editor.json`:
- Around line 45-62: The autoCaptions locale block is still in English; replace
each string value under the "autoCaptions" object (keys: button, dialogTitle,
dialogDescription, minWords, maxWords, wordsCount, generate, dialogCancel,
generating, loadingModel, busy, done, noneHeard, noAudio, failed, truncated)
with proper Simplified Chinese translations, preserving all interpolation tokens
like {{count}} and {{minutes}} and punctuation/ellipsis, and do not change the
key names or JSON structure so the app consumes the translated strings
correctly.
In `@src/i18n/locales/zh-TW/editor.json`:
- Around line 45-62: The autoCaptions translation block is still in English
across locales; update the "autoCaptions" object (keys: button, dialogTitle,
dialogDescription, minWords, maxWords, wordsCount, generate, dialogCancel,
generating, loadingModel, busy, done, noneHeard, noAudio, failed, truncated) in
every non-English locale file (ar, es, fr, ja-JP, ko-KR, tr, zh-CN, zh-TW) with
proper localized strings matching the tone and placeholders (e.g., keep
{{count}} and {{minutes}} intact), ensure grammar/placeholder placement is
correct for each language, and run/verify the i18n validation script to catch
missing keys or formatting issues.
In `@src/lib/captioning/annotationsFromCaptions.ts`:
- Around line 535-546: The initial call to
dedupeAdjacentCaptionRepeats(segments) is removing legitimate repeated tokens
before line assembly in word mode; stop deduping the raw input and instead run
dedupe only after grouping. Remove the pre-group dedupe (the dedupedIn variable)
and pass the original segments into groupPhraseCaptionSegmentsIntoLines or
groupTimedCaptionWordsIntoLines, then call dedupeAdjacentCaptionRepeats on the
grouped result (as you already do) before finalizeCaptionSegmentsForPlayback;
keep finalizeCaptionSegmentsForPlayback unchanged.
In `@src/lib/captioning/transcribe.ts`:
- Around line 125-137: The raw chunk dedupe logic in transcribe.ts (variables:
segments, rawDeduped, CHUNK_DUP_MAX_GAP_SEC) is too aggressive and collapses
repeated words separated by small gaps; change the merge condition so
identical-text segments are only merged when they truly overlap or immediately
touch (e.g., require seg.startSec <= prev.endSec) instead of allowing a
configurable gap (remove or set CHUNK_DUP_MAX_GAP_SEC to 0), keeping the rest of
the merge logic (updating prev.startSec/prev.endSec) intact.
---
Nitpick comments:
In `@src/components/video-editor/VideoEditor.tsx`:
- Around line 1022-1040: handleAnnotationSpanChange currently always calls
reconcileAutoCaptionTimelineGaps after updating annotationRegions, causing
unnecessary full-array work on every drag/blur; change it to only run
reconciliation when relevant: inside the pushState update, determine whether the
edited region (matched by id) has annotationSource === "auto-caption" (or
short-circuit if any region in prev.annotationRegions already has
annotationSource === "auto-caption"), and only call
reconcileAutoCaptionTimelineGaps(next) in that case; otherwise return {
annotationRegions: next } directly. Refer to handleAnnotationSpanChange,
pushState, annotationRegions, reconcileAutoCaptionTimelineGaps and the region.id
/ annotationSource fields to locate the change.
In `@src/lib/captioning/extractMono16k.ts`:
- Around line 87-103: The truncateAndResampleTo16k function currently returns
the original durationSec even when audio was truncated, which can mislead
callers; change the return value to reflect the actual post-truncation duration
(e.g., compute durationSec = samples.length / 16000 or set to
Math.min(originalDurationSec, MAX_CAPTION_AUDIO_SEC) after resampling) and
update any callers expecting the old semantics (or alternatively rename the
original parameter to originalDurationSec if you prefer preserving both values)
— locate this logic in truncateAndResampleTo16k and adjust the returned
durationSec to match the truncated/resampled samples.
In `@src/lib/captioning/extractMono16kWebDemuxer.ts`:
- Around line 45-74: The function mergeDecodedAudioToMonoLinear currently calls
frame.close() and thus consumes frames; change its API to make consumption
explicit by adding a boolean parameter (e.g., consumeFrames = false) to
mergeDecodedAudioToMonoLinear and only call frame.close() when consumeFrames is
true, or alternatively rename to mergeAndConsumeDecodedAudio if you want a
consuming variant; update any callers to pass the flag or use the renamed
function and add a short comment noting the behavior; reference
mergeDecodedAudioToMonoLinear, the frame.close() call, and audioDataFrameToMono
when making the change.
In `@src/lib/captioning/leadingSilence.ts`:
- Around line 66-83: The two successive .map passes in
shiftTrimRegionsMsForCaptionBuffer can be combined into a single pass: in the
mapping for each TrimRegion (refer to function
shiftTrimRegionsMsForCaptionBuffer and parameter trimMs) compute startMs and
endMs as Math.max(0, r.startMs - trimMs) and Math.max(0, r.endMs - trimMs)
respectively, return the updated object, then keep the existing .filter(r =>
r.endMs > r.startMs); this removes the extra iteration while preserving
behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fc01dfd4-7cce-43c9-9fa2-e07e89630c21
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (25)
package.jsonsrc/components/ui/select.tsxsrc/components/video-editor/VideoEditor.tsxsrc/components/video-editor/projectPersistence.tssrc/components/video-editor/types.tssrc/i18n/locales/ar/editor.jsonsrc/i18n/locales/en/editor.jsonsrc/i18n/locales/es/editor.jsonsrc/i18n/locales/fr/editor.jsonsrc/i18n/locales/ja-JP/editor.jsonsrc/i18n/locales/ko-KR/editor.jsonsrc/i18n/locales/tr/editor.jsonsrc/i18n/locales/zh-CN/editor.jsonsrc/i18n/locales/zh-TW/editor.jsonsrc/lib/captioning/annotationsFromCaptions.test.tssrc/lib/captioning/annotationsFromCaptions.tssrc/lib/captioning/captionConstants.tssrc/lib/captioning/extractMono16k.tssrc/lib/captioning/extractMono16kWebDemuxer.tssrc/lib/captioning/index.tssrc/lib/captioning/leadingSilence.tssrc/lib/captioning/transcribe.tssrc/lib/vite-stubs/empty-node-module.tssrc/lib/vite-stubs/onnxruntime-node-stub.tsvite.config.ts
- Gate trimSec segment shift when transcription used full buffer retry - Restore UTF-8 autoCaptions strings (ar, es, fr, ja-JP, ko-KR, tr, zh-CN, zh-TW) - Dedupe caption segments after grouping only; stricter chunk dedupe in transcribe - Post-truncate duration, explicit consume merge for web demuxer, trim region shift cleanup - Tests for caption annotation pipeline Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/captioning/annotationsFromCaptions.ts`:
- Around line 459-466: The single-span (group length === 1) case currently only
inserts line breaks and leaves the full original time range intact, so
phrase-mode captions never get time-sliced; change the branch handling the lone
span (the `only` variable in annotationsFromCaptions.ts) to call
wrapCaptionTextByWordBounds(only.text, minWords, maxWords) to produce an array
of caption segments, then split the time range (only.startSec..only.endSec)
evenly across those segments and push one caption object per segment with the
segment text and its computed startSec/endSec; ensure you still use minWords and
maxWords and preserve the existing out.push shape.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8fcd92b9-825c-454f-bcb1-804d9aa1f23d
📒 Files selected for processing (15)
src/components/video-editor/VideoEditor.tsxsrc/i18n/locales/ar/editor.jsonsrc/i18n/locales/es/editor.jsonsrc/i18n/locales/fr/editor.jsonsrc/i18n/locales/ja-JP/editor.jsonsrc/i18n/locales/ko-KR/editor.jsonsrc/i18n/locales/tr/editor.jsonsrc/i18n/locales/zh-CN/editor.jsonsrc/i18n/locales/zh-TW/editor.jsonsrc/lib/captioning/annotationsFromCaptions.test.tssrc/lib/captioning/annotationsFromCaptions.tssrc/lib/captioning/extractMono16k.tssrc/lib/captioning/extractMono16kWebDemuxer.tssrc/lib/captioning/leadingSilence.tssrc/lib/captioning/transcribe.ts
✅ Files skipped from review due to trivial changes (5)
- src/i18n/locales/tr/editor.json
- src/i18n/locales/ko-KR/editor.json
- src/i18n/locales/ar/editor.json
- src/i18n/locales/zh-CN/editor.json
- src/i18n/locales/fr/editor.json
🚧 Files skipped from review as they are similar to previous changes (8)
- src/i18n/locales/ja-JP/editor.json
- src/lib/captioning/extractMono16kWebDemuxer.ts
- src/i18n/locales/zh-TW/editor.json
- src/lib/captioning/annotationsFromCaptions.test.ts
- src/lib/captioning/extractMono16k.ts
- src/lib/captioning/leadingSilence.ts
- src/lib/captioning/transcribe.ts
- src/components/video-editor/VideoEditor.tsx
Phrase mode with one merged span now splits wrapCaptionTextByWordBounds lines into separate CaptionSegments with even time allocation across the phrase span (fallback when duration is too short for min spans). Update unit test accordingly. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@siddharthvaddem Thanks for the review, no problem hahah. I have updated the PR: expanded the description with how the pipeline works (yes, it uses Whisper tiny via Transformers.js in the browser), and pushed the follow-up fixes we discussed. Appreciate you taking a closer look when you have time. |
Merge siddharthvaddem/openscreen main and resolve conflicts in VideoEditor.tsx (export diagnostics, save-diagnostic handler, auto captions, toolbar chrome) and package-lock.json. Apply Biome formatting after merge so hooks pass. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/components/video-editor/VideoEditor.tsx (1)
1300-1311: 💤 Low valuethree near-identical sync handlers — fine for now, but a tiny helper would save future eyes
each of
handleAnnotationStyleChange/handleAnnotationPositionChange/handleAnnotationSizeChangeruns the same dance: find the touched region, checkannotationSource === "auto-caption", then map-with-conditional. it works, and the empty-find case correctly no-ops becausesyncAutoCaptionsbecomes false. nothing actually broken here.if you ever touch this surface again, something like:
function applyToCaptionFamily<R extends AnnotationRegion>( regions: R[], id: string, patch: (region: R) => R, ): R[] { const touched = regions.find((r) => r.id === id); const syncAll = touched?.annotationSource === "auto-caption"; return regions.map((region) => { if (syncAll && region.annotationSource === "auto-caption") return patch(region); return region.id === id ? patch(region) : region; }); }would collapse all three handlers to one-liners. low-priority though, totally fine to defer.
Also applies to: 1374-1385, 1392-1403
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/components/video-editor/VideoEditor.tsx` around lines 1300 - 1311, Extract the repeated map/find logic into a small helper (e.g., applyToCaptionFamily) that accepts regions, id, and a patch function and implements the described syncAutoCaptions behavior; then replace the duplicate blocks inside handleAnnotationStyleChange, handleAnnotationPositionChange, and handleAnnotationSizeChange (the pushState callers where you currently find touched and map regions) to call applyToCaptionFamily(prev.annotationRegions, id, region => ({ ...region, style: { ...region.style, ...style } }) or the appropriate patch for position/size), returning the new annotationRegions array from pushState.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/components/video-editor/VideoEditor.tsx`:
- Around line 1300-1311: Extract the repeated map/find logic into a small helper
(e.g., applyToCaptionFamily) that accepts regions, id, and a patch function and
implements the described syncAutoCaptions behavior; then replace the duplicate
blocks inside handleAnnotationStyleChange, handleAnnotationPositionChange, and
handleAnnotationSizeChange (the pushState callers where you currently find
touched and map regions) to call applyToCaptionFamily(prev.annotationRegions,
id, region => ({ ...region, style: { ...region.style, ...style } }) or the
appropriate patch for position/size), returning the new annotationRegions array
from pushState.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bdcd45dd-e161-4b47-a334-66ed990eb1d2
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (4)
package.jsonsrc/components/video-editor/VideoEditor.tsxsrc/components/video-editor/projectPersistence.tssrc/components/video-editor/types.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/components/video-editor/types.ts
- package.json
|
This is a very cool feature, nice work ! |
Use one Sonner id for the full caption flow, show a distinct transcribing step, yield before Whisper so updates paint, match editor dark chrome on toasts, and keep pointer-events only on toast bodies. Add transcribing strings across locales. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Hey @FabLrc I just added in the toast notifcations updates for the transcriptions. Screen.Recording.2026-05-15.122648.mp4 |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/captioning/transcribe.ts`:
- Around line 298-302: The fallback path currently calls transcribeOne(true,
...) which bypasses trimming and can reintroduce captions inside user-trimmed
regions; modify the logic after each retry (the transcribeOne(true, true,
timestampMode) and transcribeOne(true, false, timestampMode) calls) to reapply
the existing trimRegions filter (or call the same trimming utility used
elsewhere) to the returned segments before assigning to segments, or
alternatively split the retry so it only disables silence-based trimming but
still enforces trimRegions; ensure to reference transcribeOne, segments, trims
and the trimRegions trimming function/utility when making the change.
- Around line 242-253: After each awaited transcriber call (calls to
runTranscriberOnSlice in the short-audio branch that uses
padTailSliceForTranscribe and in the multi-slice branch that builds/pushes
segments), check the AbortSignal (signal.aborted) immediately before any
post-processing (extractChunksFromAsrResult, segmentsFromTranscriberChunks, or
pushing results). If the signal is aborted, stop processing and propagate
cancellation (throw an AbortError or rethrow a cancellation) instead of
returning/pushing segments so cancellation cannot resolve with captions; update
the short-audio path around TRANSCRIBE_SLICE_SAMPLES and the multi-slice path
that handles pushing slices accordingly.
- Around line 38-56: withoutNodeVersion currently removes process.versions.node
for the entire transcription Promise which can hide Node for unrelated async
work; change the usage so that withoutNodeVersion only wraps the ORT
initialization steps (the minimal synchronous/async calls that detect Node)
instead of the whole transcription flow: refactor callers to call
withoutNodeVersion around the ORT init function/block (replace broad fn usage
with a short-lived wrapper around the ORT init routine) or, if multiple
overlapping transcriptions must be prevented, implement a simple mutex/semaphore
around the ORT init path so only one transcription can clear node-version at a
time; keep the withoutNodeVersion helper the same but limit its scope to the ORT
init symbol(s) rather than the full transcription function.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 26da2b9b-1a54-4b39-b904-e33ec145f12d
📒 Files selected for processing (13)
src/App.tsxsrc/components/ui/sonner.tsxsrc/components/video-editor/VideoEditor.tsxsrc/i18n/locales/ar/editor.jsonsrc/i18n/locales/en/editor.jsonsrc/i18n/locales/es/editor.jsonsrc/i18n/locales/fr/editor.jsonsrc/i18n/locales/ja-JP/editor.jsonsrc/i18n/locales/ko-KR/editor.jsonsrc/i18n/locales/tr/editor.jsonsrc/i18n/locales/zh-CN/editor.jsonsrc/i18n/locales/zh-TW/editor.jsonsrc/lib/captioning/transcribe.ts
✅ Files skipped from review due to trivial changes (4)
- src/i18n/locales/ko-KR/editor.json
- src/i18n/locales/zh-TW/editor.json
- src/i18n/locales/es/editor.json
- src/i18n/locales/ar/editor.json
🚧 Files skipped from review as they are similar to previous changes (6)
- src/i18n/locales/fr/editor.json
- src/i18n/locales/zh-CN/editor.json
- src/i18n/locales/tr/editor.json
- src/i18n/locales/ja-JP/editor.json
- src/i18n/locales/en/editor.json
- src/components/video-editor/VideoEditor.tsx
Scope withoutNodeVersion to Transformers import/pipeline only; reapply trim-region filtering after ignore-trims retries; check AbortSignal after each slice inference before chunk processing. Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
||
| const trims = options?.trimRegions ?? []; | ||
| options?.onStatus?.("transcribe"); | ||
| if (options?.signal?.aborted) throw new DOMException("Aborted", "AbortError"); |
There was a problem hiding this comment.
do you need both checks on 248 and 250 for throwing these errors?
|
|
||
| if (options?.signal?.aborted) throw new DOMException("Aborted", "AbortError"); | ||
|
|
||
| await yieldForUiPaint(); |
There was a problem hiding this comment.
Why run whisper on the main renderer thread? Is there a way to do this either through web worker so that it doesn't block UI?
There was a problem hiding this comment.
Yeah fair enough, i will move it.
There was a problem hiding this comment.
can you also move auto captioning button to timeline?
|
@parse-nip I would like to get this in before it further diverges, making it a lot more work |
Pull Request Template
Description
Adds the new in-browser auto captions feature to OpenScreen. This generates caption annotations from recorded video audio inside the editor, makes them editable on the timeline, and includes follow-up fixes to improve timing accuracy and preserve visible gaps during pauses in speech.
How it works (implementation)
@xenova/transformers), using theXenova/whisper-tinyautomatic-speech-recognition model (Whisper “tiny” weights). The first run downloads model assets (~75 MB) to the client; later runs reuse the cache.decodeAudioDatawhen the container is supported; otherwise it uses the same web-demuxer +AudioDecoderstack as export. Long clips are truncated to a configured max duration for captioning; duration metadata reflects the decoded slice.Motivation
OpenScreen did not have a built-in captioning workflow, so users had no way to generate subtitles directly inside the app. This change adds that feature and addresses the key usability issues that came up during testing: inaccurate timing, captions collapsing across pauses, and occasional empty transcription results on spoken clips.
Type of Change
Screenshots / Video
Video (if applicable):
Screen.Recording.2026-05-14.094520.mp4
Testing
Reviewer flow:
Start the app in dev mode with npm run dev
Open a recording with clear spoken audio
Generate auto captions from the editor
Verify caption annotations are created on the timeline
Verify captions disappear during real pauses and resume when speech starts again
Verify caption timing stays aligned with speech and does not collapse everything into one continuous run
Checklist
Thank you for contributing!
Summary by CodeRabbit
New Features
Localization
Bug Fixes