Version: 1.7
Created: 2026-04-17 (updated 2026-04-17 after v1.24.0 ship)
Baseline: v1.24.0 (1,241 routes, 460 core modules, 7,689+ tests)
Source: Synthesised from an OSS survey of LosslessCut, auto-editor, editly,
Descript, Shotcut/MLT, Olive, OpenShot, Kdenlive, OpenTimelineIO, WhisperX,
PyAV, and a 2024–2026 scan of new SOTA AI video models (see research notes
under Sources at the bottom).
Scope: This document extends ROADMAP.md — the
original Wave 1–7 plan. Anything already covered there is not repeated
here; this is only the incremental work discovered after the v1.16.3
cross-project research pass.
Guiding Principles (carried forward)
Never break what works. Every wave ships independently.
One new required dependency per feature maximum. Prefer optional
pip extras with graceful degradation (@async_job + checks.py).
Permissive licences only. Apache-2, MIT, BSD, LGPL are fine.
CC-BY-NC, research-only, or unclear licences are deferred until the
author clarifies.
Match existing patterns — InterpResult / ComposeResult / MEMixResult / PremixResult style: subscriptable dataclass, single run() entry point, on_progress(pct, msg="") callback default-arg, queue allowlist entry, check_X_available() guard (see the sketch after this list).
Frontend parity last. CEP panel first, UXP second, CLI/MCP
third — but never on the same PR as the backend addition.
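A minimal sketch of that module shape, assuming nothing beyond what the principle names: the FooResult class, the foo_backend import, and the field names are illustrative placeholders, not real OpenCut symbols.

```python
# Illustrative only: the real modules (InterpResult, ComposeResult, ...)
# carry richer fields; this shows the shared shape.
from dataclasses import dataclass

@dataclass
class FooResult:
    output_path: str
    frames_written: int = 0

    def __getitem__(self, key):              # subscriptable, dict-style access
        return getattr(self, key)

def check_foo_available() -> bool:
    """Guard for the optional backend; mirrors the checks.py pattern."""
    try:
        import foo_backend  # hypothetical optional dependency
        return True
    except ImportError:
        return False

def run(input_path: str, on_progress=lambda pct, msg="": None) -> FooResult:
    """Single entry point; progress flows through the default-arg callback."""
    on_progress(0, "starting")
    # ... the actual work would happen here ...
    on_progress(100, "done")
    return FooResult(output_path=input_path)
```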
v1.17.0 — Shipped (2026-04-17)
| # | Feature | Core Module | Routes | OSS Source |
| --- | --- | --- | --- | --- |
| 17.1 | Neural Frame Interpolation — RIFE-NCNN-Vulkan CLI with FFmpeg minterpolate fallback. 3-pass doubling cap (8×). InterpResult dataclass. | core/neural_interp.py | GET /video/interpolate/backends, POST /video/interpolate/neural | |
Status: Merged. Queue allowlist updated. Lint clean. Blueprint
registered (enhancement_bp). Version synced to v1.17.0.
v1.18.0 — Shipped (2026-04-17)
First production batch from Wave A + Wave D. All 10 features built with graceful degradation via check_X_available() — routes return 503 MISSING_DEPENDENCY with install hints when optional backends are absent.
Status: Merged. 1,192 total routes (+15 vs v1.17.0). Lint clean on all new files. 8 new check_*_available() entries in opencut/checks.py. Queue allowlist extended.
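For reference, a minimal Flask sketch of that degradation pattern. The blueprint name, the RIFE CLI check, and the exact response fields are assumptions drawn from this document's conventions (503 + MISSING_DEPENDENCY + install hint), not the shipped code.

```python
import shutil
from flask import Blueprint, jsonify

demo_bp = Blueprint("demo", __name__)

def check_rife_available() -> bool:
    # RIFE ships as a CLI (rife-ncnn-vulkan), so a PATH lookup suffices here.
    return shutil.which("rife-ncnn-vulkan") is not None

@demo_bp.route("/video/interpolate/neural", methods=["POST"])
def interpolate_neural():
    if not check_rife_available():
        return jsonify({
            "error": "MISSING_DEPENDENCY",
            "hint": "install rife-ncnn-vulkan and put it on PATH",
        }), 503
    # ... enqueue the real async job here ...
    return jsonify({"status": "queued"})
```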
v1.19.0 — Shipped (2026-04-17)
Second batch closes the remaining Wave A items plus the Wave D2 restoration pack and the D3.2 webhook auto-emit hook. All 7 core modules ship with graceful degradation; every restoration module has either an ONNX path via a user-supplied checkpoint env var or a lighter FFmpeg fallback.
v1.20.0 — Shipped (2026-04-17)
Clears the remaining Wave D items identified in the initial research pass. Two items (D1.1 audio description, D6.2 auto-quiz) became LLM enrichments of existing modules rather than new-from-scratch duplicates — core/audio_description.py already had a complete template-based pipeline and core/quiz_overlay.py already had a TF-IDF generator.
Status: Merged. 1,207 total routes (+5 vs v1.19.1). Lint clean. 4 new check_*_available() entries. Queue allowlist extended for the three async Wave C routes.
Already-shipped Wave A items detected during the v1.18.0 pass (skipped, noted for reference)
A1.1 BS-RoFormer — already available via the backend="audio-separator" path of POST /audio/separate (models include bs_roformer, mel_band_roformer, scnet, mdx23c).
A1.2 Chatterbox TTS — already shipped in core/voice_gen.py::chatterbox_generate().
TransNetV2 shot-boundary — already referenced in CLAUDE.md; verify the check_transnetv2_available() gate is wired and promote TransNetV2 to the default backend in core/scene_detect.py when installed, ahead of the FFmpeg scene-threshold fallback.
ProRes on Windows via native encoder — once FFmpeg ships official prores_ks on Windows builds, re-verify the existing implementation and update presets. FFmpeg built-in, LGPL, S effort; no new code — follow-up check only.
Rejected or wait
MASt3R, MatAnyone — non-commercial research licences. Revisit
if the authors relicense.
HunyuanVideo — 80 GB VRAM excludes 99 % of users. Not worth
shipping as optional unless the distilled variant arrives.
SadTalker, LivePortrait, MODNet, Bark, Coqui XTTS-v2 —
superseded by Chatterbox / F5-TTS / BiRefNet / MuseTalk or the upstream
project is abandoned.
Sora, Gen-3, Veo, Kling — closed APIs. No OSS equivalent worth
tracking beyond LTX-Video / Wan 2.1 / CogVideoX.
PerceptualVMAF as a codec — it's a metric, not a codec. Already
usable through ab-av1 (Wave A4.1).
Wave D — Breadth Pass (Next 2 Weeks, parallelisable with Wave A)
Discovered during the niche / verticals / accessibility / developer
survey. All are small enough to ship alongside Wave A without delaying
it.
D1.1 Audio description (see the v1.20.0 note above; core/audio_description.py) — unlocks the blind-viewer market; no current OSS editor ships this turnkey.
D1.2 Broadcast compliance validator (full rulesets) — port Netflix / BBC / EBU-TT-D / FCC rule JSON into the existing caption_compliance.py; fail fast before export.
Wedding event-moment finder — audio-energy + ML spike detector for "first kiss / first dance / ring exchange" timestamps. Plugs into the existing silence.py / highlights.py infrastructure.
These items from the original Wave 3 / "Research & Strategic Gaps"
section remain open and are not replaced by this document — they
should ship in parallel with Wave A/B work:
GPU process isolation (Wave 3A, P0) — still unimplemented as of v1.17.0. Every Wave B entry above assumes this lands first. Mitigation path: @gpu_exclusive decorator + MAX_CONCURRENT_GPU_JOBS = 3 semaphore (see the sketch below).
Wave K Tier 3 (scheduled 2027-Q1): dub pipeline, trailer gen, IntelliScript, face age, slate ID, outpainting, VACE editing, sports highlights.
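A sketch of the mitigation path named in the GPU isolation item above. This is the in-process fallback the item describes, not true process isolation (which needs worker subprocesses); the decorator and constant names come from the item itself, the body is illustrative.

```python
import functools
import threading

MAX_CONCURRENT_GPU_JOBS = 3
_gpu_slots = threading.BoundedSemaphore(MAX_CONCURRENT_GPU_JOBS)

def gpu_exclusive(fn):
    """Block until a GPU slot frees up, run fn, always release the slot."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        with _gpu_slots:
            return fn(*args, **kwargs)
    return wrapper

@gpu_exclusive
def upscale_clip(path: str) -> str:
    ...  # GPU-heavy work goes here
    return path
```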
Wave H — Commercial Parity & Content-Creator Polish (v1.25.0, 2026-04-19)
Cross-project research pass against commercial editors (Opus Clip,
Descript, CapCut, ScreenStudio, Runway Gen-3, DaVinci 19+ Magic Mask,
Adobe Firefly Video) and GitHub projects that landed after the
April-2026 survey (FlashVSR, ROSE, Sammie-Roto-2, OmniVoice, ReEzSynth,
VidMuse, VideoAgent, ViMax, Hailuo 2.3, Seedance 2.0, GaussianHeadTalk,
FantasyTalking2). All three tiers in this wave ship together as v1.25.0
"shipped scaffolding" — Tier 1 lands as fully-working backend + panel
additions; Tier 2/3 AI-model features land as check_X_available()-
gated stubs returning 503 MISSING_DEPENDENCY with install hints,
matching the v1.18–1.20 pattern. Frontend wiring trails by one release.
Tier 1 — High ROI, Small Effort (content-creator polish)

| # | Feature | Module (new) | Routes | OSS / Product Source |
| --- | --- | --- | --- | --- |
| H1.1 | Virality / hook score 0–100 — multimodal: transcript sentiment × audio energy peaks × visual salience. Ranks short-form clip candidates before shorts_pipeline picks one. | core/virality_score.py | POST /analyze/virality, POST /analyze/virality/rank | Opus Clip pattern |
| H1.2 | Cursor-event auto-zoom — detect mouse-click timestamps from a screen-recording metadata sidecar (or OpenCV diff-based cursor detection) and emit timeline-aligned zoom keyframes. Extends auto_zoom.py. | core/cursor_zoom.py | POST /video/cursor-zoom | ScreenStudio / Screen.Studio |
| H1.3 | Eye-gaze correction — MediaPipe face-mesh keypoint rotation to fake camera gaze for teleprompter reads. | | | |
| | QE API reflection probe — call qe.reflect.methods at startup, surface the result through GET /system/qe-reflect. Unlocks undocumented Premiere 2025+ APIs. | host/index.jsx::ocQeReflect + routes/system.py | GET /system/qe-reflect | vakago-tools.com |
Tier 3 — Strategic Bets (stub + research note)

| # | Feature | Module (stub) | Routes | Source |
| --- | --- | --- | --- | --- |
| H3.1 | VideoAgent / ViMax — agentic LLM-routed search across indexed footage + auto-storyboard from a script. | core/video_agent.py | POST /agent/search-footage, POST /agent/storyboard | |
| H3.2 | Hailuo 2.3 / Seedance 2.0 — commercial gen-video backends (closed-weights, HTTP API). Alternative to LTX-Video / Wan 2.1 for higher quality at the cost of cloud dependency. | core/gen_video_cloud.py | POST /generate/cloud/submit, GET /generate/cloud/status/<id>, GET /generate/cloud/backends | hailuo-02.com, seed.bytedance.com |
| H3.3 | GaussianHeadTalk / FantasyTalking2 — wobble-free talking-head alternatives to LatentSync/MuseTalk for higher-end dubbing. | core/lipsync_advanced.py | POST /lipsync/gaussian, POST /lipsync/fantasy2, GET /lipsync/advanced/backends | WACV/AAAI 2026 |
| H3.4 | Magnetic-timeline snap UI — FCP-inspired gap-closing snap for the cut review panel (drag cuts across sequence boundaries without gaps). | client/main.js (cut review panel) | — (frontend only) | FCP |
| H3.5 | WebView UI UXP migration path — adopt Bolt UXP's WebView pattern to share the CEP codebase post-CEP-EOL (Sept 2026). Research spike only; no code lands in v1.25.0. | | | |
Tier 1 ships fully working in v1.25.0. Tier 2 ships as stubs + check_X_available() guards returning 503 MISSING_DEPENDENCY with install hints. Tier 3 lands as route scaffolding with a single "not yet implemented" response body + a TODO comment naming the upstream reference; promoted to Tier 2 once a user files a feature request or the upstream licence clarifies.
Wave H total: 21 new routes (Tier 1 + Tier 2), ~8 new stub routes (Tier 3), ~14 new check_*_available() entries, zero new required pip deps, 1 new blueprint (wave_h_bp).
Wave H gotchas
Gist sharing writes to public gists by default — /settings/gist/push requires an explicit private=True flag to target a secret gist, and requires the GITHUB_TOKEN env for authenticated push. Unauthenticated push uses anonymous gists (IP-rate-limited by GitHub). A minimal push sketch follows this list.
Virality score is heuristic — no ML model; a simple weighted blend of audio-energy peaks (from the existing silence.py), transcript sentiment (via core/llm.py if available, falling back to a keyword lexicon), and visual salience (optical-flow magnitude). Results are ranked 0–100, but the absolute number is not comparable across video types (see the blend sketch after this list).
Cursor-zoom metadata parsing — accepts either a ScreenStudio / Screen.Studio sidecar JSON ({clicks: [{t, x, y}]}), an OBS-WebSocket recording log, or a frame-diff fallback (slower, OpenCV-based). Never trust client-supplied coordinates; clamp to [0, width] × [0, height].
Eye-contact shader — the MediaPipe face-mesh keypoint rotation fakes gaze at the cost of introducing a small warp around the eye region. The module returns a warp_factor between 0 and 1 so frontend previews can show the user a "before/after" slider rather than commit irreversibly.
Demo footage is bundled only in installer builds — PyInstaller spec adds opencut/data/demo/sample.mp4; pip-installed dev installs rely on a post-install opencut-server --download-demo flag that pulls from a GitHub release asset.
Onboarding wizard persists per-profile — stored as onboarding_seen: true in ~/.opencut/onboarding.json; deleting the file re-triggers the tour. Don't use localStorage — it doesn't survive panel reinstalls.
Issue-report bundle scrubs filepaths — /system/issue-report/bundle redacts any path under $HOME to ~/.../<basename>. Never let a user email raw crash.log to a bug tracker that could include private directory structures.
BridgeTalk event names are namespaced — all events use the com.opencut.<event> prefix. Panel listens via CSInterface.addEventListener in main.js. JSX emits via new CSXSEvent(...) (ES3-safe; no template literals).
Tier 3 routes always return 501 — ROUTE_STUBBED error code. Frontend treats 501 as "coming soon" (greyed-out with tooltip), never as a failed call.
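A minimal sketch of the gist push from the first gotcha, using the documented GitHub REST endpoint (POST /gists). The push_gist helper and its signature are hypothetical; only the private=True and GITHUB_TOKEN behaviour comes from the gotcha.

```python
import json
import os
import urllib.request

def push_gist(filename: str, content: str, private: bool = True) -> str:
    body = json.dumps({
        "description": "OpenCut settings export",
        "public": not private,   # private=True targets a secret gist
        "files": {filename: {"content": content}},
    }).encode()
    req = urllib.request.Request(
        "https://api.github.com/gists", data=body, method="POST",
        headers={"Accept": "application/vnd.github+json"},
    )
    token = os.environ.get("GITHUB_TOKEN")
    if token:  # without it the push is anonymous and IP-rate-limited
        req.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["html_url"]
```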
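And a toy version of the virality blend. The weights, and the assumption that each signal arrives pre-normalised to [0, 1], are illustrative, not the shipped formula; per the gotcha, only the relative ranking matters.

```python
def virality_score(audio_energy: float, sentiment: float, salience: float,
                   weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted blend of three [0, 1] signals, returned on a 0-100 scale."""
    w_e, w_s, w_v = weights
    blended = w_e * audio_energy + w_s * sentiment + w_v * salience
    return round(100.0 * max(0.0, min(1.0, blended)), 1)

# Ranking clip candidates then reduces to a sort:
# ranked = sorted(candidates, key=lambda c: virality_score(*c.signals), reverse=True)
```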
Wave I — AdobePremiereProMCP Cross-Project Pass
Cross-project research pass against ayushozha/AdobePremiereProMCP — an MCP server for Premiere Pro (Go + Rust + TypeScript + Python polyglot, WebSocket-in-panel transport, ~907 generated tools). Their architecture isn't worth adopting wholesale (polyglot overhead for negligible gain), but four small polish items and one strategic capability are worth porting. All items are additive; no breaking changes.
Tier 1 — small polish (high ROI, 1-2 days each)

| # | Feature | Module (new) | Routes | Source |
| --- | --- | --- | --- | --- |
| I1.1 | Live panel stats widget — uptime, command count, avg response time (p50/p95), error count, active SSE / WS clients, last-error text. Renders as a new card on the Settings tab. | core/panel_stats.py + client/main.js stats card | GET /system/stats | AdobePremiereProMCP CEP panel |
| I1.2 | Lazy-loaded JSX chunks — split host/index.jsx into host/core.jsx (media scan + ping + marker ops, eager) and host/domain.jsx (color / audio / transitions / captions, loaded on first call via $.evalFile). Target: trim cold-panel-open time. | | | |
| | WebSocket heartbeat pings (15 s) — active ping/pong from panel to /ws so dead sockets are detected before the next user action. Extends the existing wsDisconnect() reconnect loop. | | | |
| | Cross-platform launchers — add OpenCut-Server.command (macOS) and OpenCut-Server.sh (Linux) to match the existing OpenCut-Server.bat / OpenCut-Launcher.vbs. Keeps tarball installs turnkey on all three OSes. | | | |

Tier 2 — strategic capability

| # | Feature | Module (new) | Routes | Source |
| --- | --- | --- | --- | --- |
| I2.1 | Script → EDL → native Premiere sequence in one call. Chains: whisper transcribe (or raw script text) → LLM scene split → footage_search.py to match shots against an indexed media library → multicam_xml.py to emit FCP XML → host JSX import. Returns the new sequence's nodeId. A single POST replaces what currently takes 4-5 sequential jobs. | core/script_to_sequence.py | POST /timeline/assemble-from-script, POST /timeline/assemble-from-script/preview | AdobePremiereProMCP ExecuteEDL RPC |
Not adopted (deliberate)
Polyglot stack (Go + Rust + TS + Python) — huge dependency burden for no user-visible benefit. OpenCut's single-process Flask ships as one exe via PyInstaller; we keep that.
WebSocket-in-panel server — they invert the normal CEP pattern (panel = server, external MCP client connects in on port 9801). Fine for their "MCP client drives Premiere" use case but breaks OpenCut's install-and-forget UX.
Auto-generated tool stubs — their own README disagrees with itself (907 vs 1,060 tools), suggesting heavy use of boilerplate generators. OpenCut's 1,275 routes are hand-written and tested.
gRPC between internal services — Flask + SSE + NDJSON streaming is enough. No cross-language boundary to bridge.
Wave I gotchas (anticipated)
Stats widget cardinality — don't track per-route p99 for every one of 1,275 routes (that's a memory leak waiting to happen). Aggregate at category level (audio/*, video/*, captions/*, system/*, settings/*) and keep a rolling window of the last 5,000 completed jobs (see the sketch after this list).
Lazy JSX chunk loader must be idempotent — $.evalFile(path) called twice loads the script twice, which on ES3 redefines every top-level function. Track a window._ocLoadedJSXChunks Set on the panel side and short-circuit.
Heartbeat pings must be cheap — the server-side handler must be O(1) per ping. Don't touch the job store, don't run DB queries, don't acquire job_lock.
macOS .command file perms — must be committed executable (chmod +x) AND have a Gatekeeper-friendly #!/bin/sh shebang. Without +x macOS refuses to double-click-execute.
assemble-from-script LLM cost — the scene-split step is a single LLM call per ~4000-word chunk; cap at 8 chunks per request (≈32k-word script) or the backend will hit rate limits on free Anthropic / OpenAI tiers.
Media-library index must exist first — /timeline/assemble-from-script requires a pre-built core/footage_index_db.py index; return 400 MISSING_INDEX with a hint to call POST /search/index first if the index is empty.
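A sketch of that aggregation scheme, assuming the five category buckets and the 5,000-job window named in the gotcha; the function names are hypothetical.

```python
import collections
import time

_WINDOW = 5000
_CATEGORIES = {"audio", "video", "captions", "system", "settings"}
_latencies = collections.defaultdict(lambda: collections.deque(maxlen=_WINDOW))

def _category(path: str) -> str:
    head = path.strip("/").split("/", 1)[0]
    return f"{head}/*" if head in _CATEGORIES else "other/*"

def record(path: str, started_at: float) -> None:
    # deque(maxlen=...) gives the rolling window for free; memory stays bounded
    _latencies[_category(path)].append(time.time() - started_at)

def latency_percentile(category: str, pct: float) -> float:
    window = sorted(_latencies[category])
    if not window:
        return 0.0
    return window[min(len(window) - 1, int(pct / 100.0 * len(window)))]
```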
Wave I total: 4 new routes (Tier 1: /system/stats; Tier 2: /timeline/assemble-from-script + preview), 2 new core modules, 2 new launcher scripts, 1 JSX file split, zero new required pip deps.
Wave J — Three-Angle Research Pass (April 2026)
Tier 1 — High ROI, Small Effort

| # | Feature | Module | Routes | Source |
| --- | --- | --- | --- | --- |
| | EDL → CDL colour metadata passthrough — parse an EDL, extract Color Decision List values, emit as a .cdl file + an OTIO sidecar for DaVinci round-trip. Tiny code, high value for colourists. | core/cdl_bridge.py | POST /timeline/export/cdl, POST /timeline/import/cdl | |
| J1.3 | Semantic keyframe extraction — CLIP embeddings + clustering pick N representative frames per clip for thumbnails / previews / summaries. Pairs with the existing virality score so the top-ranked clips get the smartest thumbnails. | core/keyframes_semantic.py | POST /video/keyframes/semantic, POST /video/keyframes/ranked | |
| J1.4 | PSE hue-flash detector extension — extend the existing ITU-R BT.1702 flash detector to catch rapid hue changes (red→blue) that don't register on luminance delta. Accessibility win for seizure-prone viewers. | core/pse_flash.py (extend) | — (enhances existing /video/pse/check) | ITU-R BT.1702 + custom |
| J1.5 | Video fingerprinting / duplicate detection — perceptual hash over clip segments; finds duplicate shots across an ingested library. Pure-stdlib pHash-style implementation (NOT the GPLv3 pHash lib — we roll our own to stay MIT; see the sketch below). | | | |
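A self-contained sketch of the MIT-clean fingerprint J1.5 describes: a naive 2-D DCT over a 32×32 luma block, keeping the low-frequency 8×8 corner and thresholding on the median. Frame decode (e.g. piping raw luma planes out of FFmpeg) is assumed to happen upstream; this is a sketch, not the shipped implementation.

```python
import math

N = 32  # analysis block size; the hash keeps the low-frequency 8x8 corner

def _dct_1d(v):
    n = len(v)
    return [sum(v[x] * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                for x in range(n)) for u in range(n)]

def phash(luma):
    """luma: NxN nested lists of 0-255 ints -> 63-bit fingerprint."""
    rows = [_dct_1d(r) for r in luma]                        # DCT each row
    cols = [_dct_1d([rows[y][x] for y in range(N)]) for x in range(N)]
    coeffs = [cols[x][y] for y in range(8) for x in range(8)][1:]  # drop DC
    median = sorted(coeffs)[len(coeffs) // 2]
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (c > median)
    return bits

def hamming(a: int, b: int) -> int:
    """Distance between two fingerprints; small = likely duplicate shots."""
    return bin(a ^ b).count("1")
```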
Tier 2 — New AI surfaces (503 MISSING_DEPENDENCY stubs)
Same pattern as Wave H Tier 2: ship check_X_available() guards and 503 responses with install hints; full wiring lands in later releases once each upstream pins a stable Python entry point.

| # | Feature | Module (stub) | Routes | OSS Source |
| --- | --- | --- | --- | --- |
| J2.1 | DCVC-RT real-time neural codec (Microsoft, CVPR'25) — 21% bitrate saving vs H.266 at 100+ fps 1080p, real-time 4K on modern GPUs. Replaces H.264 for proxy / preview generation. | core/codec_dcvc.py | POST /video/encode/dcvc, POST /video/decode/dcvc, GET /video/encode/dcvc/info | |
| | Advanced frame interpolation aggregator — unified /video/interpolate route that dispatches across RIFE (shipped) + GIMM (J2.2) + PerVFI (J2.3) + FFmpeg minterpolate fallback based on availability + user preference. | core/neural_interp.py (extend) | GET /video/interpolate/backends (extend existing) | — (dispatcher over J2.2/J2.3) |
Tier 3 — Strategic UX patterns (scaffolding + research notes)
These are patterns from the research pass that deserve a documented landing spot but don't warrant a code stub — they're UX investments or architecture decisions that pay off across multiple releases.

| # | Feature | OSS / Product Source | Notes |
| --- | --- | --- | --- |
| J3.1 | Scene-aware auto-ducking — LLM-decided dialogue-vs-music submix routing. The existing /audio/duck does amplitude-based ducking only; extend it to a scene-tag-aware router that knows "this is dialogue over a music bed, dip the bed 12 dB". | Hindenburg, Auphonic (closed products) | New route /audio/auto-duck-scene; relies on existing LLM + transcript infra. L effort; no stub in Wave J. |
| J3.2 | Multi-pass caption review gate — per-segment approve / flag / lock before export, with broadcast-compliance auto-check layered on top of caption_compliance.py. Differentiates from Premiere's native caption flow. | Rev, Glocap, Aegisub (pattern) | New route /captions/review-gate; panel card extension. M effort; no stub in Wave J. |
| J3.3 | Node-based colour graph UI — SVG canvas on the panel, nodes = colour ops (lift/gamma/gain, LUT apply, curves), edges = pipeline. Lets users wire grades visually instead of through a linear filter chain. Attracts colourists. | DaVinci Resolve colour page | New panel card + POST /video/color-node-graph/apply; graph schema in core. L effort; no stub in Wave J. |
| J3.4 | Client-review feedback loop — export a watermarked, password-gated preview, collect frame-locked comments via a lightweight web view, reimport as timeline markers. Closes the post-production → client → revision loop without a Frame.io subscription. | Frame.io, Wipster, Vimeo Review (pattern only) | New blueprint + static HTML review site; persist comments in ~/.opencut/reviews/<session_id>.json. L effort; design spike only. |
| J3.5 | De-subtitling (burned-in subtitle removal) — detect burned-in subtitle regions via OCR confidence (reuse J1.1), then inpaint via existing ProPainter / ROSE. Inverse of J1.1 — produces a clean base video for re-localisation. | Glocap (pattern) | New route /video/de-subtitle; chains J1.1 + existing inpainting infra. M effort; schedule after J1.1 lands. |
| J3.6 | Multi-language audio package delivery — extend /delivery/export to mux N audio streams (dialogue per language) + 1 master subtitle stream into a single MKV or H.264/H.265 container. Single output file instead of N separate files for N languages. | Broadcast delivery conventions | Extension to existing delivery routes; S effort; schedule when the J1.1 archival lane lands. |
Not adopted (with rationale)
Documented explicitly so future research passes don't re-surface these:
pHash (perceptual hash library) — GPLv3 licence contaminates OpenCut's MIT promise. We implement our own MIT-licensed pHash-style fingerprint in J1.5 instead.
EmoMUNIT (voice emotion transfer) — niche, low user demand, lab-quality voice artefacts. Skip unless a user files a feature request.
MyFrame / FreeFrame (self-hosted Frame.io clones) — shipping these well is a business, not a feature. J3.4 captures the narrow client-review slice that matters to OpenCut users.
ai-typography / atokern — font-level work is a different product. Skip.
pHash (again, GPLv3) — still no.
Wave J gotchas (anticipated)
PaddleOCR GPU footprint — J1.1 needs ~2 GB of PaddleOCR models per language pack. Download lazily per-language via a new /captions/extract-burned-in/install?lang=<iso> endpoint rather than front-loading every language on startup.
J1.3 CLIP embeddings already cached — footage_index_db.py (Wave 1.9.0) already caches CLIP embeddings for footage search. Re-use those — don't recompute on every keyframe request.
DCVC-RT decoder must match encoder — bitstreams are NOT interchangeable with H.264/HEVC. Any clip encoded with J2.1 requires J2.1 for decode. Flag this in the delivery preset so users don't hand off a DCVC proxy to a client who can't play it back.
GIMM-VFI + PerVFI share a CUDA-heavy runtime — ship them under a single opencut[interp-neural] pip extra so users don't double-install torch.
J1.4 PSE hue detector must not flag brand colour flips — branded motion graphics (logo reveal red→blue) triggers the naive detector. Gate the hue-delta check on a per-region basis (detect foreground vs background first) or expose a pse_hue_sensitivity knob.
J2.6 content-moderation scores are not decisions — return score + category and let the user / platform apply the policy. Never hard-block export on a content-moderation flag — that's someone else's compliance team's call.
Node-based colour graph JSON schema needs versioning — users will save graphs and expect them to keep loading in v1.28.0+. Pin the schema from day 1 with a "version": 1 field and a migration path (see the sketch after this list).
J3.4 client-review feedback URL must be opt-in shareable — default it to localhost-only with a "generate public link" button that exposes a reverse-proxy endpoint. Don't make the panel silently open a public port.
J1.1 OCR + J3.5 de-subtitling must chain safely — if J1.1 extraction fails (OCR confidence too low), J3.5 should not blindly inpaint whatever rectangles J1.1 guessed. Propagate a confidence threshold + abort flag.
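A sketch of the versioning discipline that gotcha asks for; the migration-table shape is one common way to do it, not a prescribed design.

```python
SCHEMA_VERSION = 1
_MIGRATIONS = {}  # maps version N -> function upgrading a graph to N + 1

def load_graph(graph: dict) -> dict:
    version = graph.get("version", 1)
    while version < SCHEMA_VERSION:
        graph = _MIGRATIONS[version](graph)
        version = graph["version"]
    return graph

# When v2 ships, old saved files keep loading:
# def _v1_to_v2(graph):
#     graph["nodes"] = graph.pop("ops")  # hypothetical rename
#     graph["version"] = 2
#     return graph
# _MIGRATIONS[1] = _v1_to_v2
```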
Wave J total: 15 new routes (J1.1-J1.5 + J2.1-J2.7), 9 new core modules, 9 new check_*_available() entries, zero new required pip deps, 1 new blueprint (wave_j_bp).
Wave K — Four-Angle Research Pass (May 2026)
Four-angle research pass (May 2026): OSS tools (Gyroflow, Kdenlive, SubtitleEdit, VapourSynth), AI models 2024-2026 (AudioSeal, Amphion/Vevo2, GPT-SoVITS, TokenFlow, Cutie, DEVA, SEA-RAFT, EchoMimic V3, CosyVoice2, DiffBIR, Apple Depth Pro, NAFNet, Open-Sora v2, LTX-2, DepthFlow, Gyroflow), and commercial feature analysis (CapCut 2026, Descript Underlord, OpusClip, Runway Gen-4.5, Adobe Premiere 2026, DaVinci Resolve 21, HeyGen, ElevenLabs, Suno v5.5). 27 items survive the licence + novelty filter across three tiers.
Tier 1 -- High ROI, Zero/Minimal ML (fully working)

| # | Feature | Module (new) | Routes | Source | Licence |
| --- | --- | --- | --- | --- | --- |
| K1.1 | AudioSeal AI-content watermark -- imperceptible audio watermark embeds provenance into all AI-generated audio. pip install audioseal. No other local editor ships this; legally significant for AI output. | core/audio_watermark.py | POST /audio/watermark/embed, POST /audio/watermark/detect, GET /audio/watermark/info | | |
| K1.2 | Brand Kit system -- logo, hex palette, fonts, intro/outro clip, watermark position stored in ~/.opencut/brand_kit.json. Auto-inject via brand_kit=true flag on compose routes. Zero ML; pure UX. CapCut/OpusClip ship this; no OSS editor does. | core/brand_kit.py | GET /settings/brand-kit, POST /settings/brand-kit, POST /settings/brand-kit/preview, DELETE /settings/brand-kit | | |
| K1.3 | | | POST /audio/podcast/suite, POST /audio/podcast/audiogram, POST /audio/podcast/show-notes | Descript / Headliner pattern | -- |
| K1.4 | Multi-ratio batch reframe -- one call produces 16:9 + 9:16 + 1:1 + 4:5 + 4:3 crops via existing smart_reframe.py. Returns a zip with ratio-named filenames. CapCut/OpusClip charge per export. | core/batch_reframe.py | POST /video/reframe/batch, GET /video/reframe/batch/presets | CapCut / OpusClip pattern | -- |
| K1.5 | Star rating + clip tagging -- good/neutral/rejected + 1-5 stars + free-form tags per clip in ~/.opencut/clip_db.json. DaVinci / FCP ship this for dailies culling; OpenCut has no rating system. | core/clip_rating.py | POST /clips/rate, POST /clips/tag, GET /clips/search, DELETE /clips/tag | DaVinci / FCP pattern | -- |
| K1.6 | Subtitle QA validator -- CPS check, min/max gap, overlap detection, max line length across an entire SRT/VTT/ASS. Four built-in profiles (Netflix, BBC, YouTube, EBU-TT-D). Extends caption_compliance.py. See the validation sketch after this table. | core/subtitle_qa.py | POST /captions/qa/validate, GET /captions/qa/profiles | SubtitleEdit rule patterns (GPL-3 data reimplemented MIT) | MIT |
| K1.7 | Bulk profanity censor -- Whisper word timestamps -> beep tone via FFmpeg aevalsrc. Modes: bleep / silence / mute_speaker. Custom word list via JSON. | core/profanity_censor.py | POST /audio/censor/profanity, GET /audio/censor/wordlists | Premiere / Descript pattern | -- |
| K1.8 | EQ / level spectral matcher -- FFT (scipy) measures the reference clip's spectral curve, computes a FIR correction filter, applies it to the target. "Make this interview sound like that reference mic." DaVinci Fairlight charges ~$295 for this. Pure Python, zero new GPU deps. See the matching sketch after this table. | core/spectral_match.py | POST /audio/spectral-match, POST /audio/spectral-match/preview | DaVinci Fairlight pattern | -- |
| K1.9 | Lottie animation import -- render .json / .lottie as a video clip with alpha via lottie-python (MIT). Output: WEBM/MOV with alpha for compositing. DaVinci 21 ships native Lottie; no OSS editor does. | | | | |
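A toy core of the K1.6 checks over (start, end, text) cue tuples, as referenced in the table above. The thresholds mirror common broadcast guidance (17 CPS, a two-frame gap at 24 fps, 42-character lines) but are assumptions, not the shipped Netflix/BBC profile values.

```python
def qa_validate(cues, max_cps=17.0, min_gap=0.083, max_line_len=42):
    """cues: list of (start_s, end_s, text), sorted by start time."""
    issues = []
    for i, (start, end, text) in enumerate(cues):
        duration = max(end - start, 1e-6)
        cps = len(text.replace("\n", "")) / duration
        if cps > max_cps:
            issues.append((i, f"CPS {cps:.1f} exceeds {max_cps}"))
        if any(len(line) > max_line_len for line in text.split("\n")):
            issues.append((i, f"line exceeds {max_line_len} chars"))
        if i + 1 < len(cues):
            next_start = cues[i + 1][0]
            if next_start < end:
                issues.append((i, "overlaps next cue"))
            elif next_start - end < min_gap:
                issues.append((i, f"gap under {min_gap}s"))
    return issues
```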
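And a sketch of how the K1.8 matcher can work with plain scipy: measure banded magnitude spectra of reference and target, turn the power ratio into a gain curve, realise it as a linear-phase FIR via scipy.signal.firwin2, and filter the target. The band and tap counts are arbitrary choices, not the shipped parameters.

```python
import numpy as np
from scipy import signal

def spectral_match(target, reference, fs, n_bands=64, n_taps=1025):
    def banded_power(x):
        f, pxx = signal.welch(x, fs=fs, nperseg=4096)
        edges = np.linspace(0.0, f[-1], n_bands + 1)
        power = np.array([pxx[(f >= lo) & (f < hi)].mean() + 1e-12
                          for lo, hi in zip(edges[:-1], edges[1:])])
        return power, edges
    ref_pow, edges = banded_power(reference)
    tgt_pow, _ = banded_power(target)
    gain = np.sqrt(ref_pow / tgt_pow)              # power ratio -> amplitude
    centers = (edges[:-1] + edges[1:]) / 2.0
    freq = np.concatenate(([0.0], centers, [fs / 2.0])) / (fs / 2.0)
    amps = np.concatenate(([gain[0]], gain, [gain[-1]]))
    fir = signal.firwin2(n_taps, freq, amps)       # linear-phase correction
    return signal.lfilter(fir, [1.0], target)
```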
Tier 2 -- New AI Surfaces (503 MISSING_DEPENDENCY stubs)

| # | Feature | Module (new) | Routes |
| --- | --- | --- | --- |
| | AI semantic media search -- unified CLIP visual + CLAP audio + Whisper transcript index over the media library. "Find shots with a person laughing outdoors." Extends existing footage_index_db.py. Adobe Premiere 2026 ships this; no OSS editor does. | core/semantic_search.py | POST /search/semantic, POST /search/index, GET /search/index/status |
| K2.3 | Vevo2 singing voice conversion -- Amphion Vevo2: convert speech/TTS into a singing performance with pitch conditioning. First singing capability in OpenCut. Shares the Amphion install with K2.2. | core/singing_vevo2.py | POST /audio/sing/vevo2, GET /audio/sing/vevo2/info |
| | TokenFlow training-free video style edit -- ICLR 2024, MIT. Apply diffusion style to real footage without training. "Restyle this clip as watercolour." No commercial editor ships a local free equivalent. | core/style_tokenflow.py | POST /video/style/tokenflow, GET /video/style/tokenflow/info |
| | Cutie persistent video object tracking -- CVPR 2024, MIT. Track a segmented object across the full video with temporal memory. Pass a SAM2 mask from frame 0; Cutie propagates. Enables "remove object from entire video" without per-frame annotation. | core/track_cutie.py | POST /video/track/cutie, GET /video/track/cutie/info |
| | DEVA open-vocabulary video tracking -- ICCV 2023, MIT. Text-prompted: "track all cars" or "track the person in the blue shirt." Grounded-SAM + temporal propagation. Unique vs SAM2 click prompting. | core/track_deva.py | POST /video/track/deva, GET /video/track/deva/info |
| K2.9 | SEA-RAFT optical flow -- ECCV 2024, BSD-3. 2.3x faster than RAFT, SOTA on the Spring benchmark. Feeds motion-blur synthesis, motion trails, improved interpolation. Drop-in for any current RAFT call. | core/flow_searaft.py | POST /video/flow/searaft, GET /video/flow/backends |
| K2.11 | Gyroflow IMU stabilization -- Apache-2 CLI. Gyroscope/IMU warp stabilisation from GoPro/DJI/Sony metadata sidecars. Far superior to vidstab for action-cam footage. Lens profile DB, horizon lock, STmap export, Sony IBIS. Subprocess call to the gyroflow binary. | core/stabilize_gyroflow.py | POST /video/stabilize/gyroflow, GET /video/stabilize/gyroflow/info, GET /video/stabilize/gyroflow/lens-profiles |
| K2.12 | AI motion deblur -- NAFNet (ECCV 2022, Apache-2) for motion blur; MIMO-UNet as a lightweight fallback. Zero deblur capability in OpenCut today. DaVinci Resolve 21 ships this as a premium AI feature. | core/deblur_motion.py | POST /video/restore/deblur-motion, GET /video/restore/deblur-motion/backends |
| K2.13 | Apple Depth Pro metric depth -- MIT, zero-shot metric depth with absolute scale. Faster + more accurate than Depth Anything V2 on single-frame depth. New backend for existing depth routes; enables accurate parallax and cinefocus without calibration. | core/depth_depthpro.py | POST /video/depth/depthpro, GET /video/depth/backends |
| K2.14 | DepthFlow parallax-from-stills -- convert a single still into a parallax-motion video using a depth-based 2.5D warp. Creates motion from one still (Ken Burns on steroids). CLI subprocess. | core/depth_flow.py | POST /video/depth-flow/generate, GET /video/depth-flow/info |
| K2.15 | Text-to-SFX (AudioCraft AudioGen) -- generate SFX from a text prompt ("footsteps on gravel", "thunderstorm"). Code MIT; weights CC-BY-NC (download instructions in the 503 hint, not bundled). No other local editor ships this. | | |
| K2.17 | LTX-Video 0.9.8 + LTX-2 audio+video joint -- LTX-2 is the first model to generate audio and video simultaneously. Upgrade the existing LTX-Video backend; add an audio+video joint generation route (/generate/ltx/v2, per the gotchas below). | | |
| | Audio-driven visual FX system -- BeatNet beat timestamps + frequency-band analysis drive zoom pulse, chromatic aberration, colour saturation, shake, and strobe keyframes. Reactive presets ("boom", "bass-drop", "snare"). No OSS editor exposes this as a system. See the keyframe sketch after this table. | core/audio_reactive_fx.py | POST /video/audio-reactive/render, GET /video/audio-reactive/presets |
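A toy version of the beat-to-keyframe mapping behind the audio-reactive row above: beat timestamps (e.g. from BeatNet) become zoom-pulse keyframes with a short linear decay. The pulse amount and decay length are illustrative preset values.

```python
def beats_to_zoom_keyframes(beats, fps=30.0, pulse=0.08, decay_frames=6):
    """beats: beat times in seconds -> sorted (frame, zoom) keyframe pairs."""
    frames = {}
    for t in beats:
        f0 = round(t * fps)
        for i in range(decay_frames + 1):
            zoom = 1.0 + pulse * (1.0 - i / decay_frames)  # spike, then ease back
            frames[f0 + i] = max(frames.get(f0 + i, 1.0), zoom)
    return sorted(frames.items())

# Example: a 120 BPM track pulses every 0.5 s
# keys = beats_to_zoom_keyframes([0.5, 1.0, 1.5], fps=30)
```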
Tier 3 -- Strategic Pipelines (route scaffolding + research notes)

| # | Feature | Module (stub) | Routes | Source | Notes |
| --- | --- | --- | --- | --- | --- |
| K3.1 | Full local video dubbing pipeline -- WhisperX STT -> NLLB-200 translate -> CosyVoice2/GPT-SoVITS voice clone -> EchoMimic V3 lip sync -> composite. HeyGen charges per-minute; OpenCut: private, free, local. | core/dub_pipeline.py | POST /dub/pipeline, GET /dub/pipeline/status/<job_id> | HeyGen pattern | L effort; schedule after K2.4 + K2.5 fill. |
| K3.2 | Auto trailer/promo generator -- LLM moment scoring -> top-N extract -> MusicGen ramp + title card (declarative_compose) + CTA. All the pieces are in OpenCut; the conductor is the gap. Descript Underlord ships this. | core/trailer_gen.py | POST /generate/trailer, POST /generate/promo | Descript Underlord pattern | M effort. |
| K3.3 | IntelliScript .fdx / Fountain import -- extend the Wave I script-to-sequence (I2.1) to accept Final Draft .fdx and Fountain .fountain files. Parse scene headings + WhisperX fuzzy-match transcript -> auto-assemble edit order. DaVinci 21 IntelliScript charges a licence. | core/screenplay_parser.py | POST /timeline/assemble-from-screenplay (extends I2.1) | | |
| K3.4 | AI Face Age Transformer -- age slider on a face in video via IP-Adapter + Cutie temporal tracking. DaVinci 21 ships this. No OSS equivalent at video level yet. | core/face_age_transform.py | POST /video/face/age-transform, GET /video/face/age-transform/info | DaVinci 21 pattern | L effort; confirm weights licence before promoting to Tier 2. |
| K3.5 | AI Slate ID -- Florence-2 VLM (already installed) reads clapperboard scene/take/camera from clip-head frames. Stamps metadata into OTIO + Premiere XMP. DaVinci 21 ships this. | core/slate_id.py | POST /video/slate/identify, GET /video/slate/identify/info | DaVinci 21 pattern (Florence-2 already in OpenCut) | M effort; Florence-2 already installed. |
| K3.6 | Video outpainting -- expand frame borders via diffusion to change aspect ratio (generate content at the edges). Wan2.1 VACE or LTX-2 inpainting conditioned on existing frame content. Runway charges per-second. | core/outpaint_video.py | POST /video/outpaint, GET /video/outpaint/info | Runway Gen-4 pattern | L effort; depends on K2.17 or K3.7. |
| K3.7 | Wan2.1 VACE video editing -- the existing C4 stub covers T2V; VACE adds editing of existing footage via video conditioning (background change, re-light, modify action). Different inference path from T2V. | core/gen_video_wan_vace.py | POST /generate/wan/vace, GET /generate/wan/vace/info | | |
Wave K gotchas (anticipated)
AudioSeal latency (K1.1) -- embed runs >1x realtime on CPU; wire it as a post-export background job, never synchronously on the export path.
Brand Kit opt-out (K1.2) -- must be explicit brand_kit=true per render. Never auto-apply to client footage without consent.
GPT-SoVITS server (K2.1) -- ships its own inference server (port 9880). OpenCut wraps it as a subprocess sidecar. Check server health before routing; surface install instructions when absent.
Amphion + Vevo2 shared checkpoint (K2.2/K2.3) -- one check_amphion_available() guard covers both. Don't require two separate downloads.
EchoMimic V3 backend priority (K2.5) -- when available, /lipsync/backends sets echomimic to recommended: true. Don't silently redirect from MuseTalk/LatentSync; let the user choose.
SEA-RAFT resolution cap (K2.9) -- cap input to 1080p and use downsample-process-upsample unless user explicitly requests 4K flow.
DiffBIR inference time (K2.10) -- expose tile_size (default 512) and fast_mode=true (4-step DPM-Solver++ vs 50 DDIM) to manage 30-60 s per-frame cost.
Gyroflow binary (K2.11) -- not on PyPI. check_gyroflow_available() fetches pre-built binary from gyroflow GitHub releases for the detected platform.
DepthFlow headless (K2.14) -- uses ModernGL for GPU rendering; needs virtual framebuffer (Xvfb) on headless Linux. Document in 503 install hint.
AudioGen weights (K2.15) -- CC-BY-NC cannot be bundled. check_audiogen_available() detects presence and surfaces the download URL. Never auto-download CC-BY-NC weights silently.
LTX-2 A+V joint (K2.17) -- different inference call from existing LTX T2V path. New route /generate/ltx/v2 keeps old /generate/ltx backward-compatible.
CineFocus bokeh (K2.19) -- pre-compute depth map for entire clip in batch before rendering blur sequence. Expose focal_z_start, focal_z_end, focal_frame_start, focal_frame_end params.
Dub pipeline translation (K3.1) -- NLLB-200 runs locally (MIT). Never route translation through a cloud LLM unless user explicitly selects an API provider.
Wave K total: 34 new routes (Tier 1 + Tier 2 + Tier 3 scaffolding), 27 new core modules,
20 new check_*_available() entries, 0 new required pip deps, 1 new blueprint (wave_k_bp).
Ten features where OpenCut will be first to ship locally (no OSS NLE equivalent):
K1.1 AudioSeal -- AI audio provenance watermarking on all generated output
K1.2 Brand Kit -- project-identity injection into every render
Sources
CEP/UXP ecosystem: bolt-cep (Hyper Brew), Bolt UXP WebView UI, Adobe UXP Premiere Pro Samples, Adobe CEP Samples (PProPanel), SoundBuddy Studio, jumpcut, vakago-tools QE API documentation.
MCP servers for NLEs (Wave I source):
ayushozha/AdobePremiereProMCP
— polyglot Go + Rust + TS + Python MCP server exposing ~907
Premiere tools over a WebSocket-in-panel transport on port 9801.
Architecture not adopted; four polish patterns (live stats widget,
lazy JSX chunking, WS heartbeat, cross-platform launchers) +
one strategic route (script-to-sequence ExecuteEDL equivalent)
promoted into Wave I.
Sources (Wave J addendum — April 2026 three-angle research pass)
Q1-Q2 2026 niche AI (evaluated, not adopted):
DiffuEraser, COCOCO, ViDeNN, UDVD, EmoMUNIT, C-MET (research-only),
GMR (motion retargeting — out of scope for an NLE extension),
ai-typography, atokern (out of scope), pHash (GPLv3).