Local, GPU-accelerated music video generator: upload a track, analyze it, align lyrics, generate stylized backgrounds (SDXL keyframe stills by default — with optional RIFE morph and Ken Burns on those stills enabled by default), composite reactive shaders and kinetic type, and encode with ffmpeg (NVENC on NVIDIA GPUs by default).
Examples (progress log, newest = current state): voidcat on YouTube
- Easiest install: Pinokio. Install the Pinokio app, search glitchframe, install Glitchframe from the listing, then run Install and Start in the sidebar. You still need ffmpeg on your PATH and a capable NVIDIA GPU for the intended experience; see Requirements and Pinokio.
- UI: Gradio. Run python -m app and open the URL shown (default http://127.0.0.1:7860). Pinokio / requirements.txt tracks Gradio 5.x; pyproject.toml core metadata may still list Gradio 4.x bounds, so install requirements.txt before pip install -e . for parity (see Install).
- Manual setup (Windows CLI): Getting started on Windows. Order of installs: Python, optional Git, ffmpeg with winget, venv, PyTorch; or use Pinokio above instead.
- Deep dive: docs/index.md (includes background-keyframes-editor, background-stills, effects/lyrics editors) and docs/technical/project-setup-and-config.md.
License: MIT · Repository: github.com/OlaProeis/Glitchframe
- Ingest and analysis: Per-song cache, waveform preview, beat/onset/spectrum features, optional Demucs vocal stem, segment/chapter hints.
- Lyrics: WhisperX word timings plus alignment to pasted lyrics; visual per-word timeline editor for fixes (saved to cache so re-runs do not clobber your edits); optional Export .srt aligned to the same per-word timings.
- Backgrounds: Background keyframes tab with a waveform timeline to edit SDXL still timing and per-clip prompts, Generate SDXL stills / Regenerate per slot, Replace / Crop with image upload (staging until Save timeline; crop selection is locked to the output resolution aspect ratio). Still count is fixed from analysis (~one per ~8 s); see docs/technical/background-keyframes-editor.md. SDXL + optional RIFE morph is the supported AI background path (smooth optical-flow interpolation between keyframes; Ken Burns on SDXL stills stays on by default). Static image + Ken Burns uses your uploaded plate only (no SDXL). In Visual style, pick a reactive shader (or none); that pre-fills an example scene prompt, typography style, and palette. Use Scene prompt for defaults applied when generating stills; refine individual clips on the Background keyframes tab.
- Look and motion: Optional GLSL reactive shaders (curated list + no shader for a clean background plate), Skia kinetic typography, lyrics preview strip for quick draft checks, title/thumbnail text, optional logo placement with rim glow, beams, and branding-driven effects.
- Effects timeline: Per-clip post effects (e.g. screen shake, chromatic aberration, colour invert, scanline tear, block glitch, pixel smear, fade-to-black) with an in-UI editor and baked JSON under the song cache.
- Output: Full-length render (with clearer progress and cancellable/long-job UX) and 10 s preview (loudest window), output.mp4 + thumbnail.png + YouTube-oriented metadata.txt, with NVENC by default when available; the Preview & render tab (last tab) embeds the last MP4 in-browser and includes Open output folder (outputs/<run_id>/ on disk; no redundant player download button).
- Background stack: Optional HiDream-O1-Image stills backend (advanced / separate worker path; see docs/technical/background-stills-hidream.md). SDXL+RIFE timelines continue to be the mainstream path (docs/technical/rife-morph-background.md).
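The Export .srt option mentioned above writes subtitles from the per-word timings. As a rough illustration of what that involves (this is not Glitchframe's actual exporter), the sketch below formats word timings as SRT cues. The (text, start, end) tuple layout and both function names are assumptions made for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words) -> str:
    """Build SRT text with one cue per (text, start_s, end_s) tuple.

    The tuple layout is hypothetical; the real cache format may differ.
    """
    cues = []
    for index, (text, start, end) in enumerate(words, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues in the SRT format.
    return "\n".join(cues)
```

A real exporter would probably group words into lines rather than emit one cue per word; the timestamp formatting stays the same either way.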
master is the recommended branch — it absorbs merged work from dev, including: Ken Burns on static uploads and (by default) SDXL stills, with RMS drive plus optional effects_timeline.json automation; a post-rotate center crop (ROT_TRIM_OVERSCAN in pipeline/background_kenburns.py) so subtle tilt avoids black edge slivers; RIFE morph boundary/inset revisions; expanded effects rows (replacing legacy zoom punch); lyrics preview strip, SDXL/RIFE confirmation flows; cancellable long renders; clearer compositor/logging progress and optional HiDream stills (see docs/technical/background-stills-hidream.md). Pinokio/catalog defaults track master; checkout dev only if you want commits before they merge.
Lyrics timeline (per-word alignment and editing on the vocal waveform):
Effects timeline (clip-based post effects with rows, playhead, and per-clip controls):
Background keyframes matches the same interaction pattern (waveform, draggable clips, playhead, per-slot regenerate/replace/crop). See docs/technical/background-keyframes-editor.md.
- Background keyframes count is fixed from the analysis/plan (not open-ended in the UI). Edit timing and prompts for existing clips; Save timeline writes keyframes_timeline.json + manifest.json (and clears the RIFE morph cache when the manifest changes). See docs/technical/background-keyframes-editor.md.
- Vocal / lyrics matching can be unreliable in places. Treat alignment as a draft: use the lyrics timeline and listen back before you commit time to a full render. Improving this area is a priority; do not assume perfect lip-sync or line timing yet.
- Rendering is effectively single-threaded for the heavy pipeline. Full videos often take on the order of 1–2+ hours (sometimes more), depending on chosen shader (or none), scene complexity, length, resolution, GPU, and whether RIFE morph bakes extra frames after SDXL. Plan batch work accordingly.
- RIFE morph (optional, default-on in the UI) runs a CUDA bake after SDXL keyframes; it downloads ~24 MB of weights from Hugging Face on first use (see docs/technical/rife-morph-background.md). GPU time scales with keyframe count and the subdivisions slider.
- The app is under active development; UI labels and edge cases are still being hardened.
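To make the render-time and keyframe-count notes above more concrete, here is a back-of-the-envelope estimator. It is only a sketch: the 30 fps output rate is an assumption, the compositor speed of about 2 fps is taken from the sample log line in Troubleshooting, and "one still per 8 s" is the approximate figure quoted in this README rather than the exact rule used by the analysis step. SDXL generation and RIFE baking time are not included.

```python
import math


def estimate_render(duration_s, output_fps=30.0, compositor_fps=2.0,
                    seconds_per_still=8.0):
    """Rough planning numbers for a full render (all defaults are assumptions)."""
    frames = math.ceil(duration_s * output_fps)
    stills = max(1, math.ceil(duration_s / seconds_per_still))
    compositing_minutes = frames / compositor_fps / 60.0
    return {
        "frames": frames,
        "stills": stills,
        "compositing_minutes": compositing_minutes,
    }
```

With these defaults a 4-minute track comes out at 7200 frames, about 30 stills, and roughly an hour of compositing, which is in line with the 1–2+ hour range quoted above once background generation is added.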
The following is a short, user-facing summary of work not yet done (also tracked in Taskmaster as pending / deferred in .taskmaster/tasks/tasks.json):
- Unify “auto” effects with the timeline — one control surface: analyser-driven glitch, beams, and related FX should not stack with the Effects timeline in confusing ways; timeline becomes authoritative where intended (pending).
- Faster preview backgrounds — generate SDXL (and optional RIFE) / Ken Burns assets only for the 10 s preview window (plus padding), then fill the rest on full render, with clearer cache keys so preview is much cheaper than today (deferred).
- Bass-driven logo pulse — optional mode where logo motion follows low-frequency energy / kicks instead of a generic beat grid, with tunable sensitivity (deferred).
- Overnight / multi-song queue — batch several full renders (CLI or Gradio) with stable paths and isolated failures (deferred).
- Single primary “export” affordance — one obvious control that runs the full pipeline, while keeping optional Analyze/Align as precache steps (deferred).
- Timestamped section headers in lyrics — lines like [Verse 1 0:12] or [Chorus 1:00] that set both a section break and a coarse time anchor, to reduce manual [m:ss] busywork (deferred).
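The timestamped section header idea in the last item is not implemented yet. As a sketch of how such lines could be parsed, the snippet below splits a header into a label and a time in seconds. The regular expression, the function name, and the return shape are all hypothetical; the eventual feature may accept a different syntax.

```python
import re

# Hypothetical pattern: "[<label> <m>:<ss>]", e.g. "[Verse 1 0:12]".
_SECTION_HEADER = re.compile(
    r"^\[(?P<label>.+?)\s+(?P<minutes>\d+):(?P<seconds>[0-5]\d)\]$"
)


def parse_section_header(line):
    """Return (label, seconds) for a timestamped header, else None."""
    match = _SECTION_HEADER.match(line.strip())
    if match is None:
        return None
    seconds = int(match["minutes"]) * 60 + int(match["seconds"])
    return match["label"], float(seconds)
```

Lines without a trailing m:ss, such as a plain [Bridge] header or an ordinary lyric line, return None, so they could still be handled by whatever section logic exists today.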
- Python 3.11+ (3.12/3.13 may work; optional deps like madmom are pickier on newer Python)
- ffmpeg on your PATH (encode/mux). On Windows, install with winget (see Getting started on Windows); on other systems use your package manager or ffmpeg.org if needed
- NVIDIA GPU + CUDA recommended for diffusers, analysis, and NVENC; Pinokio installs PyTorch builds from torch.js (e.g. cu128 on Windows/Linux NVIDIA; see torch.js). CPU-only is possible for lighter paths but not the main focus
- Disk: model and song caches under .cache/ and cache/ (large downloads on first use)
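Before installing, it can help to check the first two requirements from a Python prompt. The sketch below is not part of Glitchframe (the function names are made up for this example): it checks the interpreter version, looks for ffmpeg on PATH, and asks that ffmpeg whether it lists the h264_nvenc encoder.

```python
import shutil
import subprocess
import sys


def has_nvenc(encoders_text: str) -> bool:
    """True if the output of `ffmpeg -encoders` lists the H.264 NVENC encoder."""
    return "h264_nvenc" in encoders_text


def preflight() -> dict:
    """Collect basic environment facts; the keys are illustrative only."""
    report = {
        "python_ok": sys.version_info >= (3, 11),
        "ffmpeg_path": shutil.which("ffmpeg"),
        "nvenc_listed": False,
    }
    if report["ffmpeg_path"]:
        result = subprocess.run(
            [report["ffmpeg_path"], "-hide_banner", "-encoders"],
            capture_output=True,
            text=True,
            check=False,
        )
        report["nvenc_listed"] = has_nvenc(result.stdout)
    return report
```

Calling print(preflight()) shows the result. Note that an encoder being listed only means ffmpeg was built with NVENC support; the driver can still reject it at run time, which is the case covered under Troubleshooting.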
Recommended: Use Pinokio: install Pinokio, search glitchframe, install Glitchframe from the listing, then use Install / Start in the app. That flow runs the repo’s Pinokio scripts (install.js, torch.js, start.js, …). Prerequisites: ffmpeg on your PATH, a suitable NVIDIA GPU (for the full GPU path), and recent drivers — see Pinokio. Catalog installs clone the repo’s default Git branch (master). Optional: git checkout dev inside the Pinokio app clone and Reinstall to pin pre-merge revisions (see docs/technical/pinokio-package.md); Update runs git pull on whichever branch you are on.
Windows, full walkthrough (command line): docs/guides/getting-started-windows.md.
The steps below are a manual install (clone or ZIP + venv + pip); they match that guide. On Windows prefer py -3.11 if python is not on your PATH.
With git:
git clone https://github.com/OlaProeis/Glitchframe.git
cd Glitchframe
python -m venv .venv
Without git: from the GitHub repo, Code → Download ZIP, extract, cd into the folder, then python -m venv .venv (or py -3.11 -m venv .venv on Windows).
Then activate the venv:
- Windows (cmd): .venv\Scripts\activate.bat
- Windows (PowerShell): .venv\Scripts\Activate.ps1
- macOS / Linux: source .venv/bin/activate
Path A — match Pinokio / torch.js (good default for NVIDIA on Windows): after creating the venv and upgrading pip, install project deps (§3) or at minimum requirements.txt, then install [all], then run the same uv pip / pip line as in torch.js for your platform. NVIDIA + Windows (from torch.js):
python -m pip install -U uv
uv pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128 --force-reinstall --no-deps
(Use the cpu, DirectML, ROCm, or macOS blocks inside torch.js for other platforms.) Order: Pinokio runs torch.js last, after requirements.txt, .[all], madmom, and beatnet; mirror that if you hit resolver issues.
Path B — legacy Windows “Align lyrics” stack (Python 3.11–3.12, cu121): optional extras all / lyrics / analysis in pyproject.toml still document a coherent PyTorch 2.2.2+cu121 + WhisperX 3.3.0 + ctranslate2 4.4.0 set for DLL alignment. Install the CUDA 12.1 index before the rest if you use this path:
python -m pip install --upgrade pip
python -m pip install torch==2.2.2+cu121 torchvision==0.17.2+cu121 torchaudio==2.2.2+cu121 --index-url https://download.pytorch.org/whl/cu121
Then continue with §3–§4. scripts/windows_provision_cudnn_next_to_ctranslate2.py and windows-venv-recovery-guide.md apply mainly to this cu121 + ctranslate2 4.4 layout.
Path C — generic CUDA 12.4 wheels (cu124) for setups not using Path A or B (e.g. some Python 3.13 flows):
python -m pip install --upgrade pip
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
On Windows, if python is not on PATH, use the launcher: py -3.11 -m pip ....
Install requirements.txt before pip install -e . so you get the Gradio 5.x stack and the pins tested with Pinokio (pyproject.toml [project] dependencies still target Gradio 4.x unless you pull everything from requirements.txt first):
python -m pip install -r requirements.txt
python -m pip install -e .
requirements.txt pins Gradio 5.x, fastapi, huggingface_hub, and related packages; see the comments at the top of that file if pip conflicts.
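One way to confirm that the install order worked is to check which Gradio major version ended up in the venv. This is a small sketch using only the standard library; the helper names are invented for this example, and the expected major version of 5 comes from the requirements.txt note above.

```python
import re
from importlib import metadata


def major_of(version: str):
    """Return the leading integer of a version string, or None if absent."""
    match = re.match(r"\d+", version)
    return int(match.group()) if match else None


def installed_major(package: str):
    """Major version of an installed distribution, or None if not installed."""
    try:
        return major_of(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


def gradio_matches_requirements(expected_major: int = 5) -> bool:
    """True when the installed Gradio major version equals expected_major."""
    return installed_major("gradio") == expected_major
```

If gradio_matches_requirements() returns False after a manual install, the venv has most likely resolved against the pyproject.toml bounds; re-running pip install -r requirements.txt is the fix described above.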
For vocal stem separation (Demucs) and lyrics alignment (WhisperX + Silero VAD):
python -m pip install -e ".[all]"
If CUDA disappears from PyTorch after installing whisperx, reinstall the CUDA wheels from the same index you chose in §2, then re-run pip install -r requirements.txt if pip upgraded MarkupSafe / Pillow past Gradio’s expectations. Recovery patterns are in requirements.txt comments and docs/technical/windows-venv-recovery-guide.md.
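A quick way to tell whether an extras install replaced the CUDA build of PyTorch is to print the build and runtime status. The sketch below uses torch.__version__, torch.version.cuda, and torch.cuda.is_available(); the function name is made up for this example, and it degrades gracefully when torch is missing.

```python
def torch_cuda_status() -> str:
    """Describe the installed PyTorch build and whether CUDA is usable."""
    try:
        import torch
    except (ImportError, OSError):
        # OSError covers broken native libraries (e.g. DLL load failures).
        return "torch is not importable in this environment"
    # torch.version.cuda is None for CPU-only wheels.
    build = torch.version.cuda or "none (CPU-only build)"
    available = torch.cuda.is_available()
    return (
        f"torch {torch.__version__}, CUDA build: {build}, "
        f"CUDA available: {available}"
    )
```

Run print(torch_cuda_status()) inside the activated venv. A CPU-only build reported after installing the extras is the situation that calls for reinstalling the CUDA wheels.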
Optional beat detectors: Pinokio runs madmom and beatnet with --no-build-isolation after .[all]. For a manual venv, mirror Pinokio:
python -m pip install madmom --no-build-isolation
python -m pip install beatnet --no-build-isolation --no-deps
Or use pip install -e ".[beats]" (can be finicky to build); the analyzer falls back to librosa without them.
copy .env.example .env
On Unix, use cp .env.example .env instead. Then edit .env if you need custom GLITCHFRAME_* paths or ffmpeg codec overrides. Legacy MUSICVIDS_* names are still read for the same settings. The sample file also lists optional API keys for Taskmaster / dev tooling, not for core Glitchframe.
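The GLITCHFRAME_* / MUSICVIDS_* naming described above can be pictured as a prefix fallback. The following is only a sketch of that idea, not the project's config loader: the helper name is invented, and the precedence shown (the new prefix wins over the legacy one) is an assumption, since this README only states that legacy names are still read.

```python
import os


def read_setting(suffix, default=None, environ=None):
    """Look up GLITCHFRAME_<suffix>, then the legacy MUSICVIDS_<suffix>.

    Precedence is assumed for illustration; check the project config code
    for the real behaviour.
    """
    env = os.environ if environ is None else environ
    for prefix in ("GLITCHFRAME_", "MUSICVIDS_"):
        value = env.get(prefix + suffix)
        if value:
            return value
    return default
```

For example, read_setting("WHISPERX_DEVICE", default="cuda") would return whatever GLITCHFRAME_WHISPERX_DEVICE is set to; whether a legacy MUSICVIDS_ equivalent exists for any particular setting is not confirmed here.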
python -m app
Open the local URL printed in the console (default port 7860).
Easiest path: Install Pinokio, open it, search glitchframe, and install Glitchframe from the listing — no need to paste a URL. Alternative: Download from URL and paste https://github.com/OlaProeis/Glitchframe.git.
This repository includes Pinokio scripts (install.js, torch.js, start.js, reset.js, update.js, pinokio.js version 3.7, icon.png). install.js (Python 3.11 venv env): uv pip install -r requirements.txt, uv pip install -e ".[all]", madmom / beatnet, then script.start → torch.js for platform-specific PyTorch (NVIDIA Windows/Linux: cu128 torch==2.7.0 trio by default — see torch.js). start.js sets GLITCHFRAME_WHISPERX_VAD_METHOD=silero, GLITCHFRAME_WHISPERX_DEVICE=cpu, and HF_HUB_DISABLE_SYMLINKS=1 (+ warning silencer) for reliable defaults on Windows; change or remove GLITCHFRAME_WHISPERX_DEVICE in start.js or .env to try GPU Align lyrics. No extra Pinokio step is required for RIFE (~24 MB on first morph). Click Start after install. ffmpeg must be on PATH. Technical details: docs/technical/pinokio-package.md. Repo topic pinokio for GitHub discovery.
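For a manual venv, the start.js defaults listed above can be approximated by setting the same three environment variables before launching the app. This is a sketch, not a copy of start.js: the helper names are invented, the warning silencer is left out because its exact form is not described here, and values already present in your environment are deliberately allowed to win over the defaults.

```python
import os
import subprocess
import sys

# Values taken from the start.js description in this README.
START_JS_DEFAULTS = {
    "GLITCHFRAME_WHISPERX_VAD_METHOD": "silero",
    "GLITCHFRAME_WHISPERX_DEVICE": "cpu",
    "HF_HUB_DISABLE_SYMLINKS": "1",
}


def build_launch_env(overrides=None, base=None):
    """Merge defaults, the current environment, then explicit overrides."""
    base_env = dict(os.environ if base is None else base)
    return {**START_JS_DEFAULTS, **base_env, **(overrides or {})}


def launch(overrides=None):
    """Run `python -m app` from the repo root with the merged environment."""
    return subprocess.call(
        [sys.executable, "-m", "app"], env=build_launch_env(overrides)
    )
```

Calling launch({"GLITCHFRAME_WHISPERX_DEVICE": "cuda"}) would be the manual equivalent of editing start.js to try GPU Align lyrics.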
Stale or custom clones: Update runs git pull on the current branch — use master vs dev intentionally (see pinokio-package doc).
- Step-by-step (Windows, after PyTorch / lyrics issues): docs/technical/windows-venv-recovery-guide.md (git pull, clean torch / torchvision / torchaudio reinstall, extras, test Align lyrics).
- Align lyrics fails with Weights only load failed / omegaconf / ListConfig: PyTorch 2.6+ defaults torch.load to a stricter mode that breaks some WhisperX/pyannote checkpoints. Prefer updating Glitchframe to a revision that includes pipeline/torch_checkpoint_compat.py and keeping a current torch / torchvision / torchaudio trio from the same CUDA index (Install §2). Downgrading only torch to “fix” this often causes the cuDNN mismatch below.
- Could not load library cudnn_ops_infer64_8.dll / Could not locate cudnn_ops_infer64_8.dll / error 1920 (often after Performing voice activity detection using Silero): the VAD line is misleading. Silero runs first; the crash is usually faster-whisper / CTranslate2 loading cuDNN DLLs. Windows + Python 3.11/3.12: use one coherent stack, namely PyTorch 2.2.2+cu121 + WhisperX 3.3.0 + ctranslate2 4.4.0 + faster-whisper 1.1.0 (see Install §2 and pyproject.toml extras). Run python scripts/windows_provision_cudnn_next_to_ctranslate2.py after pip install nvidia-cudnn-cu12. The app calls os.add_dll_directory for torch\lib before WhisperX (pipeline/win_cuda_path.py). Windows + Python 3.13 or Linux/macOS: prefer PyTorch cu124 with ctranslate2 4.5+ and current WhisperX (see pyproject.toml markers). If it still fails: reinstall the torch + torchvision + torchaudio trio from one CUDA index in a single command, then pip install -e ".[all]" again; try GLITCHFRAME_WHISPERX_DEVICE=cpu; as a last resort, copy cuDNN manually per NVIDIA cuDNN for CUDA 12.
- Pinokio / Windows: install.js installs requirements.txt + .[all] + madmom/beatnet, then torch.js (typically cu128 PyTorch on NVIDIA). start.js may still default CPU WhisperX; remove GLITCHFRAME_WHISPERX_DEVICE or set cuda to try GPU alignment. Align lyrics can still fall back to CPU on DLL/load errors.
- Align lyrics fails with [WinError 1314] A required privilege is not held by the client while writing into cache\HF_HOME\hub\models--…\snapshots\…: huggingface_hub tries to symlink model snapshot files to the content-addressed blobs/ directory, but Windows requires admin rights or Developer Mode to create symlinks. Glitchframe patches huggingface_hub.are_symlinks_supported at startup so it copies blobs instead; Update + Start in Pinokio (no full reinstall) is enough to pick up the fix. If the cache ended up half-populated before updating, delete C:\pinokio\api\Glitchframe.git\cache\HF_HOME\ from Explorer and the next Align run will re-download cleanly. Details: docs/technical/pinokio-lyrics-align-windows-handover.md § Bug F.
- Render is unexpectedly slow / "compositing" stuck around 40 %: the orchestrator label parks on its outer status during compositing because individual frames are reported by an inner Gradio progress callback. Recent builds replaced the per-frame UI message with a richer one (Compositing 1843/4923 (37.4%) - 1.23 fps - ETA 41m42s - layers=BG+SHADER+TYPO) and update it from the request thread, so the bar now ticks live during long renders. After a render finishes, the run log appends a one-liner like compositor: 4923 frames in 41m23s · avg 1.98 fps · encoder=h264_nvenc. If it says encoder=libx264 and you have an NVIDIA GPU, NVENC fell back to CPU; check the startup log for the ffmpeg candidate list and any Cannot load nvEncodeAPI64.dll / Driver does not support the required nvenc API version lines from the probe stderr. Multi-candidate ffmpeg discovery (env override → active env's bin → PATH → well-known dirs) means a working NVENC ffmpeg anywhere on your system will be picked up automatically, even if Pinokio's bundled one shadows it.
- Render fails with Undefined constant or missing '(' in 'p5' / Unable to parse option value "p5" after background generation: your local ffmpeg is older than 4.4 (or NVENC SDK < 11) and doesn't understand the modern p1..p7 preset family. Glitchframe now probes the chosen ffmpeg once and falls back to the legacy slow preset automatically, so Update + Start in Pinokio is enough to pick up the fix, with no reinstall. Visual quality is essentially unchanged (slow is the closest legacy equivalent of p5). To get back to the modern preset family, install a recent ffmpeg (winget install ffmpeg on Windows, or conda update -c conda-forge ffmpeg inside Pinokio's env). Details: docs/technical/pinokio-lyrics-align-windows-handover.md § Bug H.
- Pinokio / Gradio or dependency conflicts after updates: Update (git pull) then Reinstall so install.js re-runs uv pip against the current requirements.txt. If you mix pip install -e . without requirements.txt, you can drift back to pyproject.toml Gradio 4.x bounds; see Install §3. Advanced: docs/technical/pinokio-package.md.
- Apply crop / Regenerate keyframe: PermissionError / WinError 5 on *.tmp → .png: Windows blocks replacing a PNG if it is still open (Gradio or the browser preview of the same file, Explorer thumbnails, antivirus scan). Current builds close PIL readers before overwrite and retry atomic replace briefly; if it still fails, wait a second and retry, or close anything previewing cache/<hash>/background/ (see docs/technical/background-keyframes-editor.md § Windows: saving PNGs).
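The "retry atomic replace briefly" behaviour in the last item follows a common pattern on Windows. Below is a generic sketch of that pattern, not the code Glitchframe ships: the function name, attempt count, and delay are assumptions chosen for illustration.

```python
import os
import time


def replace_with_retry(tmp_path, final_path, attempts=5, delay_s=0.2):
    """Atomically move tmp_path over final_path, retrying on PermissionError.

    On Windows, os.replace raises PermissionError (WinError 5) while another
    process still holds the destination open, so a short wait often helps.
    """
    for attempt in range(attempts):
        try:
            os.replace(tmp_path, final_path)
            return
        except PermissionError:
            if attempt == attempts - 1:
                raise  # Give up and surface the original error.
            time.sleep(delay_s)
```

Writing to a temporary file first and replacing it in one step means a viewer never sees a half-written PNG; the retry only covers the brief window in which a preview or scanner holds the old file.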
- Smoke test config (paths + optional preset YAML count): python config.py
- Tests (after pip install -e ".[dev]"): pytest
- In this repo, uv sync / uv run pytest is also used; see ai-context.md for maintainer notes.
AI-assisted development: Much of this codebase was built with AI coding assistants and planning tools (the same day-to-day workflow as the Ferrite project). For a concrete write-up of that process—context files, handover notes, and how tasks and reviews are organized—see Ferrite’s AI development workflow.
Issues and pull requests are welcome. Please keep changes focused; match existing style in the files you touch.
This project is licensed under the MIT License. Third-party assets (e.g. fonts under assets/fonts/) carry their own license files where applicable. RIFE inference code is derived from Practical-RIFE (MIT); runtime weights are downloaded from Hugging Face (MonsterMMORPG/RIFE_4_26 — confirm license/terms for your use case). SDXL and other diffusion models have their own licenses (e.g. Open RAIL-M).

