Mofusand Synth is a Tauri 2 + SvelteKit desktop app that converts YouTube audio into 8-bit chiptune. The Rust backend downloads audio via yt-dlp and handles file I/O through three IPC commands. The Svelte frontend owns all audio processing and offers two conversion modes:
- DSP Crush: degrades the original recording with a bitcrusher / lowpass / sample-rate-reduction chain (real-time, tweakable). Optional vocal removal.
- True Chiptune: transcribes the song to notes with Spotify's basic-pitch ML model, then re-synthesizes those notes on square/triangle/pulse/sawtooth oscillators.
Both modes ultimately produce an AudioBuffer that flows through one shared playback engine (play/pause/seek) and one shared WAV export path.
┌──────────────────────────────────────────────────────────────┐
│  SvelteKit Frontend (Webview)                                │
│                                                              │
│  +page.svelte ── UrlInput ── Player ── ChiptuneControls      │
│                                                              │
│  DSP mode:      source → preGain → WaveShaper → lowpass      │
│                 → downsampler(AudioWorklet) → destination    │
│                                                              │
│  Chiptune mode: original → transcribe(basic-pitch)           │
│                 → notes → renderChiptune() → AudioBuffer     │
│                 → source → cleanGain → destination           │
└──────────────────────────────┬───────────────────────────────┘
                               │ Tauri IPC (invoke)
┌──────────────────────────────▼───────────────────────────────┐
│  Rust / Tauri Commands                                       │
│  download_audio(url)          → { path, title }              │
│  read_audio_file(path)        → Vec<u8>                      │
│  save_audio_file(bytes, name) → ()  (native save dialog)     │
└──────────────────────────────┬───────────────────────────────┘
                               │ std::process::Command
┌──────────────────────────────▼───────────────────────────────┐
│  yt-dlp (external binary)                                    │
│  Downloads YouTube audio (mp3) to the OS temp directory      │
└──────────────────────────────────────────────────────────────┘
mofusand-synth/
├── src-tauri/                          # Rust / Tauri backend
│   ├── src/
│   │   ├── main.rs                     # thin entry → lib::run()
│   │   ├── lib.rs                      # Tauri builder, registers commands
│   │   └── commands/
│   │       ├── mod.rs
│   │       ├── download.rs             # download_audio (yt-dlp)
│   │       └── file.rs                 # read_audio_file, save_audio_file
│   ├── capabilities/default.json       # dialog:allow-save permission
│   ├── Cargo.toml
│   └── tauri.conf.json                 # 620×800 window, csp: null
├── src/                                # SvelteKit frontend
│   ├── app.css                         # global styles + Mofusand theme
│   ├── app.html
│   ├── routes/
│   │   ├── +layout.js                  # ssr = false (SPA mode)
│   │   ├── +layout.svelte              # imports app.css
│   │   └── +page.svelte                # root: state + Tauri invocations
│   └── lib/
│       ├── audio.js                    # makeBitCrushCurve, encodeWav
│       ├── transcribe.js               # basic-pitch wrapper (audio → notes)
│       ├── chiptune.js                 # notes → rendered chiptune AudioBuffer
│       ├── worklet/downsampler.js      # AudioWorklet sample-rate reducer
│       └── components/
│           ├── UrlInput.svelte
│           ├── Player.svelte           # both modes, playback, download
│           └── ChiptuneControls.svelte # DSP sliders
├── static/
│   └── model/                          # basic-pitch model (served at /model/)
│       ├── model.json
│       └── group1-shard1of1.bin
└── docs/
paste URL → +page.svelte invoke("download_audio", {url})
          → Rust yt-dlp → { path, title }
          → invoke("read_audio_file", {path}) → bytes
          → Player decodes → originalBuffer
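
The load flow above can be sketched as a small helper. This is only an illustration, not the app's actual code: `loadTrack` is a hypothetical name, and `invoke` is passed in as a parameter so the sketch runs outside a Tauri webview (in the app it would presumably be the `invoke` function from `@tauri-apps/api/core`). It also assumes the `Vec<u8>` result arrives in JavaScript as a plain array of numbers.

```javascript
// Hypothetical helper: chains the two IPC commands named in the flow above.
// `invoke` is injected so this can be exercised with a stub outside Tauri.
async function loadTrack(invoke, url) {
  // Step 1: the Rust side runs yt-dlp and reports where the file landed.
  const { path, title } = await invoke("download_audio", { url });

  // Step 2: read the downloaded file back as raw bytes.
  // Assumption: the Vec<u8> is delivered as an array of numbers.
  const bytes = await invoke("read_audio_file", { path });

  // Step 3: wrap the bytes so that `data.buffer` can later be handed to
  // AudioContext.decodeAudioData() to produce originalBuffer.
  return { title, data: new Uint8Array(bytes) };
}
```

Injecting `invoke` keeps the sequencing logic separate from the Tauri runtime, which makes the two-step handshake easy to test with a stub.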
originalBuffer → BufferSource → [vocal removal?] → preGain
               → WaveShaper(bitcrush) → BiquadFilter(lowpass)
               → AudioWorklet(downsampler) → destination
Sliders (Bit Depth / Sample Rate / Wave Crush) update nodes live.
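
To make the two non-trivial stages of this chain concrete, here is a minimal sketch of a bit-crush transfer curve and a sample-and-hold reducer. The name `makeBitCrushCurve` comes from `audio.js`, but the body below is an assumption about how such a curve might be built, not the project's implementation. `sampleAndHold` is a hypothetical stand-in for what the downsampler worklet could do per block.

```javascript
// Assumed implementation: a staircase curve for WaveShaperNode.curve.
// Inputs in [-1, 1] are snapped to multiples of 2 / 2^bits.
function makeBitCrushCurve(bits, samples = 4096) {
  const half = Math.pow(2, bits) / 2; // quantization steps per unit amplitude
  const curve = new Float32Array(samples);
  for (let i = 0; i < samples; i++) {
    const x = (i / (samples - 1)) * 2 - 1; // map table index to [-1, 1]
    curve[i] = Math.round(x * half) / half; // snap to the nearest step
  }
  return curve;
}

// Hypothetical sample-rate reducer: keep every `factor`-th sample and
// repeat it until the next one is taken (sample-and-hold).
function sampleAndHold(input, factor) {
  const out = new Float32Array(input.length);
  let held = 0;
  for (let i = 0; i < input.length; i++) {
    if (i % factor === 0) held = input[i]; // take a fresh sample
    out[i] = held; // otherwise repeat the held value
  }
  return out;
}
```

Inside a real AudioWorkletProcessor the held value and the sample counter would need to persist across `process()` calls, since each call only sees one short block of frames.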
originalBuffer → resample mono 22050Hz → basic-pitch.evaluateModel()
               → note events (pitchMidi, startTimeSeconds, durationSeconds, amplitude)
               → renderChiptune(notes) [OfflineAudioContext: oscillators + envelopes]
               → chiptuneBuffer → BufferSource → cleanGain → destination
Notes are cached; changing the waveform only re-runs renderChiptune.
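
As a rough illustration of the step between note events and oscillators, the sketch below converts notes into a flat schedule that a renderer could walk. `midiToFrequency` uses the standard equal-temperament formula (MIDI 69 = A4 = 440 Hz). `scheduleNotes` and the shape of its output are hypothetical and are not taken from `chiptune.js`.

```javascript
// Standard equal-temperament conversion: MIDI note 69 is A4 at 440 Hz,
// and each semitone multiplies the frequency by 2^(1/12).
function midiToFrequency(pitchMidi) {
  return 440 * Math.pow(2, (pitchMidi - 69) / 12);
}

// Hypothetical helper: turn note events (fields as listed in the flow above)
// into one entry per oscillator to be started and stopped.
function scheduleNotes(notes, waveform = "square") {
  return notes.map((n) => ({
    type: waveform, // oscillator waveform chosen in the UI
    frequency: midiToFrequency(n.pitchMidi),
    start: n.startTimeSeconds,
    stop: n.startTimeSeconds + n.durationSeconds,
    gain: n.amplitude, // peak level for the note's envelope
  }));
}
```

Because the schedule depends only on the cached notes and the chosen waveform, switching waveforms can rebuild it without re-running transcription. Note that `OscillatorNode` has no built-in pulse type, so a pulse wave would likely need a custom `PeriodicWave` or a similar workaround.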
DSP mode:      OfflineAudioContext re-renders effects chain → encodeWav
Chiptune mode: chiptuneBuffer already rendered → encodeWav
               → invoke("save_audio_file", {bytes, filename}) → native dialog
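
Both export paths end in `encodeWav`. The sketch below shows one plausible shape for it, assuming 16-bit PCM output and an input of per-channel `Float32Array`s (for example, collected with `AudioBuffer.getChannelData`). The signature and the bit depth are assumptions, not confirmed details of `audio.js`.

```javascript
// Assumed signature: channels is an array of Float32Array, one per channel,
// all the same length, with samples in [-1, 1]. Output is 16-bit PCM WAV.
function encodeWav(channels, sampleRate) {
  const numChannels = channels.length;
  const numFrames = channels[0].length;
  const blockAlign = numChannels * 2; // bytes per frame at 16 bits
  const dataSize = numFrames * blockAlign;
  const view = new DataView(new ArrayBuffer(44 + dataSize));

  const writeTag = (offset, tag) => {
    for (let i = 0; i < tag.length; i++) {
      view.setUint8(offset + i, tag.charCodeAt(i));
    }
  };

  // 44-byte canonical RIFF/WAVE header, little-endian fields.
  writeTag(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus first 8 bytes
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // format 1 = uncompressed PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true); // bits per sample
  writeTag(36, "data");
  view.setUint32(40, dataSize, true);

  // Interleave channels frame by frame, clamping and scaling to int16.
  let offset = 44;
  for (let frame = 0; frame < numFrames; frame++) {
    for (let ch = 0; ch < numChannels; ch++) {
      const s = Math.max(-1, Math.min(1, channels[ch][frame]));
      view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7fff, true);
      offset += 2;
    }
  }
  return new Uint8Array(view.buffer);
}
```

The returned `Uint8Array` would likely need converting (for example with `Array.from`) before being passed through `invoke`, depending on how the Rust command deserializes its byte argument.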
- yt-dlp must be on PATH; surfaced as an inline error if missing.
- Tauri capabilities must allow dialog:allow-save.
- CSP is disabled (csp: null) so tfjs and the local model load freely.
- Vocal removal uses L−R channel cancellation; requires a stereo source.
- basic-pitch transcription is best on melodic content; very dense mixes get noisy.
- Audio is held in memory (Vec<u8> / AudioBuffer), which is fine for typical song lengths.
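
The L−R cancellation mentioned in the notes can be illustrated on raw sample arrays. This is a sketch of the general technique only. `removeCenter` is a hypothetical name, the 0.5 scaling is an arbitrary choice to keep the result within range, and the app itself may do this with Web Audio nodes rather than a loop.

```javascript
// Subtract the right channel from the left. Anything mixed identically into
// both channels (typically center-panned vocals) cancels out, while content
// that differs between the channels survives.
function removeCenter(left, right) {
  const out = new Float32Array(left.length);
  for (let i = 0; i < left.length; i++) {
    out[i] = (left[i] - right[i]) * 0.5; // halve to stay within [-1, 1]
  }
  return out;
}
```

This also shows why a stereo source is required: with a mono file both channels are identical, so the subtraction yields silence.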