Commit 8787c30

Merge pull request #12 from n-n-code/model_handling_detach
detaching model handling from whisper.cpp to mutterkey side
2 parents a56df8e + 291a768 commit 8787c30

32 files changed

Lines changed: 1908 additions & 47 deletions

AGENTS.md

Lines changed: 36 additions & 3 deletions
@@ -9,6 +9,8 @@ Current architecture:
99
- Global shortcut handling goes through `KGlobalAccel`
1010
- Audio capture uses Qt Multimedia
1111
- Transcription is in-process through vendored `whisper.cpp`
12+
- Native Mutterkey model packages are now the canonical model artifact; raw
13+
whisper.cpp-compatible `.bin` files remain only as a migration/import path
1214
- The public runtime seam is streaming-first through app-owned chunks, events, and compatibility helpers
1315
- Static backend support lives in `BackendCapabilities`, while runtime/device/model inspection lives in `RuntimeDiagnostics`
1416
- Clipboard writes prefer `KSystemClipboard` with `QClipboard` fallback
@@ -35,6 +37,11 @@ This repository is intentionally kept minimal:
3537
- `src/clipboardwriter.*`: clipboard integration, preferring KDE system clipboard support
3638
- `src/audio/recordingnormalizer.*`: conversion to runtime-ready mono `float32` at `16 kHz`
3739
- `src/transcription/audiochunker.*`: deterministic chunking of normalized audio for the streaming runtime path
40+
- `src/transcription/modelpackage.*`: product-owned manifest and validated package value types
41+
- `src/transcription/modelvalidator.*`: package integrity, compatibility, and bounds validation
42+
- `src/transcription/modelcatalog.*`: model artifact inspection and resolution
43+
- `src/transcription/rawwhisperprobe.*`: lightweight raw whisper.cpp header inspection used for migration compatibility
44+
- `src/transcription/rawwhisperimporter.*`: import path from raw Whisper `.bin` files into native Mutterkey packages
3845
- `src/transcription/transcriptassembler.*`: final transcript assembly from streaming transcript events
3946
- `src/transcription/transcriptioncompat.*`: compatibility wrapper that routes one-shot recordings through the streaming runtime seam
4047
- `src/transcription/whispercpptranscriber.*`: in-process Whisper integration and whisper-specific engine construction
@@ -112,7 +119,7 @@ QT_QPA_PLATFORM=offscreen "$BUILD_DIR/mutterkey" diagnose 1
112119

113120
Notes:
114121

115-
- `once` mode requires microphone access and a valid Whisper model path
122+
- `once` mode requires microphone access and a valid model artifact path
116123
- Real transcription verification needs a configured model in `~/.config/mutterkey/config.json` or a custom config path
117124
- A small `Qt Test` + `CTest` suite exists for config loading, audio normalization, streaming-runtime helpers, and transcription-worker orchestration, including malformed JSON, wrong-type config inputs, recording-normalizer edge cases, and fake streaming backend behavior
118125
- Repo-owned test cases are expected to carry `WHAT/HOW/WHY` comments near the start of each real test body; `scripts/check-test-commentary.sh` and `scripts/check-release-hygiene.sh` enforce that convention
@@ -130,6 +137,9 @@ Notes:
130137
- Use `cmake --build "$BUILD_DIR" --target docs` when touching repo-owned public headers, Doxygen config, the Doxygen main page, or CI/docs wiring
131138
- If install rules or licensing files change, confirm the temporary install contains the expected files under `share/licenses/mutterkey`
132139
- If you add or change public methods in repo-owned headers, expect `cmake --build "$BUILD_DIR" --target docs` to fail until the new API is documented; treat that as part of the normal implementation loop, not follow-up polish
140+
- Newly added repo-owned public structs and free functions in public headers also
141+
need Doxygen comments immediately; the `docs` target treats undocumented new
142+
API surface as a real failure, not optional cleanup
133143

134144
## Tooling Best Practices
135145

@@ -166,11 +176,17 @@ Notes:
166176
- Avoid introducing optional backends, plugin systems, or cross-platform abstractions unless the task requires them
167177
- Keep the audio path explicit: recorder output may not already match Whisper input requirements, so preserve normalization behavior
168178
- Prefer product-owned naming such as runtime audio, chunks, events, diagnostics, and compatibility wrappers over backend-shaped naming when touching app-owned code
179+
- Prefer product-owned model terminology too: package, manifest, catalog, metadata,
180+
compatibility marker, and model artifact path are the primary nouns now;
181+
reserve backend-shaped wording for the whisper adapter or raw-file migration path
169182
- Prefer narrow shared value types across subsystems; for example, consumers that only need captured audio should include `src/audio/recording.h`, not the full recorder class
170183
- Keep JSON and other transport details at subsystem boundaries; prefer typed C++ snapshots/results once data crosses into app-owned control, tray, or service code
171184
- Prefer dependency injection for tray-shell and control-surface code from the first implementation so headless Qt tests stay simple
172185
- When preparing the transcription path for future runtime work, prefer app-owned engine/session seams and injected sessions over leaking concrete backend types into CLI, service, or worker orchestration. Keep immutable capability reporting on the engine side, keep runtime inspection data in `RuntimeDiagnostics`, and keep the session side focused on mutable decode state, warmup, chunk ingestion, finish, and cancellation
173186
- Prefer product-owned runtime interfaces, model/session separation, and deterministic backend selection before adding new inference backends or widening cross-platform support
187+
- Keep model validation, metadata extraction, and compatibility checks app-owned.
188+
`whisper.cpp` should not be the first component that tells Mutterkey whether a
189+
model artifact is obviously malformed, incompatible, or oversized
174190
- Keep compatibility shims explicit in naming. If a one-shot daemon/CLI path is implemented on top of the streaming runtime seam, name it as a compatibility wrapper rather than making the old one-shot shape look like the primary contract
175191
- Keep backend-specific validation out of `src/config.*` when practical. Product config parsing should normalize and preserve user input, while backend support checks should live in the app-owned runtime layer near `src/transcription/*`
176192
- Preserve the current product direction: embedded `whisper.cpp`, KDE-first, CLI/service-first
@@ -199,6 +215,9 @@ Apply the C++ Core Guidelines selectively and pragmatically. For this repo, the
199215
- `scripts/update-whisper.sh` requires a clean Git work tree before it will fetch or run subtree operations
200216
- Treat `third_party/whisper.cpp` as subtree-managed vendor content and update it through the helper script rather than manual directory replacement
201217
- Prefer changing app-side integration code before patching vendored dependency code
218+
- Prefer resolving model-package, metadata, and import work entirely in app-owned
219+
code. Raw whisper.cpp `.bin` support is now a compatibility/import concern, not
220+
the canonical product contract
202221
- Prefer keeping fake runtime tests and app-owned helpers free of vendored whisper linkage unless the test is specifically about the whisper adapter or engine factory
203222
- Prefer fixing vendored target metadata from the top-level CMake when the issue is Mutterkey packaging or warning noise, instead of patching upstream vendored files directly
204223
- If you must modify vendored code, document why in the final response and record the deviation in `third_party/whisper.cpp.UPSTREAM.md`
@@ -209,6 +228,9 @@ Apply the C++ Core Guidelines selectively and pragmatically. For this repo, the
209228
- Repo-owned source is MIT-licensed in `LICENSE`
210229
- Third-party licensing and provenance notes live in `THIRD_PARTY_NOTICES.md`
211230
- `whisper.cpp` model files are not bundled; do not add model binaries to the repository
231+
- Native Mutterkey model packages also must not be committed to the repository;
232+
if a release needs to ship one, include it only in the release artifact or as a
233+
separate release asset outside Git
212234
- Do not introduce machine-specific home-directory paths, absolute local Markdown links, or generated build artifacts into tracked files
213235
- If a task changes install layout or shipped assets, keep the CMake install rules and license installs aligned with the new behavior
214236
- The installed shared-library payload is runtime-focused; do not start installing vendored upstream public headers unless the package contract intentionally changes
@@ -232,15 +254,24 @@ Default config path:
232254
Typical model location:
233255

234256
```text
235-
~/.local/share/mutterkey/models/ggml-base.en.bin
257+
~/.local/share/mutterkey/models/<package-id>
236258
```
237259

260+
Current `transcriber.model_path` semantics:
261+
262+
- package directory is the canonical target
263+
- `model.json` manifest path is also accepted
264+
- raw whisper.cpp-compatible `.bin` files are accepted only as a migration
265+
compatibility path
266+
238267
## Agent Workflow
239268

240269
- Read `README.md` first, especially `Overview`, `Quick Start`, `Run As Service`, and `Development`, then read the touched source files before editing
241270
- Prefer targeted changes over speculative cleanup
242271
- If a change grows daemon, tray, or control-plane behavior, prefer extracting or extending repo-owned libraries under `src/app/`, `src/control/`, or other focused modules instead of piling more orchestration into `src/main.cpp`
243272
- Update `README.md` and `config.example.json` when behavior or setup changes
273+
- Update `RELEASE_CHECKLIST.md` too when release-facing model packaging, shipped
274+
assets, or release-bundle guidance changes
244275
- Update `contrib/mutterkey.service` and `contrib/org.mutterkey.mutterkey.desktop` when service/desktop behavior changes
245276
- Update `LICENSE`, `THIRD_PARTY_NOTICES.md`, CMake install rules, and `third_party/whisper.cpp.UPSTREAM.md` when packaging, licensing, or vendored dependency behavior changes
246277
- Keep `README.md`, `AGENTS.md`, and any relevant local skills aligned with the current `scripts/update-whisper.sh` workflow when the vendor-update process changes
@@ -262,7 +293,9 @@ Typical model location:
262293
- Prefer the `lint` target for a full pre-handoff analyzer pass, and use the individual analyzer targets when iterating on one class of warnings
263294
- Run `bash scripts/run-valgrind.sh "$BUILD_DIR"` before handoff when the task is specifically about memory, ownership, lifetime, shutdown, or release hardening
264295
- Run `bash scripts/check-release-hygiene.sh` before handoff when the task touches publication-facing files or repository metadata
265-
- Remember that the release-hygiene script now also enforces test commentary coverage, so changes to test structure or helper scripts may need both test updates and commentary updates
296+
- Remember that the release-hygiene script now also enforces test commentary
297+
coverage and rejects tracked `.bin` / `.gguf` artifacts, so release-facing or
298+
helper-script changes may need both commentary updates and binary-artifact policy checks
266299
- If `QT_QPA_PLATFORM=offscreen "$BUILD_DIR/mutterkey" diagnose 1` fails in a headless environment after model loading or during KDE/session-dependent startup, note that limitation explicitly rather than assuming the runtime seam or docs-only change regressed behavior
267300
- A headless `diagnose 1` failure after whisper model loading still does not necessarily indicate a streaming-runtime regression; separate runtime-contract changes from KDE/session or headless-environment limits
268301
- Do not leave generated artifacts in the repository tree at the end of the task
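
The AGENTS.md notes above say the release-hygiene script rejects tracked `.bin` / `.gguf` artifacts. A minimal sketch of that kind of check, assuming plain `git ls-files` pathspec matching. The function names are hypothetical and this is not the contents of `scripts/check-release-hygiene.sh`, which may work differently:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): print tracked files whose
# names end in .bin or .gguf. The first argument is the repository root and
# defaults to the current directory.
list_tracked_model_artifacts() {
    # Git pathspec wildcards also match across directory separators,
    # so nested files such as models/foo.bin are reported as well.
    git -C "${1:-.}" ls-files -- '*.bin' '*.gguf'
}

# Return non-zero (and report on stderr) when any such file is tracked.
check_no_tracked_model_artifacts() {
    local hits
    hits="$(list_tracked_model_artifacts "${1:-.}")"
    if [ -n "$hits" ]; then
        printf 'Tracked model artifacts found:\n%s\n' "$hits" >&2
        return 1
    fi
    return 0
}
```

Calling `check_no_tracked_model_artifacts .` from the repository root would then fail only when a matching file has been added to the index. Untracked or ignored files are not reported.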

CMakeLists.txt

Lines changed: 10 additions & 0 deletions
@@ -47,6 +47,16 @@ set(MUTTERKEY_CORE_SOURCES
4747
src/transcription/transcriptionengine.h
4848
src/transcription/audiochunker.cpp
4949
src/transcription/audiochunker.h
50+
src/transcription/modelcatalog.cpp
51+
src/transcription/modelcatalog.h
52+
src/transcription/modelpackage.cpp
53+
src/transcription/modelpackage.h
54+
src/transcription/modelvalidator.cpp
55+
src/transcription/modelvalidator.h
56+
src/transcription/rawwhisperimporter.cpp
57+
src/transcription/rawwhisperimporter.h
58+
src/transcription/rawwhisperprobe.cpp
59+
src/transcription/rawwhisperprobe.h
5060
src/transcription/transcriptassembler.cpp
5161
src/transcription/transcriptassembler.h
5262
src/transcription/transcriptioncompat.cpp

README.md

Lines changed: 39 additions & 12 deletions
@@ -68,7 +68,7 @@ Build requirements:
6868

6969
Runtime requirements:
7070

71-
1. a local Whisper model file
71+
1. a local Mutterkey model package, or a raw Whisper `.bin` file for migration compatibility
7272
2. a config file at `~/.config/mutterkey/config.json` or a custom `--config` path
7373

7474
Optional developer tooling:
@@ -81,9 +81,9 @@ Optional developer tooling:
8181
- `valgrind`
8282
- `libc6-dbg` on Debian-family systems so Valgrind Memcheck can start cleanly
8383

84-
The repository vendors `whisper.cpp`, but it does not bundle Whisper model
85-
files. Any model file you download separately may be subject to its own license
86-
or usage terms.
84+
The repository vendors `whisper.cpp`, but it does not bundle speech model
85+
artifacts. Any model file you download separately may be subject to its own
86+
license or usage terms.
8787

8888
If CMake fails before compilation starts, the most common cause is missing Qt 6
8989
development packages for `Core`, `Gui`, `Multimedia`, or KDE Frameworks
@@ -165,9 +165,30 @@ Notes:
165165
- `MUTTERKEY_ENABLE_WHISPER_BLAS=ON` improves CPU inference speed rather than enabling GPU execution
166166
- these options are forwarded to the vendored `whisper.cpp` / `ggml` build and install any resulting backend libraries alongside Mutterkey
167167

168-
### 2. Put a Whisper model on disk
168+
### 2. Put a model on disk
169169

170-
Example location:
170+
Preferred Phase 4 path:
171+
172+
1. place a raw Whisper `.bin` file somewhere temporary
173+
2. import it into a native Mutterkey package:
174+
175+
```bash
176+
~/.local/bin/mutterkey model import /path/to/ggml-base.en.bin
177+
```
178+
179+
This creates a package directory under:
180+
181+
```text
182+
~/.local/share/mutterkey/models/<package-id>/
183+
```
184+
185+
You can inspect a package or a legacy raw file with:
186+
187+
```bash
188+
~/.local/bin/mutterkey model inspect /path/to/ggml-base.en.bin
189+
```
190+
191+
Legacy compatibility path:
171192

172193
```text
173194
~/.local/share/mutterkey/models/ggml-base.en.bin
@@ -176,7 +197,7 @@ Example location:
176197
### 3. Create the config file
177198

178199
```bash
179-
mutterkey config init --model-path ~/.local/share/mutterkey/models/ggml-base.en.bin
200+
mutterkey config init --model-path ~/.local/share/mutterkey/models/<package-id>
180201
```
181202

182203
`mutterkey config init` writes the Linux config file to:
@@ -213,7 +234,7 @@ Minimal example:
213234
"sequence": "F8"
214235
},
215236
"transcriber": {
216-
"model_path": "/absolute/path/to/ggml-base.en.bin",
237+
"model_path": "/absolute/path/to/mutterkey-model-package",
217238
"language": "en",
218239
"translate": false,
219240
"threads": 0,
@@ -228,6 +249,7 @@ Config notes:
228249

229250
- `transcriber.threads: 0` means auto-detect based on the local machine
230251
- `transcriber.language` accepts a Whisper language code such as `en` or `fi`, or `auto` for language detection
252+
- `transcriber.model_path` may point to a native Mutterkey package directory, a `model.json` manifest, or a legacy raw Whisper `.bin` file
231253
- invalid numeric values fall back to safe defaults and log a warning
232254
- invalid `transcriber.language` values fall back to the default and log a warning
233255
- empty `shortcut.sequence` or `transcriber.model_path` values fall back to defaults and log a warning
@@ -306,7 +328,8 @@ installed setup looks like:
306328
Useful config commands:
307329

308330
```bash
309-
~/.local/bin/mutterkey config init --model-path ~/.local/share/mutterkey/models/ggml-base.en.bin
331+
~/.local/bin/mutterkey config init --model-path ~/.local/share/mutterkey/models/<package-id>
332+
~/.local/bin/mutterkey model inspect ~/.local/share/mutterkey/models/<package-id>
310333
~/.local/bin/mutterkey config set shortcut.sequence Meta+F8
311334
~/.local/bin/mutterkey config set transcriber.language fi
312335
```
@@ -329,10 +352,9 @@ journalctl --user -u mutterkey.service -f
329352

330353
Common failures:
331354

332-
`Embedded Whisper model not found: ...`
355+
`Model artifact not found: ...`
333356

334-
- the embedded backend is active
335-
- the configured model path does not exist
357+
- the configured package path, manifest path, or raw compatibility artifact does not exist
336358
- fix `transcriber.model_path`
337359

338360
`Recorder returned no audio`
@@ -375,6 +397,11 @@ Repository layout:
375397
- `src/transcription/audiochunker.*`: fixed-size normalized streaming chunk generation
376398
- `src/transcription/transcriptassembler.*`: final transcript assembly from streaming events
377399
- `src/transcription/transcriptioncompat.*`: compatibility wrapper from one-shot recordings to the streaming runtime path
400+
- `src/transcription/modelpackage.*`: product-owned manifest and validated package value types
401+
- `src/transcription/modelvalidator.*`: package integrity and compatibility validation
402+
- `src/transcription/modelcatalog.*`: model artifact inspection and resolution
403+
- `src/transcription/rawwhisperprobe.*`: lightweight raw Whisper header inspection
404+
- `src/transcription/rawwhisperimporter.*`: migration path from raw Whisper files to native packages
378405
- `src/transcription/whispercpptranscriber.*`: embedded Whisper integration behind the app-owned runtime seam
379406
- `src/transcription/transcriptionworker.*`: worker object on a dedicated `QThread`
380407
- `src/transcription/transcriptiontypes.h`: runtime diagnostics, normalized-audio, chunk, event, and error value types
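
The README config notes above say `transcriber.model_path` may point to a package directory, a `model.json` manifest, or a legacy raw Whisper `.bin` file. A minimal sketch of telling those three shapes apart from the shell, under the assumption that the file name and type are enough to classify them. `classify_model_path` is a hypothetical helper, not a `mutterkey` command, and the app-owned resolution in `src/transcription/modelcatalog.*` may apply stricter checks:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the mutterkey CLI): classify a path by
# the three model_path shapes the README describes.
classify_model_path() {
    local path="$1"
    if [ -d "$path" ]; then
        # A directory is treated as a native package directory.
        echo "package-directory"
    elif [ -f "$path" ] && [ "$(basename "$path")" = "model.json" ]; then
        # A file named model.json is treated as a package manifest.
        echo "manifest"
    elif [ -f "$path" ] && [ "${path##*.}" = "bin" ]; then
        # Any other existing *.bin file is treated as a raw legacy artifact.
        echo "raw-bin"
    else
        echo "unknown"
        return 1
    fi
}
```

This only looks at names and file types. It does not validate the manifest or the model header, which is what `mutterkey model inspect` is described as doing.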

RELEASE_CHECKLIST.md

Lines changed: 51 additions & 2 deletions
@@ -21,8 +21,10 @@ bash scripts/check-release-hygiene.sh
2121
- Review [THIRD_PARTY_NOTICES.md](THIRD_PARTY_NOTICES.md) for accuracy.
2222
- Review [third_party/whisper.cpp.UPSTREAM.md](third_party/whisper.cpp.UPSTREAM.md)
2323
and make sure the recorded upstream version/ref is current.
24-
- Confirm no Whisper model binaries or other large third-party artifacts are
25-
tracked in the repository.
24+
- Confirm no speech model binaries, native model packages, or other large
25+
third-party artifacts are tracked in the repository source tree.
26+
- If the release is intended to ship a model, treat that as a release-bundle or
27+
release-asset decision, not a Git-tracked source-tree decision.
2628

2729
## Build And Test
2830

@@ -150,11 +152,58 @@ cmake --install "$BUILD_DIR" --prefix "$INSTALL_DIR"
150152
install rules ship the runtime libraries but intentionally clear vendored
151153
`PUBLIC_HEADER` metadata to avoid upstream header-install warnings.
152154

155+
## Model Packaging For Releases
156+
157+
- Decide explicitly whether the release ships:
158+
- no model at all
159+
- a separate downloadable model package
160+
- a release bundle that includes a model package alongside the binaries
161+
- Keep model artifacts out of Git history even when the release ships one.
162+
The repository source tree should stay free of raw Whisper `.bin` files and
163+
native Mutterkey model packages.
164+
- If you need a model for the release, start from a raw whisper.cpp-compatible
165+
`ggml` `.bin` file and import it into a native Mutterkey package:
166+
167+
```bash
168+
MODEL_SRC="/path/to/ggml-base.en.bin"
169+
MODEL_OUT="$(mktemp -d /tmp/mutterkey-release-model-XXXXXX)/base-en"
170+
"$BUILD_DIR/mutterkey" model import "$MODEL_SRC" --output "$MODEL_OUT"
171+
```
172+
173+
- Inspect the resulting package before shipping it:
174+
175+
```bash
176+
"$BUILD_DIR/mutterkey" model inspect "$MODEL_OUT"
177+
```
178+
179+
- Confirm the package contains at least:
180+
- `model.json`
181+
- `assets/model.bin`
182+
- Review the inspected metadata and make sure the release notes record:
183+
- model family / size
184+
- language profile
185+
- source provenance
186+
- any separate model license or usage terms
187+
- If the release bundle is meant to include a model, add the package directory
188+
to the release artifact outside the Git source tree. Preferred locations are:
189+
- a separate downloadable release asset such as `mutterkey-model-base-en.tar.zst`
190+
- a bundled runtime tree under `share/mutterkey/models/<package-id>/`
191+
- If you include a model in an installable release bundle, validate the final
192+
staged tree after copying the package in:
193+
- the package directory is intact
194+
- `mutterkey model inspect <bundled-package-path>` succeeds
195+
- release notes and packaging docs tell users where `transcriber.model_path`
196+
should point
197+
- Do not commit the raw `.bin` source file, the generated native package, or
198+
any unpacked release-bundle copy back into the repository.
199+
153200
## Documentation And User Flow
154201

155202
- Review [README.md](README.md) for consistency with current behavior.
156203
- Review `docs/mainpage.md` and `docs/Doxyfile.in` if the release touched
157204
repo-owned API docs or docs/CI wiring.
205+
- Confirm the docs describe native Mutterkey model packages as the canonical
206+
artifact and raw Whisper `.bin` files as migration compatibility only.
158207
- Confirm the documented recommended path is still the `systemd --user` service.
159208
- Confirm [contrib/mutterkey.service](contrib/mutterkey.service) matches the
160209
recommended installed-binary setup.
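
The release checklist above asks to confirm that an imported package contains at least `model.json` and `assets/model.bin`. A minimal sketch of that confirmation as a shell function. `check_package_layout` is a hypothetical name, and it checks only that the two files exist, not that the package is valid:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): verify that a package
# directory contains the two files the release checklist names.
check_package_layout() {
    local pkg="$1" missing=0 rel
    for rel in model.json assets/model.bin; do
        if [ ! -f "$pkg/$rel" ]; then
            printf 'missing: %s\n' "$rel" >&2
            missing=1
        fi
    done
    # 0 when both files are present, 1 otherwise.
    return "$missing"
}
```

Used as `check_package_layout "$MODEL_OUT"`, this complements the `mutterkey model inspect` step rather than replacing it, since the file-existence check says nothing about manifest contents or compatibility.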
