@@ -150,163 +151,69 @@ Cfg-gate fix (fd8593a): After merging main (which had reverted the lastfm discov
 - Alpine.js store for agent state, basecoat/Tailwind for UI components

 Must satisfy AC #1 (prompt input), AC #2 (LLM interpretation), AC #3 (8 tools visible), AC #4 (local tracks only), AC #5 (graceful degradation UX).
+
+Documented next-step options only; not started in this commit:
+1. Continue Python-only validation before any Rust port.
+2. Prototype deterministic candidate aggregation/finalization in `scripts/agent.py` so the LLM does discovery/routing while Python enforces playlist policy.
+3. Candidate guard rails to evaluate: track count bounds, one-track-per-artist default, seed-artist cap for artist-based prompts, scoring by supporting tool sources, local genre, Last.fm similarity/tag evidence, and recency/history signals.
+4. Once Python business logic is stable across mood, artist-based, and mixed-history prompts, port the proven rules to Rust.
13 eval tests in `evals.rs` using a wiremock-based mock Ollama server.
-Categories: tool execution (3), output format (6), degradation (5).
-Refactored `build_agent()` and `check_ollama()` to accept `base_url: &str` for test injection.
-Added `wiremock = "0.6"` to dev-dependencies.
-
-Cfg-gate fix: After merging main (which had reverted lastfm discovery methods), restored them and gated all discovery types, methods, and tests behind `#[cfg(feature = "agent")]`.
+## 2026-04-02: Python to Rust Migration Complete

-## Key Design Decisions
-
-- Feature-flagged (`agent`) — zero overhead when disabled
-- Uses llama3.2:1b via Ollama — small enough for local inference
-- 8 tools covering local library + Last.fm APIs with graceful degradation
-- Heuristic evals (no LLM judge) for deterministic CI
-- Thin wrapper Tauri commands in lib.rs with cfg pairs (agent/not-agent) to keep generate_handler! unconditional
1. **Parallel tool execution not working**: `with_tool_concurrency(8)` was configured, but the model wasn't calling multiple tools per turn
-2. **Token generation too long**: Final LLM turn took 63 seconds generating ~57 IDs with multiple recounts
+Successfully migrated the Python reference implementation to Rust:

 ### Changes Made

-**mod.rs - Agent builder:**
-- Added `.max_tokens(1024)` to cap response length (prevents endless recounting)
-
-**prompt.rs - System prompt:**
-- Enhanced RULES section with explicit parallel tool calling instructions:
-  - "Call MULTIPLE tools PER TURN in PARALLEL"
-  - "When planning your strategy, call ALL independent tools at once"
-
-### Why These Fixes Work
-
-1. **max_tokens(1024)**: Limits the LLM to ~1024 tokens for the final response. The playlist format (name + 25 track IDs) needs only ~200-500 tokens. This prevents the model from generating excessive intermediate reasoning (listing 57 IDs, recounting, selecting, etc.) that was causing the 63-second response time.
-
-2. **Explicit parallel instructions**: The previous prompt mentioned "Call multiple tools per turn" but wasn't emphatic enough. The new language uses ALL CAPS for key concepts and provides concrete examples ("get_similar_artists + search_library + get_track_tags together") to guide the model toward parallel tool calling.
-
-### Test Results
-- All 757 tests pass with `cargo nextest run --workspace --features agent`
-- Both `cargo check --features agent` and `cargo check` compile cleanly
**prompt.rs** - Updated system prompt with enhanced artist variety rules:
+- Added "DEFAULT to 1 track per artist for MAXIMUM variety"
+- Added "PRIORITY: 20 tracks from 20 different artists > 20 tracks from 10 artists with 2 each"
+- Added "A playlist should feel like a JOURNEY through different artists, not an artist deep dive"
+- Added "When compiling: pick the BEST track from each artist, then move on"
+- Converted to raw string literal (`r#"..."#`) for cleaner syntax
+- Added CRITICAL section warning against common mistakes (keyword searches)
+
+**mod.rs** - Added artist-spreading shuffle algorithm and duplicate name handling:
+- `shuffle_spread_artists()` function uses greedy approach to spread same-artist tracks apart
+- `generate_unique_playlist_name()` automatically appends (2), (3), etc. for duplicate names
+- Groups tracks by artist, shuffles each group locally, then greedily selects from the artist with most remaining tracks (excluding the last-played artist when possible)
+- Updated `agent_generate_playlist()` to:
+  1. Fetch track details (id + artist) for parsed track IDs
+  2. Shuffle using `shuffle_spread_artists()` before adding to playlist
+  3. Handle duplicate playlist names by appending numbers
+  4. Log validation count and shuffling info
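As a rough illustration of the greedy spreading step described above, here is a minimal Python sketch. It is not the code from `mod.rs` or `scripts/agent.py`: the `(track_id, artist)` tuple shape, the `rng` parameter, and the random tie-breaking are assumptions made for the example.

```python
import random
from collections import defaultdict


def shuffle_spread_artists(tracks, rng=None):
    """Reorder (track_id, artist) tuples so same-artist tracks are spread apart.

    Hypothetical sketch: the input shape and rng parameter are assumptions.
    """
    rng = rng or random.Random()

    # Group tracks by artist, then shuffle each group locally.
    groups = defaultdict(list)
    for track in tracks:
        groups[track[1]].append(track)
    for group in groups.values():
        rng.shuffle(group)

    result = []
    last_artist = None
    while groups:
        # Exclude the artist just placed whenever another artist is available.
        candidates = [a for a in groups if a != last_artist] or list(groups)
        # Pick the artist with the most remaining tracks; break ties randomly.
        most = max(len(groups[a]) for a in candidates)
        artist = rng.choice([a for a in candidates if len(groups[a]) == most])
        result.append(groups[artist].pop())
        if not groups[artist]:
            del groups[artist]
        last_artist = artist
    return result
```

If one artist holds more than half of the tracks, adjacent repeats are unavoidable; the fallback to `list(groups)` covers that case instead of failing.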
+
+### Key Features Ported from Python
+1. **Artist variety priority** - System prompt enforces 1 track per artist by default
+2. **Shuffled output** - Greedy algorithm spreads same-artist tracks apart in final playlist
+3. **Track validation** - Verifies all track IDs exist in library before shuffling
+4. **Unique playlist names** - Automatically appends (2), (3), etc. if name exists
+5. **Configurable limits** - Min/max tracks dynamically calculated from MAX_PLAYLIST_TRACKS
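The unique-name behaviour in item 4 could look roughly like the sketch below. The signature (a base name plus an iterable of existing names) and the exact `"Name (2)"` spacing are assumptions; the notes only say that (2), (3), etc. are appended.

```python
def generate_unique_playlist_name(base_name, existing_names):
    """Return base_name, or base_name with the first free "(n)" suffix, n >= 2.

    Hypothetical sketch: the real function may query the library directly.
    """
    existing = set(existing_names)
    if base_name not in existing:
        return base_name
    suffix = 2
    while f"{base_name} ({suffix})" in existing:
        suffix += 1
    return f"{base_name} ({suffix})"
```

Starting the counter at 2 matches the "(2), (3)" wording above, so the first playlist keeps its plain name.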
+
+### Test Coverage
+- 5 unit tests for `shuffle_spread_artists()`:
+  - Empty input returns empty
+  - Spreads same-artist tracks apart (no adjacent duplicates)
+  - Preserves all tracks in output
+  - Single track works correctly
+  - Unique artists handled properly
+- 2 unit tests for `generate_unique_playlist_name()`:
+  - Returns base name when available
+  - Appends number when name exists
+
+Total: 764 tests pass (762 existing + 2 new)
+
+2026-04-02: Added `PROMPT` override support to `scripts/agent.py` via python-decouple so prompt experiments can run without changing the default built-in system prompt. `_build_system_prompt()` now keeps the existing prompt for normal runs and uses the env-provided override when present, still interpolating `{min_tracks}` and `{max_tracks}`. Console output now reports whether the run used the default prompt or the override.
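A minimal sketch of the override logic described in this entry, under stated assumptions: it reads the environment through `os.environ` rather than python-decouple so it stays stdlib-only, the `DEFAULT_PROMPT` text is a placeholder, and `build_system_prompt()` with its `env` parameter is a hypothetical stand-in for `_build_system_prompt()`.

```python
import os

# Placeholder text; the real built-in system prompt is not reproduced here.
DEFAULT_PROMPT = "Build a playlist of {min_tracks} to {max_tracks} local tracks."


def build_system_prompt(min_tracks, max_tracks, env=None):
    """Return (prompt, source), where source is "default" or "override"."""
    env = os.environ if env is None else env
    override = env.get("PROMPT")
    template = override if override else DEFAULT_PROMPT
    source = "override" if override else "default"
    # Plain replace() rather than str.format(), so other braces in an
    # override prompt (for example JSON samples) are left untouched.
    prompt = (
        template.replace("{min_tracks}", str(min_tracks))
        .replace("{max_tracks}", str(max_tracks))
    )
    return prompt, source


if __name__ == "__main__":
    prompt, source = build_system_prompt(15, 25)
    print(f"Using {source} prompt")
```

Returning the source alongside the prompt is one way to drive the console message about which prompt was used.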
+
+2026-04-02: Prompt experiment results in Python:
+- Mood request (`make me a chill playlist`): default prompt used 4 turns; override prompt used 2 turns with valid library-only output.
+- Artist-based request (`make me a playlist like Radiohead`): default prompt used 3 turns; override prompt used 2 turns, but prompt-only steering still leaked multiple seed-artist tracks.
+- Mixed-history request (`make me a chill playlist like what I listened to last Friday`): default prompt used 4 turns; override prompt used 2 turns and treated weak recent-history results as a weighting signal instead of spending extra turns matching them exactly.
+
+2026-04-02: Conclusion from prompt experiments: tighter stop rules materially reduce turn count, but prompt-only business-rule enforcement remains unreliable for cases like seed-artist caps. The most promising direction is to keep LLM-driven discovery/tool routing while moving playlist compilation and policy enforcement into deterministic business logic that scores and filters candidates using empirical evidence (tool source overlap, local genre, Last.fm tags/similarity, last played date, play history, and explicit duplicate-artist caps).
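To make the proposed split concrete, the sketch below shows one way deterministic finalization might score candidates and enforce artist caps after the LLM has finished discovery. Nothing here exists in `scripts/agent.py` yet: the `Candidate` fields, the unweighted score, and the `finalize()` parameters are all hypothetical, and recency/history signals are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    track_id: int
    artist: str
    sources: set = field(default_factory=set)  # tools that returned this track
    genre_match: bool = False                  # local genre fits the request
    similarity: float = 0.0                    # assumed 0..1 Last.fm similarity


def score(c):
    # Hypothetical unweighted sum of the evidence signals.
    return len(c.sources) + (1.0 if c.genre_match else 0.0) + c.similarity


def finalize(candidates, max_tracks, per_artist_cap=1, seed_artist=None, seed_cap=1):
    """Pick the best-scoring tracks while enforcing per-artist caps."""
    picked, per_artist = [], {}
    for c in sorted(candidates, key=score, reverse=True):
        cap = seed_cap if c.artist == seed_artist else per_artist_cap
        if per_artist.get(c.artist, 0) >= cap:
            continue  # cap reached for this artist, skip the track
        picked.append(c)
        per_artist[c.artist] = per_artist.get(c.artist, 0) + 1
        if len(picked) == max_tracks:
            break
    return picked
```

Because the caps are applied in code, a "playlist like Radiohead" request cannot leak extra seed-artist tracks regardless of how the model behaves.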