You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
feat(agent): add performance fixes and artist variety controls
Add max_tokens limit (1024) to prevent long response generation
eagerly spread same-artist tracks apart using greedy shuffle algorithm
Update system prompt with artist variety rules:
- Default to 1 track per artist for maximum variety
- Only add 2nd track when unique artists exhausted
- Priority: N tracks from N artists > N tracks from fewer artists
Python script improvements:
- Add _shuffle_spread_artists() for playlist ordering
- Add AGENT_MIN_PLAYLIST_TRACKS env var (default: 12)
- Add num_predict: 2048 to prevent response truncation
- Update system prompt with parallel tool calling emphasis
Documentation:
- Update docs/agent.md with new features
- Update backlog task-277 with implementation notes
References: scripts/agent.py as working reference implementation
1.**Parallel tool execution not working**: `with_tool_concurrency(8)` was configured but model wasn't calling multiple tools per turn
234
+
2.**Token generation too long**: Final LLM turn took 63 seconds generating ~57 IDs with multiple recounts
235
+
236
+
### Changes Made
237
+
238
+
**mod.rs - Agent builder:**
239
+
- Added `.max_tokens(1024)` to cap response length (prevents endless recounting)
240
+
241
+
**prompt.rs - System prompt:**
242
+
- Enhanced RULES section with explicit parallel tool calling instructions:
243
+
- "Call MULTIPLE tools PER TURN in PARALLEL"
244
+
- "When planning your strategy, call ALL independent tools at once"
245
+
246
+
### Why These Fixes Work
247
+
248
+
1.**max_tokens(1024)**: Limits the LLM to ~1024 tokens for the final response. The playlist format (name + 25 track IDs) needs only ~200-500 tokens. This prevents the model from generating excessive intermediate reasoning (listing 57 IDs, recounting, selecting, etc.) that was causing the 63-second response time.
249
+
250
+
2.**Explicit parallel instructions**: The previous prompt mentioned "Call multiple tools per turn" but wasn't emphatic enough. The new language uses ALL CAPS for key concepts and provides concrete examples ("get_similar_artists + search_library + get_track_tags together") to guide the model toward parallel tool calling.
251
+
252
+
### Test Results
253
+
- All 757 tests pass with `cargo nextest run --workspace --features agent`
254
+
- Both `cargo check --features agent` and `cargo check` compile cleanly
0 commit comments