Commit 5e3eb1e

dylanjeffers and claude authored
perf(for-you): cap my_saved_artists to 200 most-recent (#805)
## Summary

The `similar_artists` CTE in `/v1/users/{id}/feed/for-you` self-joins `saves` against the `my_saved_artists` set. For long-tenure power users (e.g. an account with thousands of saved artists) the join blows up and the planner times out. Observed in prod: the request hangs >60s and returns nothing.

Cap `my_saved_artists` to the **200 most-recently saved artists**. Recency is the right axis to cut on:

- Old saves are a weak signal of current taste anyway
- 200 artists still gives the collaborative-filter step plenty of surface to find similar-artist candidates
- Bounds the saves self-join cost to a predictable size regardless of user tenure

## Change

```sql
-- Before
my_saved_artists AS (
    SELECT DISTINCT t.owner_id AS artist_id
    FROM my_saved_tracks mst
    JOIN tracks t ON t.track_id = mst.track_id
),

-- After
my_saved_artists AS (
    SELECT t.owner_id AS artist_id, MAX(s.created_at) AS last_saved_at
    FROM saves s
    JOIN tracks t ON t.track_id = s.save_item_id
    WHERE s.user_id = @userId
      AND s.save_type = 'track'
      AND s.is_current = true
      AND s.is_delete = false
    GROUP BY t.owner_id
    ORDER BY last_saved_at DESC
    LIMIT 200
),
```

The CTE still exposes the `artist_id` column (alongside the new `last_saved_at`), so the downstream `IN (SELECT artist_id FROM my_saved_artists)` and `NOT IN (...)` consumers in `similar_artists` are unchanged.

## Test plan

- ✅ All 9 existing `TestV1FeedForYou_*` tests pass locally against the test DB (fixtures have <200 saved artists, so the cap doesn't kick in and behavior is identical to before for the test cases).
- ✅ `go build ./api/...` / `go vet ./api/...` clean.
- After deploy: re-curl `/v1/users/eYZmn/feed/for-you?user_id=eYZmn&limit=5` (notjulian, deep save history). It should return 200 in <2s instead of hanging.

## Follow-ups not in this PR

- Worth adding `EXPLAIN ANALYZE` instrumentation behind a flag to catch this class of regression earlier next time.
- The `my_artist_affinity` CTE also unions saves+reposts+plays for the user unbounded, so it is likely the next-slowest piece for power users. Worth a similar bound in a follow-up if this fix doesn't fully resolve the timeout.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
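One way the `EXPLAIN ANALYZE` follow-up could look is sketched here. `explainQuery` and its `enabled` argument are hypothetical names, not part of this commit, and how the flag is read or where the plan gets logged is left open. Keep in mind that `EXPLAIN ANALYZE` actually executes the statement, so this would only suit debugging paths.

```go
package main

import (
	"fmt"
	"strings"
)

// explainQuery is a hypothetical helper (an assumption, not code from this
// repo). When enabled is true it prefixes the SQL with
// EXPLAIN (ANALYZE, BUFFERS) so the caller can log the plan instead of
// serving results; otherwise it returns the SQL untouched.
func explainQuery(sql string, enabled bool) string {
	if !enabled {
		return sql
	}
	return "EXPLAIN (ANALYZE, BUFFERS) " + strings.TrimSpace(sql)
}

func main() {
	q := "  SELECT 1"
	fmt.Println(explainQuery(q, false)) // query passed through as-is
	fmt.Println(explainQuery(q, true))  // query wrapped for plan capture
}
```

Keeping the wrapper a pure string function makes it easy to unit test without a database; the caller decides what to do with the plan rows.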
1 parent e8194b7 commit 5e3eb1e

1 file changed

Lines changed: 16 additions & 3 deletions


api/v1_users_feed_for_you.go

@@ -115,10 +115,23 @@ func (app *ApiServer) v1UsersFeedForYou(c *fiber.Ctx) error {
       AND is_current = true
       AND is_delete = false
   ),
+  -- The set of artists I save anchors the collaborative-filter join
+  -- below. For long-tenure power users with thousands of saved artists
+  -- the saves self-join explodes and the planner times out, so we cap
+  -- to the 200 most recently saved artists. Recency is the right axis:
+  -- old saves are weaker signal of current taste anyway, and a 200-artist
+  -- anchor still gives the similar-artists CTE enough to work with.
   my_saved_artists AS (
-      SELECT DISTINCT t.owner_id AS artist_id
-      FROM my_saved_tracks mst
-      JOIN tracks t ON t.track_id = mst.track_id
+      SELECT t.owner_id AS artist_id, MAX(s.created_at) AS last_saved_at
+      FROM saves s
+      JOIN tracks t ON t.track_id = s.save_item_id
+      WHERE s.user_id = @userId
+        AND s.save_type = 'track'
+        AND s.is_current = true
+        AND s.is_delete = false
+      GROUP BY t.owner_id
+      ORDER BY last_saved_at DESC
+      LIMIT 200
   ),
   -- 1-hop collaborative filter on the saves graph: artists saved by
   -- users who *also* save my saved-artists, but who I haven't saved myself.
