
Commit 276fdeb

dylanjeffers and claude authored
perf(for-you): bound my_artist_affinity and follow_set by recency (#806)
## Summary

Builds on #805 (which capped `my_saved_artists`). After that fix the endpoint still times out on prod for power users: the remaining unbounded CTEs scan the user's full history on every request.

1. **`my_artist_affinity`** unions saves + reposts + plays for the caller. `plays` is the biggest table by far; a heavy listener can have hundreds of thousands of rows, all scanned on every request. Cap each source to the most recent N: **200 saves, 200 reposts, 500 plays**.
2. **`follow_set`** is every user the caller follows; for a power user with thousands of follows this becomes a wide hash join against every recent track upload. Cap to the **500 most recently followed**.

Recency is the right axis on all three: old engagement is a weak signal of current taste, and the bounds match the magnitude of the hidden cost (plays >> saves ≈ reposts).

## Diff (CTEs)

```sql
follow_set AS (
    SELECT followee_user_id AS user_id
    FROM follows
    WHERE follower_user_id = @userId
      AND is_current AND NOT is_delete
    ORDER BY created_at DESC
    LIMIT 500  -- new
),
my_artist_affinity AS (
    SELECT t.owner_id, LN(1 + COUNT(*)) AS affinity
    FROM (
        (SELECT save_item_id ... ORDER BY created_at DESC LIMIT 200)    -- new
        UNION ALL
        (SELECT repost_item_id ... ORDER BY created_at DESC LIMIT 200)  -- new
        UNION ALL
        (SELECT play_item_id ... ORDER BY created_at DESC LIMIT 500)    -- new
    ) eng
    JOIN tracks t ON ...
    GROUP BY t.owner_id
),
```

## Test plan

- ✅ All 9 `TestV1FeedForYou_*` tests pass locally. Fixtures have <200 saves / <200 reposts / <500 plays / <500 follows, so the caps don't kick in and observable behavior is unchanged.
- ✅ `go build ./api/...` / `go vet ./api/...` clean.
- After deploy: `/v1/users/eYZmn/feed/for-you?user_id=eYZmn&limit=5` (notjulian, deep history) currently times out at the Cloudflare upstream (>120s). Target: <2s.

## Follow-ups

EXPLAIN ANALYZE work is happening in parallel to verify that the bounds shift the cost as expected and to flag any missing indexes.
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 5e3eb1e commit 276fdeb

1 file changed

Lines changed: 21 additions & 3 deletions

api/v1_users_feed_for_you.go

@@ -100,12 +100,19 @@ func (app *ApiServer) v1UsersFeedForYou(c *fiber.Ctx) error {
 
     sql := `
     WITH
+    -- Cap to the 500 most-recently-followed users. A power user with
+    -- thousands of follows pulls a huge hash table here that then has to
+    -- join against every recent track upload to find in-network candidates,
+    -- so the planner can stall. Recent follows are a better signal of
+    -- current taste anyway.
     follow_set AS (
         SELECT followee_user_id AS user_id
         FROM follows
         WHERE follower_user_id = @userId
           AND is_current = true
           AND is_delete = false
+        ORDER BY created_at DESC
+        LIMIT 500
     ),
     my_saved_tracks AS (
         SELECT save_item_id AS track_id
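
Review note on the `follow_set` cap: the Go sketch below illustrates what `ORDER BY created_at DESC LIMIT 500` does to a caller's follows, and why fixtures with fewer than 500 follows see no behavior change. It is an in-memory analogy, not code from this commit. The `Follow` type, the Unix-seconds timestamp, and the `mostRecentFollowees` helper are all hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Follow is a hypothetical in-memory stand-in for one row of the follows table.
type Follow struct {
	FolloweeUserID int
	CreatedAt      int64 // Unix seconds (an assumption for this sketch)
}

// mostRecentFollowees mimics ORDER BY created_at DESC LIMIT n: it returns the
// followee IDs of the n most recently created follows, newest first. When
// there are n or fewer rows, every row is returned, so the cap is a no-op.
func mostRecentFollowees(rows []Follow, n int) []int {
	sorted := make([]Follow, len(rows))
	copy(sorted, rows) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].CreatedAt > sorted[j].CreatedAt
	})
	if n < 0 {
		n = 0
	}
	if len(sorted) > n {
		sorted = sorted[:n]
	}
	ids := make([]int, 0, len(sorted))
	for _, f := range sorted {
		ids = append(ids, f.FolloweeUserID)
	}
	return ids
}

func main() {
	rows := []Follow{
		{FolloweeUserID: 1, CreatedAt: 100},
		{FolloweeUserID: 2, CreatedAt: 300},
		{FolloweeUserID: 3, CreatedAt: 200},
	}
	fmt.Println(mostRecentFollowees(rows, 2))   // cap applies: prints [2 3]
	fmt.Println(mostRecentFollowees(rows, 500)) // cap is a no-op: prints [2 3 1]
}
```

In the real query the database does the sorting and truncation. Whether it can do so cheaply depends on the available indexes, which is what the EXPLAIN ANALYZE follow-up is meant to check.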
@@ -157,20 +164,31 @@ func (app *ApiServer) v1UsersFeedForYou(c *fiber.Ctx) error {
     ),
     -- Per-artist engagement strength (saves + reposts + plays of any of
     -- their tracks by me). Used for the social_boost multiplier.
+    --
+    -- Each sub-select is bounded by recency: a heavy listener can have
+    -- hundreds of thousands of play rows, and the unbounded union forces
+    -- a full scan of those rows on every request. Recent engagement is
+    -- the right signal anyway — old listens say less about current taste.
     my_artist_affinity AS (
         SELECT t.owner_id AS artist_id,
                LN(1 + COUNT(*)) AS affinity
         FROM (
-            SELECT save_item_id AS track_id FROM saves
+            (SELECT save_item_id AS track_id FROM saves
             WHERE user_id = @userId AND save_type = 'track'
               AND is_current = true AND is_delete = false
+            ORDER BY created_at DESC
+            LIMIT 200)
             UNION ALL
-            SELECT repost_item_id AS track_id FROM reposts
+            (SELECT repost_item_id AS track_id FROM reposts
             WHERE user_id = @userId AND repost_type = 'track'
               AND is_current = true AND is_delete = false
+            ORDER BY created_at DESC
+            LIMIT 200)
             UNION ALL
-            SELECT play_item_id AS track_id FROM plays
+            (SELECT play_item_id AS track_id FROM plays
             WHERE user_id = @userId
+            ORDER BY created_at DESC
+            LIMIT 500)
         ) eng
         JOIN tracks t ON t.track_id = eng.track_id
         GROUP BY t.owner_id
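
Review note on the `my_artist_affinity` caps: the following Go sketch is a rough in-memory model of the bounded CTE, showing how the per-source limits feed into `LN(1 + COUNT(*))`. It assumes the 200/200/500 caps from this commit. The `Event` type, the Unix-seconds timestamp, the `owner` map standing in for the `tracks` join, and both helper functions are hypothetical names, not part of the codebase.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Event is a hypothetical stand-in for one save, repost, or play row.
type Event struct {
	TrackID   int
	CreatedAt int64 // Unix seconds (an assumption for this sketch)
}

// capRecent keeps the n newest events, like ORDER BY created_at DESC LIMIT n.
func capRecent(events []Event, n int) []Event {
	out := make([]Event, len(events))
	copy(out, events)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].CreatedAt > out[j].CreatedAt
	})
	if len(out) > n {
		out = out[:n]
	}
	return out
}

// artistAffinity models the bounded CTE: cap each source, concatenate them
// (UNION ALL, so repeated tracks count every time), map tracks to owners,
// and score each artist as ln(1 + engagement count).
func artistAffinity(saves, reposts, plays []Event, owner map[int]int) map[int]float64 {
	counts := map[int]int{}
	for _, src := range [][]Event{
		capRecent(saves, 200),
		capRecent(reposts, 200),
		capRecent(plays, 500),
	} {
		for _, e := range src {
			// Inner join: events for tracks with no known owner drop out.
			if artist, ok := owner[e.TrackID]; ok {
				counts[artist]++
			}
		}
	}
	affinity := make(map[int]float64, len(counts))
	for artist, n := range counts {
		affinity[artist] = math.Log(1 + float64(n))
	}
	return affinity
}

func main() {
	owner := map[int]int{10: 7} // track 10 belongs to artist 7
	plays := make([]Event, 100000)
	for i := range plays {
		plays[i] = Event{TrackID: 10, CreatedAt: int64(i)}
	}
	aff := artistAffinity(nil, nil, plays, owner)
	// Only 500 of the 100000 plays survive the cap, so this is ln(1 + 500).
	fmt.Printf("%.4f\n", aff[7])
}
```

One consequence worth noting: at most 200 + 200 + 500 = 900 rows survive the caps, so no single artist's affinity can exceed ln(901) ≈ 6.80, however deep the caller's history is.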
