I was looking at Spanner's query insights and I see the query using the most CPU time in the last [whatever period of time you choose] is:
SELECT COUNT(*)
FROM bsos
WHERE fxa_uid = @fxa_uid
AND fxa_kid = @fxa_kid
AND collection_id = @collection_id
GROUP BY fxa_uid
It uses about 18% of total CPU, according to query insights (attached image).
I looked for that query in the code and only found it here:
|
"SELECT COUNT(*) |
|
FROM bsos |
|
WHERE fxa_uid = @fxa_uid |
|
AND fxa_kid = @fxa_kid |
|
AND collection_id = @collection_id |
|
GROUP BY fxa_uid" |
It looks to me like it runs if quota is not enabled (and we just talked about how that was a feature we never turned on).
And ALSO, it looks like the results of that query are NEVER USED if quota is not turned on (see:
|
} else { |
|
"UPDATE user_collections |
|
SET modified = @modified |
|
WHERE fxa_uid = @fxa_uid |
|
AND fxa_kid = @fxa_kid |
|
AND collection_id = @collection_id" |
|
} |
)So it seems to me like we could save a lot of (Spanner CPU) time by removing that query altogether.
After a slack convo with @jr Conlin he pointed out that if there are no results, then other things happen, so this query is important, but only in a “are there zero rows or more” sense. So I propose changing the query to something along the lines of:
SELECT 1
FROM bsos
WHERE fxa_uid = @fxa_uid
AND fxa_kid = @fxa_kid
AND collection_id = @collection_id
LIMIT 1
Returns 1 or [no results]. And shouldn’t have to scan a bunch of rows
┆Issue is synchronized with this Jira Task
I was looking at Spanner's query insights and I see the query using the most CPU time in the last [whatever period of time you choose] is:
It uses about 18% of total CPU, according to query insights (attached image).
I looked for that query in the code and only found it here:
syncstorage-rs/syncstorage/src/db/spanner/models.rs
Lines 870 to 875 in a52900f
And ALSO, it looks like the results of that query are NEVER USED if quota is not turned on (see:
syncstorage-rs/syncstorage/src/db/spanner/models.rs
Lines 920 to 926 in a52900f
After a slack convo with @jr Conlin he pointed out that if there are no results, then other things happen, so this query is important, but only in a “are there zero rows or more” sense. So I propose changing the query to something along the lines of:
Returns 1 or [no results]. And shouldn’t have to scan a bunch of rows
┆Issue is synchronized with this Jira Task