Skip to content

Possibly useless Spanner work  #2250

@data-sync-user

Description

@data-sync-user

I was looking at Spanner's query insights and I see the query using the most CPU time in the last [whatever period of time you choose] is:

SELECT COUNT(*)
            FROM bsos
           WHERE fxa_uid = @fxa_uid
             AND fxa_kid = @fxa_kid
             AND collection_id = @collection_id
           GROUP BY fxa_uid

It uses about 18% of total CPU, according to query insights (attached image).
I looked for that query in the code and only found it here: 

"SELECT COUNT(*)
FROM bsos
WHERE fxa_uid = @fxa_uid
AND fxa_kid = @fxa_kid
AND collection_id = @collection_id
GROUP BY fxa_uid"
It looks to me like it runs if quota is not enabled (and we just talked about how that was a feature we never turned on).

And ALSO, it looks like the results of that query are NEVER USED if quota is not turned on (see: 

} else {
"UPDATE user_collections
SET modified = @modified
WHERE fxa_uid = @fxa_uid
AND fxa_kid = @fxa_kid
AND collection_id = @collection_id"
}
)So it seems to me like we could save a lot of (Spanner CPU) time by removing that query altogether.

After a slack convo with @jr Conlin he pointed out that if there are no results, then other things happen, so this query is important, but only in a “are there zero rows or more” sense. So I propose changing the query to something along the lines of:

SELECT 1
FROM bsos
WHERE fxa_uid = @fxa_uid
AND fxa_kid = @fxa_kid
AND collection_id = @collection_id
LIMIT 1

Returns 1 or [no results]. And shouldn’t have to scan a bunch of rows

┆Issue is synchronized with this Jira Task

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions