PoC: Cache complexity#5638
Conversation
|
A few caveats: the code isn't pretty, relying on the caller to sort is awkward, and even delegating the SHA256 hashing might be questionable. All of this could potentially happen inside the gem itself, exposing only the final fingerprint. But again, I want to validate the idea/principle of the problem and solution before moving forward with these technical decisions. |
|
|
||
| if max_possible_page_size.nil? | ||
| raise GraphQL::Error, "Can't calculate complexity for #{field_definition.path}, no `first:`, `last:`, `default_page_size`, `max_page_size` or `default_max_page_size`" | ||
| end |
There was a problem hiding this comment.
This entire algorithm was copied over from GraphQL::Schema::Field#calculate_complexity. It is very likely that we want to extract this max_possible_page_size to a method instead of duplicating it.
| # The page size affects the complexity. | ||
| if field_definition.connection? | ||
| arguments = visitor.arguments_for(node, field_definition) | ||
| return unless arguments.is_a?(GraphQL::Execution::Interpreter::Arguments) # FIXME: Are errors by definition not cacheable? |
There was a problem hiding this comment.
It is possible that what we are looking for here is to flag as not cacheable since we can't access the actual arguments:
unless arguments.is_a?(GraphQL::Execution::Interpreter::Arguments)
@uncacheable << visitor.query
@complexity_digest.delete(visitor.query)
return
end| raise GraphQL::Error, "Can't calculate complexity for #{field_definition.path}, no `first:`, `last:`, `default_page_size`, `max_page_size` or `default_max_page_size`" | ||
| end | ||
|
|
||
| complexity_digest << max_possible_page_size |
There was a problem hiding this comment.
For a field (assuming complexity 1):
["Person.name", 1]For a connection (assuming complexity 1 and first/last as 10):
["People", 1, 10]|
We're also trying to cache in a way where the query string itself isn't relevant — what matters are the actual fields being selected. Some queries have the exact same complexity even though their query strings differ. For example: query MyQuery($search: String, $limit: Int!) {
people(search: $search, first: $limit) {
id
firstName
lastName
}
}has the exact same complexity as the following, even though the query string is completely different: query AnotherQuery($s: String!, $x: Int!) {
myAlias: people(first: $x, search: $s) {
lastName
id
firstName
}
}In the end, we're selecting the same fields with the same arguments. Furthermore, the So whether I send The PoC takes this into account. |
In the first round (#5631) we dropped from ~806ms to ~176ms, but that's still too much processing time. Caching the complexity calculation can bring that 176ms down to ~1ms on a cache hit. This PR is a PoC for that approach.
The slowest piece is calling
max_possible_complexity, not the visitor itself. Further improvements could be made with that in mind, but I didn't go down that path since caching largely solves the problem without requiring a refactor of the complexity calculation. We also havefield_complexity, but it can't be used here because it's called duringmax_possible_complexity— and what we actually want is to avoid callingmax_possible_complexityaltogether.The digesting principle
The approach is simple:
complexity_for).Open questions
def resultdoesn't have access to queries, so I haven't figured out how to make this analyzer work with multiplexed queries. (We don't multiplex very often on our end, so this hasn't been a blocker for the PoC.)visitor.arguments_formight raise. What should we do in that case — skip the cache attempt and fall through to the normal calculation?Query#complexity_fingerprint: We initially discussed in Cache complexity analyzer in-memory/redis #5632 the idea of adding aQuery#complexity_fingerprintmethod, but that would require firing a dedicated analyzer, which feels out of place compared to how the other fingerprints are designed:The upside would be reusability for other purposes (though honestly, I can't think of any concrete ones right now).
analyze_queryshould be fast, but this still feels a bit redundant given that we already have the complexity analyzer.Feedback
How do you feel about this PoC? There are no specs yet — I'm looking for initial feedback before moving forward and handling all the edge cases.