IFC-2536: Migrate InfrahubClient to graphql query for get and filters methods#9126
IFC-2536: Migrate InfrahubClient to graphql query for get and filters methods#9126solababs wants to merge 35 commits into
Conversation
…e-prefect-queries-phase-3-ifc-2536
| status: str | ||
|
|
||
|
|
||
| class GeneratorInstanceQuery(BaseModel): |
There was a problem hiding this comment.
are you using the Pydantic model generator from the SDK or have you generated this class using Claude directly ?
There was a problem hiding this comment.
This is generated using claude
There was a problem hiding this comment.
What we can do with the Pydantic model generator (https://docs.infrahub.app/infrahubctl/infrahubctl-graphql) is to have a reference to the GraphQL query in text and then by having it there we can run checks in the CI to validate that the query we've written is up to date based on the current core GraphQL schema shipped with Infrahub, i.e. the one stored here: https://github.com/opsmill/infrahub/blob/infrahub-v1.9.3/schema/schema.graphql. Basically together with a GraphQL query and the GraphQL schema we can generate Python bindings and validate that they are up to date in the CI. One of the thoughts around this JPD card was that we'd use this approach but it seems like this information was lost in the creation of that card. So the primary goal of the JPD card is to improve the performance by avoiding over fetching data, the secondary goal was to have queries defined as text and be able to both validate that they are up to date and that we have generated classes that match the queries. The intention is to start using this approach in multiple places within the backend but also get started in the frontend. This approach would not work for the changes in #9123 though since in that work we're directly querying for user defined nodes that doesn't exist in the core schema. But for the work in this PR it should be possible.
| def render_query(self) -> str: | ||
| filters: dict[str, str | list[str]] = ( | ||
| {"ids": [self.transform_id]} if _is_uuid(self.transform_id) else {"name__value": self.transform_id} | ||
| ) |
There was a problem hiding this comment.
We will need a small change with regards to how this query would be executed. If you check this issue #8945. In short there's a caching feature of the backend that would cache the rendering and processing of the GraphQL queries, i.e. to convert them into a GraphQL document. But this only happens if the raw text of the query looks exactly the same. As such we don't want to have self.transform_id be hardcoded into the body of the query but instead we'd need to send it as variables in the SDK call to await client.execute_graphql()
| if not transform_attribute.transform: | ||
| continue | ||
| transform_query = ComputedAttributeTransformQuery(transform_id=transform_attribute.transform) | ||
| transform_response = await client.execute_graphql(query=transform_query.render_query(), branch_name=branch_name) |
There was a problem hiding this comment.
I think a call like this should be fine with regards to pagination as we'd only expect to get a single result back, i.e. regarding the comments on pagination on the other PR.
…e-prefect-queries-phase-3-ifc-2536
| uv run ruff format --check --diff . \ | ||
| --exclude python_sdk \ | ||
| --exclude backend/infrahub/computed_attribute/graphql_queries/computed_attribute_fetch_transform.py \ | ||
| --exclude backend/infrahub/generators/graphql_queries/generator_instance_fetch.py |
There was a problem hiding this comment.
These exclude lines are a bit problematic for anyone just running "ruff check ." which is a lot faster than using the invoke commands. I'd wonder what's failing in these generated files. Might it just be a matter of us having to run ruff check [filenames] --fix and ruff format [filenames]? I.e. when we generate them we should be able to format them correctly provided that there are no other violations in these files. For reference:
https://github.com/opsmill/infrahub/blob/infrahub-v1.9.3/tasks/backend.py#L260-L261
There was a problem hiding this comment.
Note that we have a file called backend/infrahub/computed_attribute/graphql_queries.py in #9123 which would conflict with the module directory here backend/infrahub/computed_attribute/graphql_queries/. So could be that this PR should be rebased and reworked once the other one lives in develop
Actually this is probably the merge conflict for this PR.
…new query (#9123) * IFC-2536: Migrate hfid and computed_attribute to use new query * use pydantic basemodel * add pagination * rename and move files * update fetch_all logic * remove optimization tests * update tasks
| } | ||
| } | ||
| } | ||
| """ |
There was a problem hiding this comment.
I think we'd want the same approach for this as you have a bit further down where we have generator_instance_fetch.gql as a separate file. It make it clearer that we'd expect to the bindings for this be generated from the actual file. As it is inline here it looks like it's just part of the code.
Also we can probably combine the two queries TRANSFORM_QUERY_BY_NAME and TRANSFORM_QUERY_BY_IDS into one, then we just have the queries accept both of the parameters and it would use whatever we send in with with the variables.
| def render_query(self) -> str: | ||
| if _is_uuid(self.transform_id): | ||
| return TRANSFORM_QUERY_BY_IDS | ||
| return TRANSFORM_QUERY_BY_NAME |
There was a problem hiding this comment.
Combining this into one query as per above would remove the if-statement here and we'd only have the get_variables() one.
| ) | ||
| if instances: | ||
| instance = instances[0] | ||
| response = await client.execute_graphql( |
There was a problem hiding this comment.
I think that we might want to have pagination for this query if there's a high number of generator instances.
There was a problem hiding this comment.
this request is for one instance, uses instance_nodes[0].id below
Why
Prefect task files were making overfetching calls to
client.all(),client.filters(), andclient.get()that returned every attribute, relationship, and metadata field — but the tasks only consumedid(or a handful of fields). This caused unnecessary Neo4j reads, large GraphQL payloads, and slower task execution on every background worker cycle.This PR replaces those calls with targeted
execute_graphql()calls backed by purpose-built query model classes that request only the fields each task actually uses.How to test
Benchmark expected output (≥30% time reduction, ≥50% size reduction per task):
display_labelshfidgitImpact & rollout
Checklist
uv run towncrier create ...)