bug: validate_artifacts_generation crashes on artifacts with unresolvable object peers, blocking artifact generation for all group members #9188

@PhillSimonds

Component

  • API Server / GraphQL
  • Python SDK

Infrahub version

1.9.2 (also reproduces on 1.7.7; the affected line is unchanged between the two versions)

Current Behavior

validate_artifacts_generation in backend/infrahub/proposed_change/tasks.py crashes with ValueError("Node must have at least one identifier (ID or HFID) to query it.") whenever any single CoreArtifact returned by the initial filter has an object relationship whose payload is empty (no peer id / typename / hfid).

The relevant code (1.9.2 line numbers):

# proposed_change/tasks.py:766
existing_artifacts = await client.filters(
    kind=InfrahubKind.ARTIFACT,
    definition__ids=[model.artifact_definition.definition_id],
    include=["object"],
    branch=model.source_branch,
)
group = await fetch_artifact_definition_targets(...)

# tasks.py:777
artifacts_by_member = {}
for artifact in existing_artifacts:
    artifacts_by_member[artifact.object.peer.id] = artifact.id   # <-- raises here

artifact.object.peer invokes RelatedNode.get() (python_sdk/infrahub_sdk/node/related_node.py, ~line 257), which raises ValueError when the related node has none of the following: a stored peer object, an (id, typename) pair, or an hfid_str. With one such "orphan" row in existing_artifacts, the loop raises as soon as it reaches that row, the function exits, and the dispatch loop further down (for relationship in group.members.peers: ... CheckArtifactCreate(...)) never runs, so no new artifacts are generated for any group member.

Net effect from a user's perspective: a Proposed Change that creates a new artifact target shows the artifact validator failed, the new target has 0 artifacts/0 files, and the rest of the pipeline is unaffected (so the failure is easy to mistake for "my new node is broken" rather than "an unrelated old artifact is poisoning the iteration").

Expected Behavior

The validator should tolerate an artifact whose object peer can't be resolved — either skip the row with a warning and continue, or filter such rows out at the GraphQL layer so they never reach the loop. The dispatch of CheckArtifactCreate for unaffected group members should not be blocked by the existence of one bad row.

A minimal sketch (1.9.2, around line 778):

artifacts_by_member = {}
for artifact in existing_artifacts:
    if not artifact.object or not getattr(artifact.object, "id", None):
        log.warning(
            f"Skipping orphan artifact {artifact.id} for definition "
            f"{model.artifact_definition.definition_name}: object peer unresolvable"
        )
        continue
    artifacts_by_member[artifact.object.peer.id] = artifact.id

The exact attribute check should be validated against RelatedNode internals — checking the stored id directly avoids triggering the same RelatedNode.get() raise that the loop body already triggers.
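As an alternative to an attribute check, the skip-and-continue behaviour can be illustrated by catching the ValueError itself. The following is only a self-contained sketch: StubPeer, StubRelatedNode, StubArtifact and build_artifacts_by_member are hypothetical stand-ins written for this example, not SDK or backend names, and the stub's peer property merely imitates the raise described above. Any other exception a real store lookup might raise is not handled here.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger(__name__)


@dataclass
class StubPeer:
    """Stand-in for a resolved peer node (hypothetical, not an SDK class)."""

    id: str


@dataclass
class StubRelatedNode:
    """Stand-in for RelatedNode: .peer raises when nothing identifies the peer."""

    resolved: Optional[StubPeer] = None

    @property
    def peer(self) -> StubPeer:
        if self.resolved is None:
            raise ValueError("Node must have at least one identifier (ID or HFID) to query it.")
        return self.resolved


@dataclass
class StubArtifact:
    """Stand-in for a CoreArtifact row returned by client.filters()."""

    id: str
    object: StubRelatedNode


def build_artifacts_by_member(artifacts: list[StubArtifact]) -> dict[str, str]:
    """Map peer id -> artifact id, skipping rows whose peer cannot be resolved."""
    artifacts_by_member: dict[str, str] = {}
    for artifact in artifacts:
        try:
            member_id = artifact.object.peer.id
        except ValueError:
            # One bad row is logged and skipped instead of aborting the whole loop.
            log.warning("Skipping orphan artifact %s: object peer unresolvable", artifact.id)
            continue
        artifacts_by_member[member_id] = artifact.id
    return artifacts_by_member


rows = [
    StubArtifact(id="artifact-1", object=StubRelatedNode(StubPeer(id="device-1"))),
    StubArtifact(id="artifact-2", object=StubRelatedNode()),  # orphan row
    StubArtifact(id="artifact-3", object=StubRelatedNode(StubPeer(id="device-3"))),
]
mapping = build_artifacts_by_member(rows)
```

In this sketch the orphan row is dropped from the mapping while the rows on either side of it are still recorded, which is the property the dispatch loop needs.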

Steps to Reproduce

The reliable repro requires getting a CoreArtifact into a state where its object payload comes back empty from client.filters(..., include=["object"]). The most concrete path I've validated end-to-end is the historical "missing cascade" path:

  1. Use a schema where the CoreArtifactTarget-inheriting node is missing the cascade on its artifacts relationship (true in 1.7.7; the cascade was added by 1.8.5). With the cascade missing:
    1. Create an artifact-target node, run a Proposed Change that generates an artifact for it.
    2. Delete the target node via the GraphQL delete mutation.
    3. The CoreArtifact is left in place — its object relationship now has nothing live to point to.
    4. Open another Proposed Change that touches the same artifact definition. validate_artifacts_generation raises the ValueError.
  2. On 1.8.5+ this exact path is plugged by the cascade — but the same crash has been observed on 1.9.2. The orphan must have been created via some path the cascade doesn't cover (candidates we haven't fully audited: branch-merge involving a deleted target, schema reload re-keying relationships, direct DB manipulation, migration). The validator's hardening is independently useful regardless of which path created the row.

To confirm the failure mode without the historical cascade gap, you can also force a manual orphan via Cypher (delete the target node directly, leaving the artifact's object edge dangling), but that's not a clean repro path I'd recommend for testing.

Additional Information

Source confirmation across versions — the buggy line is byte-identical between 1.7.7 and 1.9.2 (extracted with docker run --rm --entrypoint='' opsmill/infrahub-enterprise:<tag> cat /source/community/backend/infrahub/proposed_change/tasks.py). The surrounding logic was restructured in the 1.7.7 → 1.9.2 window (new only_has_unique_targets / relevant_node_changes flow, impacted_artifacts = list(artifacts_by_member.values()) at line 831), but no orphan guard was added to the dict-population loop.

SDK side: RelatedNode.get() in python_sdk/infrahub_sdk/node/related_node.py:

def get(self) -> InfrahubNode:
    if self._peer:
        return self._peer
    if self.id and self.typename:
        return self._client.store.get(key=self.id, kind=self.typename, branch=self._branch)
    if self.hfid_str:
        return self._client.store.get(key=self.hfid_str, branch=self._branch)
    raise ValueError("Node must have at least one identifier (ID or HFID) to query it.")

It might also be worth checking whether client.filters(..., include=["object"]) ever legitimately returns a row with all three of (peer-object, id+typename, hfid_str) unset on a non-orphan artifact — if so there's a separate SDK-population bug worth tracking, but the validator hardening is necessary either way.
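To make the "all three unset" condition concrete, here is a small predicate that mirrors the three branches of the get() quoted above without calling .peer or .get(). It is a sketch under assumptions: has_usable_identifier is a hypothetical helper, the attribute names (_peer, id, typename, hfid_str) are taken from the quoted snippet only, and the objects below are SimpleNamespace stand-ins with made-up values rather than real RelatedNode instances.

```python
from types import SimpleNamespace
from typing import Any


def has_usable_identifier(related: Any) -> bool:
    """Return True if any of the three branches of the quoted get() could be taken.

    Attributes are read with getattr so that neither .peer nor .get() is invoked.
    """
    if getattr(related, "_peer", None):
        return True
    if getattr(related, "id", None) and getattr(related, "typename", None):
        return True
    return bool(getattr(related, "hfid_str", None))


# Stand-in objects (not SDK instances); values are illustrative only.
orphan = SimpleNamespace(_peer=None, id=None, typename=None, hfid_str=None)
by_id = SimpleNamespace(_peer=None, id="abc", typename="InfraDevice", hfid_str=None)
id_only = SimpleNamespace(_peer=None, id="abc", typename=None, hfid_str=None)
by_hfid = SimpleNamespace(_peer=None, id=None, typename=None, hfid_str="atl1-edge1")

results = {
    "orphan": has_usable_identifier(orphan),
    "by_id": has_usable_identifier(by_id),
    "id_only": has_usable_identifier(id_only),
    "by_hfid": has_usable_identifier(by_hfid),
}
```

The id_only case is worth noting: going by the quoted get(), an id without a typename would still fall through to the raise unless hfid_str is set, so a guard that checks id alone may not be sufficient.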

A diagnostic query users can run on their own data to find suspect rows (per artifact definition, per branch):

query Orphans($defId: ID!) {
  CoreArtifact(definition__ids: [$defId]) {
    edges { node {
      id display_label
      object { node { __typename id } }
    } }
  }
}

Rows where object is null, object.node is null, or object.node.id is missing are the offending artifacts; deleting them via CoreArtifactDelete unblocks the validator.
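The three conditions above can be applied mechanically to the query result. The helper below is a rough sketch that assumes the response data has already been fetched and parsed into a dict shaped like the query (CoreArtifact, then edges, then node). find_suspect_artifact_ids and the sample payload are hypothetical and exist only to illustrate the check.

```python
from typing import Any


def find_suspect_artifact_ids(data: dict[str, Any]) -> list[str]:
    """Return ids of artifacts whose object, object.node, or object.node.id is missing."""
    suspects: list[str] = []
    edges = (data.get("CoreArtifact") or {}).get("edges") or []
    for edge in edges:
        node = edge.get("node") or {}
        related = node.get("object") or {}  # covers object == null
        peer = related.get("node") or {}    # covers object.node == null
        if not peer.get("id"):              # covers a missing object.node.id
            suspects.append(node.get("id", ""))
    return suspects


# Made-up sample payload in the shape of the Orphans query result.
sample = {
    "CoreArtifact": {
        "edges": [
            {"node": {"id": "a1", "display_label": "ok",
                      "object": {"node": {"__typename": "InfraDevice", "id": "d1"}}}},
            {"node": {"id": "a2", "display_label": "null object", "object": None}},
            {"node": {"id": "a3", "display_label": "null node", "object": {"node": None}}},
            {"node": {"id": "a4", "display_label": "no id",
                      "object": {"node": {"__typename": "InfraDevice"}}}},
        ]
    }
}
suspect_ids = find_suspect_artifact_ids(sample)
```

The returned ids are the candidates to review before deleting anything with CoreArtifactDelete.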

Stack trace (from a 1.7.7-tagged production deployment, identical line in 1.9.2):

ValueError('Node must have at least one identifier (ID or HFID) to query it.')

  File "/source/community/backend/infrahub/proposed_change/tasks.py", line ~779,
       in validate_artifacts_generation
    artifacts_by_member[artifact.object.peer.id] = artifact.id
  File "/source/community/python_sdk/infrahub_sdk/node/related_node.py",
       line 245, in peer
    return self.get()
  File "/source/community/python_sdk/infrahub_sdk/node/related_node.py",
       line 257, in get
    raise ValueError("Node must have at least one identifier (ID or HFID) to query it.")

Labels

type/bug: Something isn't working as expected
