These are not blockers for starting Stage 7 if human-in-the-loop review will handle missing or weak data later.
- Add per-association confidence/status
  - Add explicit statuses such as `exact`, `near`, `weak`, and `rejected` for each accepted/rejected association.
  - Preserve distance, threshold, source artifact, and matching rule so human review can prioritize weak links.
- Parse line numbers into structured fields
  - Keep raw/normalized line text.
  - Add parsed candidates for size, service, sequence, spec/class, insulation/tracing, and suffix where detectable.
  - Attach parse confidence and unresolved parse tokens.
- Add trace grouping hints
  - Group traces that likely belong to the same process line by line number, connected terminals, branch relation, direction continuity, and shared equipment/page connector context.
  - Keep these as hints only; Stage 7 should still preserve trace-level provenance.
- Preserve explicit reused/skipped trace references
  - Keep `skipped_existing_trace` records as non-physical sources with `reused_trace_id`.
  - Ensure downstream stages do not count skipped traces as physical pipe edges.
  - Keep skipped port/node records available for review and equipment-port completeness checks.
- Add QA severity classification
  - `blocking`: invalid geometry, no terminal where one is required, disconnected equipment, malformed trace.
  - `review`: missing line number, weak association, multiple line candidates, ambiguous terminal, dead end requiring review.
  - `info`: skipped duplicate, reused path, accepted exact match.
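
The per-association status idea above could be sketched roughly as follows. This is only an illustration: the `AssociationEvidence` type, the `classify_association` helper, and the `near_ratio` cut-off are hypothetical and not part of the current Stage 6 code; the real thresholds would need tuning against reviewed drawings.

```python
from dataclasses import dataclass


@dataclass
class AssociationEvidence:
    """Evidence preserved so reviewers can prioritize weak links (hypothetical shape)."""
    distance_px: float
    threshold_px: float
    source_artifact: str
    matching_rule: str


def classify_association(ev: AssociationEvidence, near_ratio: float = 0.5) -> str:
    """Map a distance/threshold pair to exact, near, weak, or rejected.

    near_ratio is an assumed cut-off, not a value taken from the pipeline.
    """
    if ev.distance_px > ev.threshold_px:
        return "rejected"
    if ev.distance_px == 0:
        return "exact"
    if ev.distance_px <= near_ratio * ev.threshold_px:
        return "near"
    return "weak"
```

Keeping the evidence next to the status means a reviewer can sort `weak` links by how close they were to the threshold instead of re-deriving the match.
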
Build the next graph assembly stage directly from `stage6_trace_associations.json`, instead of the older `stage5_geometric_segments` / Phase 3 geometric path.
- Stage 6 is the source of truth for the geometric-route Stage 7.
- Human review will later resolve missing line numbers and weak associations.
- Skipped/reused traces should be represented as metadata, not counted as duplicate physical edges.
- Existing Stage 7 artifact names should be preserved where practical to avoid breaking downstream export and UI flows.
- `stage6_trace_associations.json` contains `trace_edges`, per-edge `attachments`, accepted/rejected associations, and unresolved QA lists.
- Current `stage7_geometric_graph_assembly()` still builds from `stage5_geometric_segments.json`, Phase 3 runs, and geometric edge builders.
- Current geometric pipeline route already runs Stage 5b then Stage 6 before Stage 7.
Use Stage 6 as the input contract for Stage 7 graph assembly.
Reason:
- Stage 6 already has traced paths, terminals, equipment ports, inline objects, line numbers, instrument tags, flow arrows, and QA lists.
- Reusing old geometric-segment artifacts would discard the recent path-tracing fixes and duplicate-suppression logic.
- Add a Stage 7 loader for Stage 6
  - Read `stage6_trace_associations.json`.
  - Validate required keys: `trace_edges`, `associations`, `unresolved`.
  - Reject or warn when no physical trace edges exist.
- Normalize Stage 6 trace edges into graph edges
  - Include only physical traces with non-empty `segments`.
  - Exclude `skipped_existing_trace` from physical edge count.
  - Preserve skipped/reused records in graph metadata/review artifacts.
  - Carry `trace_id`, source object, source port, terminal, polyline, turns, hits, and attachments.
- Create graph nodes
  - Create source nodes for equipment ports, page connections, branch starts, and object ports.
  - Create terminal nodes for equipment, page connections, instruments, tee junctions, branch connections, dead ends, and sheet edges.
  - Merge node positions within a small tolerance when they represent the same physical point.
  - Preserve original trace provenance on each node.
- Build connectivity
  - Connect source node to terminal node for each physical trace edge.
  - Use tee/branch terminal metadata to connect branch edges at shared junction nodes.
  - Use `reused_trace_id` metadata to mark equivalent/reverse sources without duplicating edges.
- Attach process semantics
  - Edge attributes: line number candidates, inline objects/valves, instruments, flow arrows, terminal types, source/terminal objects.
  - Node attributes: equipment/page connector/instrument identity, source detections, associated ports.
- Emit Stage 7 artifacts
  - `stage7_graph.json`
  - `stage7_graph_summary.json`
  - `stage7_trace_edge_nodes.json`
  - `stage7_line_groups.json` as provisional grouping hints if simple grouping is included.
  - `stage7_review_queue.json` for missing/weak data.
  - `stage7_graph_overlay.png` if practical in the same pass.
- Add QA/review output
  - Missing line numbers.
  - Dead ends.
  - Skipped/reused traces.
  - Weak or rejected associations.
  - Duplicate/near-duplicate nodes.
  - Unconnected equipment ports.
- `cd backend && /Users/maetee/Code/GARNET/.venv/bin/python -m py_compile api.py garnet/*.py garnet/utils/*.py`
- `cd backend && ./run_stage5b_only.sh /Users/maetee/Code/GARNET/backend/output_debug/Test-00001`
- Regenerate Stage 6 for Test-00001.
- Run the new Stage 7 for Test-00001 first.
- Then run Stage 7 for `Test-00001` through `Test-00009`.
- Check summaries for edge count, node count, skipped trace count, missing line number count, and review item count.
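
The summary check in the last step could be scripted along these lines. This is a sketch under assumptions: `collect_summaries` is a hypothetical helper, and the layout `<output_root>/<image_id>/stage7_graph_summary.json` is inferred from the `output_debug/Test-00001` path above rather than confirmed.

```python
import json
from pathlib import Path

# Counters to compare across test images (names follow the summary fields in this plan).
SUMMARY_KEYS = (
    "edge_count",
    "node_count",
    "skipped_trace_count",
    "missing_line_number_trace_count",
    "review_item_count",
)

# Test-00001 through Test-00009.
TEST_IMAGE_IDS = [f"Test-{i:05d}" for i in range(1, 10)]


def collect_summaries(output_root, image_ids):
    """Return {image_id: selected counters}, or None when the summary file is missing."""
    rows = {}
    for image_id in image_ids:
        path = Path(output_root) / image_id / "stage7_graph_summary.json"
        if not path.exists():
            rows[image_id] = None
            continue
        summary = json.loads(path.read_text(encoding="utf-8"))
        rows[image_id] = {key: summary.get(key) for key in SUMMARY_KEYS}
    return rows
```

A `None` row flags an image where Stage 7 did not emit a summary, which is usually the first thing to look at after a batch run.
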
- Too-aggressive node merging can hide real nearby ports.
- Too-conservative node merging can fragment tees into disconnected nodes.
- Missing line numbers should not block graph creation, but must remain visible for review.
- Skipped/reused traces must not create duplicate physical pipe edges.
- Branch traces that end at the same tee from different directions must share a junction node.
Primary input:
- `stage6_trace_associations.json`

Required top-level keys:
- `image_id`
- `trace_source`
- `trace_edges`
- `associations`
- `unresolved`

Required per physical trace edge:
- `trace_id`
- `trace_kind`
- `source_obj_id`
- `source_obj_type`
- `port`
- `terminal_type`
- `terminal_obj_id`
- `terminal_xy`
- `segments`
- `polyline`
- `trace_length_px`
- `status`
- `attachments`

Physical edge rule:
- Include trace edges with non-empty `segments`.
- Exclude records with `status = skipped_existing_trace` or no segments from physical graph edges.
- Preserve skipped/reused records in review and source-port metadata.
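
A minimal sketch of this input contract, assuming the payload has already been parsed from JSON into a dict. The helper names `validate_stage6_payload` and `split_trace_edges` are illustrative and do not exist in the codebase yet.

```python
from typing import Any

REQUIRED_TOP_LEVEL_KEYS = (
    "image_id",
    "trace_source",
    "trace_edges",
    "associations",
    "unresolved",
)


def validate_stage6_payload(payload: dict[str, Any]) -> list[str]:
    """Return the required top-level keys that are missing (empty list means valid)."""
    return [key for key in REQUIRED_TOP_LEVEL_KEYS if key not in payload]


def split_trace_edges(payload: dict[str, Any]) -> tuple[list[dict], list[dict]]:
    """Apply the physical edge rule: non-empty segments and not skipped."""
    physical, skipped = [], []
    for record in payload.get("trace_edges", []):
        is_skipped = record.get("status") == "skipped_existing_trace"
        if is_skipped or not record.get("segments"):
            skipped.append(record)  # kept for review and source-port metadata
        else:
            physical.append(record)
    return physical, skipped
```

Returning the skipped records instead of dropping them keeps them available for the review queue, and an empty `physical` list is the signal to reject or warn.
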
`stage7_graph.json`:
```json
{
  "schema_version": "stage7_trace_graph_v1",
  "image_id": "Test-00001.jpg",
  "trace_source": "stage6",
  "nodes": [],
  "edges": [],
  "line_groups": [],
  "review_queue": [],
  "metadata": {}
}
```
Node shape:
```json
{
  "id": "node::tee::obj_000054",
  "type": "tee_junction",
  "position": {"x": 792, "y": 1634},
  "source_refs": ["obj_000054"],
  "trace_refs": ["obj_000197", "branch_000001"],
  "port_refs": [],
  "terminal_refs": [],
  "review_status": "ok"
}
```
Edge shape:
```json
{
  "id": "edge::obj_000197",
  "trace_id": "obj_000197",
  "source_node_id": "node::source::obj_000197",
  "target_node_id": "node::tee::obj_000054",
  "polyline": [[208, 978], [793, 979], [792, 1634]],
  "length_px": 1240,
  "line_number_candidates": [],
  "inline_objects": [],
  "instrument_tags": [],
  "flow_arrows": [],
  "terminal_type": "tee_junction",
  "terminal_obj_id": "obj_000054",
  "trace_kind": "port",
  "review_status": "review"
}
```
Skipped/reused source shape:
```json
{
  "id": "skipped::equip_1_vessel:port_03",
  "source_obj_id": "equip_1_vessel",
  "port_index": 3,
  "reused_trace_id": "obj_000194",
  "reason": "source_reached_by_existing_trace",
  "physical_edge_created": false
}
```
- Source node from trace start
  - If source is equipment: node type `equipment_port`.
  - If source is page/utility/connection: node type `page_connection` or `connection` based on `source_obj_type`.
  - If source is branch: node type `branch_start`.
  - Position comes from `edge.port.x/y`.
- Terminal node from trace terminal
  - `equipment` terminal -> node type `equipment` or `equipment_port` when terminal point is a known equipment port.
  - `page_connection` terminal -> node type `page_connection`.
  - `instrument_tag` terminal -> node type `instrument`.
  - `tee_junction` terminal -> node type `tee_junction`.
  - `branch_connection` terminal -> node type `tee_junction` or `branch_connection`.
  - `dead_end` terminal -> node type `dead_end`.
  - `sheet_edge` terminal -> node type `sheet_edge`.
- Node merge rule
  - Merge by stable object id when available, e.g. same `terminal_obj_id` for a tee node.
  - Else merge by spatial bucket within a small tolerance.
  - Use different tolerances by type:
    - tee/branch junction: 12 px
    - equipment port: 16 px
    - page connector: 20 px
    - dead end: 8 px
  - Do not merge two equipment ports on the same equipment unless same point or explicit reused/skipped metadata indicates equivalence.
- Provenance rule
  - Every node keeps `source_refs`, `trace_refs`, and `port_refs` where applicable.
  - Never drop the original trace id that caused the node.
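
The merge and provenance rules above might look like this in code. It is a sketch, not the planned implementation: `NodeIndex` and its index-based node ids are hypothetical, the linear scan stands in for a real spatial bucket, and the reused/skipped equivalence case for equipment ports is left out.

```python
import math

# Tolerances in pixels, taken from the merge rule above.
MERGE_TOLERANCE_PX = {
    "tee_junction": 12,
    "branch_connection": 12,
    "equipment_port": 16,
    "page_connection": 20,
    "dead_end": 8,
}


class NodeIndex:
    """Hypothetical helper: merge by object id first, then by distance."""

    def __init__(self):
        self.nodes = []

    def _find(self, node_type, x, y, obj_id):
        tol = MERGE_TOLERANCE_PX.get(node_type, 0)
        for node in self.nodes:
            if node["type"] != node_type:
                continue
            dist = math.hypot(node["position"]["x"] - x, node["position"]["y"] - y)
            if obj_id is not None and node["source_refs"]:
                same_obj = obj_id in node["source_refs"]
                # Two ports on one equipment stay separate unless they coincide.
                if same_obj and (node_type != "equipment_port" or dist == 0):
                    return node
                continue
            if dist <= tol:
                return node
        return None

    def get_or_create(self, node_type, x, y, trace_id, obj_id=None):
        node = self._find(node_type, x, y, obj_id)
        if node is None:
            node = {
                "id": f"node::{node_type}::{len(self.nodes):06d}",
                "type": node_type,
                "position": {"x": x, "y": y},
                "source_refs": [],
                "trace_refs": [],
            }
            self.nodes.append(node)
        if obj_id is not None and obj_id not in node["source_refs"]:
            node["source_refs"].append(obj_id)
        # Provenance rule: never drop the trace id that caused the node.
        if trace_id not in node["trace_refs"]:
            node["trace_refs"].append(trace_id)
        return node
```

With this shape, two branch traces that end at the same tee object share one junction node, while two ports on the same vessel remain distinct unless they land on the same point.
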
- One physical edge per included Stage 6 trace edge.
- Edge id is `edge::<trace_id>`.
- Source and target node ids come from the node creation rules.
- Edge carries Stage 6 attachments directly:
  - `line_numbers`
  - `inline_objects`
  - `instrument_tags`
  - `flow_arrows`
  - `equipment_ports`
  - `terminals`
- Edge does not include skipped/reused trace records as geometry.
- Edge review status:
  - `ok` if terminal is known and at least one strong semantic association exists where expected.
  - `review` if missing line number, dead end, weak/multiple associations, or ambiguous terminal.
  - `blocking` only for malformed geometry or missing required endpoint.
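
One possible reading of the edge rules, as a sketch. The functions `edge_review_status` and `build_edge` are hypothetical, and so is the assumption that each entry in `attachments["line_numbers"]` is a dict with a `status` field; the actual Stage 6 attachment shape should be checked before relying on this.

```python
def edge_review_status(trace: dict) -> str:
    """Return ok, review, or blocking for one physical Stage 6 trace."""
    polyline = trace.get("polyline") or []
    terminal_type = trace.get("terminal_type")
    # Blocking only for malformed geometry or a missing required endpoint.
    if len(polyline) < 2 or not terminal_type:
        return "blocking"
    line_numbers = (trace.get("attachments") or {}).get("line_numbers") or []
    # Missing or multiple line number candidates, and dead ends, go to review.
    if terminal_type == "dead_end" or len(line_numbers) != 1:
        return "review"
    # Assumed field: a per-association status such as exact/near/weak/rejected.
    if line_numbers[0].get("status", "exact") in ("weak", "rejected"):
        return "review"
    return "ok"


def build_edge(trace: dict, source_node_id: str, target_node_id: str) -> dict:
    """Build one graph edge per included trace, carrying attachments directly."""
    attachments = trace.get("attachments") or {}
    return {
        "id": f"edge::{trace['trace_id']}",
        "trace_id": trace["trace_id"],
        "source_node_id": source_node_id,
        "target_node_id": target_node_id,
        "polyline": trace.get("polyline", []),
        "length_px": trace.get("trace_length_px"),
        "line_number_candidates": attachments.get("line_numbers", []),
        "inline_objects": attachments.get("inline_objects", []),
        "instrument_tags": attachments.get("instrument_tags", []),
        "flow_arrows": attachments.get("flow_arrows", []),
        "terminal_type": trace.get("terminal_type"),
        "terminal_obj_id": trace.get("terminal_obj_id"),
        "trace_kind": trace.get("trace_kind"),
        "review_status": edge_review_status(trace),
    }
```
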
Initial grouping should be conservative.
Group traces into `stage7_line_groups.json` using:
- Same accepted line number association.
- Shared connected node or terminal.
- Same or compatible flow direction evidence.
- Branch traces connected to the grouped line via tee junctions.
Line group shape:
```json
{
  "id": "line_group::3-CUL-25-002013-B1A2-NI",
  "line_number_raw": "3-CUL-25-002013-B1A2-NI",
  "line_number_normalized": "3-CUL-25-002013-B1A2-NI",
  "trace_ids": ["obj_000190", "obj_000193"],
  "node_ids": [],
  "equipment_ids": [],
  "inline_object_ids": [],
  "instrument_tag_ids": [],
  "review_status": "review",
  "warnings": []
}
```
If no line number exists:
- Create provisional groups by connectivity only.
- Use id pattern `line_group::unassigned::<component_id>`.
- Add review item `missing_line_number`.
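
A conservative grouping pass could be sketched as below: traces with an accepted line number are grouped by that number, and the rest are grouped by connected component over shared nodes using a small union-find. The `line_number` key on each edge and the zero-padded component id are assumptions made for the example.

```python
from collections import defaultdict


def build_line_groups(edges):
    """Return {group_id: [trace_id, ...]} for a list of graph edges.

    Each edge is assumed to carry trace_id, source_node_id, target_node_id,
    and an optional accepted line_number (hypothetical field).
    """
    groups = defaultdict(list)
    unassigned = []
    for edge in edges:
        line_number = edge.get("line_number")
        if line_number:
            groups[f"line_group::{line_number}"].append(edge["trace_id"])
        else:
            unassigned.append(edge)

    parent = {}

    def find(node_id):
        # Union-find lookup with path halving.
        parent.setdefault(node_id, node_id)
        while parent[node_id] != node_id:
            parent[node_id] = parent[parent[node_id]]
            node_id = parent[node_id]
        return node_id

    for edge in unassigned:
        parent[find(edge["source_node_id"])] = find(edge["target_node_id"])

    components = defaultdict(list)
    for edge in unassigned:
        components[find(edge["source_node_id"])].append(edge["trace_id"])

    for index, root in enumerate(sorted(components)):
        groups[f"line_group::unassigned::{index:04d}"] = components[root]
    return dict(groups)
```

Every `line_group::unassigned::*` group produced this way should also raise a `missing_line_number` review item, so the grouping stays a hint and not a decision.
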
`stage7_review_queue.json` should be the human-in-the-loop input surface.
Review item shape:
```json
{
  "id": "review::missing_line_number::obj_000197",
  "severity": "review",
  "category": "missing_line_number",
  "trace_id": "obj_000197",
  "node_id": null,
  "message": "Trace has no accepted line number association.",
  "suggested_action": "Assign line number or mark not required.",
  "evidence": {}
}
```
Initial categories:
- `missing_line_number`
- `dead_end_trace`
- `skipped_existing_trace`
- `weak_association`
- `multiple_line_candidates`
- `unattached_line_number`
- `unattached_instrument_tag`
- `unconnected_equipment_port`
- `ambiguous_terminal`
- `malformed_trace_geometry`

Severity policy:
- `blocking`: graph cannot safely use this trace/node.
- `review`: graph can proceed, but process engineer must review.
- `info`: no action required unless auditing provenance.
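
As an illustration of how categories and severities could be tied together, here is a small sketch. `make_review_item` is a hypothetical helper; the mapping only pins down the two categories this plan states explicitly (malformed geometry is blocking, skipped/reused traces are informational) and assumes everything else defaults to `review`.

```python
# Only categories with an explicit severity in this plan are listed;
# all others fall back to "review" (an assumption, not a settled policy).
CATEGORY_SEVERITY = {
    "malformed_trace_geometry": "blocking",
    "skipped_existing_trace": "info",
}


def make_review_item(category, trace_id=None, node_id=None,
                     message="", suggested_action="", evidence=None):
    """Build one review queue entry in the review item shape."""
    subject = trace_id or node_id or "global"
    return {
        "id": f"review::{category}::{subject}",
        "severity": CATEGORY_SEVERITY.get(category, "review"),
        "category": category,
        "trace_id": trace_id,
        "node_id": node_id,
        "message": message,
        "suggested_action": suggested_action,
        "evidence": evidence or {},
    }
```

For example, `make_review_item("missing_line_number", trace_id="obj_000197")` yields the id `review::missing_line_number::obj_000197` with severity `review`.
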
Core artifacts:
- `stage7_graph.json`
- `stage7_graph_summary.json`
- `stage7_trace_edge_nodes.json`
- `stage7_line_groups.json`
- `stage7_line_group_summary.json`
- `stage7_review_queue.json`
- `stage7_review_queue_summary.json`
- `stage7_graph_overlay.png`

Compatibility artifacts to preserve or map:
- `stage7_equipment_attachments.json`
- `stage7_connection_attachments.json`
- `stage7_edge_terminals.json`
- `stage7_text_attachments.json`
- `stage7_instrument_tag_attachments.json`

These can initially be derived from Stage 6 associations instead of recomputing from old geometric edges.

`stage7_graph_summary.json` should include:
- `node_count`
- `edge_count`
- `physical_trace_edge_count`
- `skipped_trace_count`
- `line_group_count`
- `unassigned_line_group_count`
- `equipment_node_count`
- `tee_junction_node_count`
- `dead_end_node_count`
- `page_connection_node_count`
- `review_item_count`
- `blocking_review_item_count`
- `missing_line_number_trace_count`
- `dead_end_trace_count`
- `source_artifacts`
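
The summary fields could be derived from the graph payload roughly as follows. This is a sketch: `build_graph_summary` is a hypothetical name, counting both `equipment` and `equipment_port` nodes toward `equipment_node_count` is an assumption, and the per-trace counts are approximated by counting review items per category.

```python
from collections import Counter


def build_graph_summary(graph: dict, skipped_trace_count: int) -> dict:
    """Compute stage7_graph_summary.json fields from a stage7_graph.json payload."""
    node_types = Counter(node["type"] for node in graph["nodes"])
    review_queue = graph.get("review_queue", [])
    categories = Counter(item["category"] for item in review_queue)
    groups = graph.get("line_groups", [])
    return {
        "node_count": len(graph["nodes"]),
        "edge_count": len(graph["edges"]),
        # Every graph edge is a physical trace edge; skipped traces are metadata only.
        "physical_trace_edge_count": len(graph["edges"]),
        "skipped_trace_count": skipped_trace_count,
        "line_group_count": len(groups),
        "unassigned_line_group_count": sum(
            1 for group in groups
            if group["id"].startswith("line_group::unassigned::")
        ),
        "equipment_node_count": node_types["equipment"] + node_types["equipment_port"],
        "tee_junction_node_count": node_types["tee_junction"],
        "dead_end_node_count": node_types["dead_end"],
        "page_connection_node_count": node_types["page_connection"],
        "review_item_count": len(review_queue),
        "blocking_review_item_count": sum(
            1 for item in review_queue if item["severity"] == "blocking"
        ),
        "missing_line_number_trace_count": categories["missing_line_number"],
        "dead_end_trace_count": categories["dead_end_trace"],
        "source_artifacts": ["stage6_trace_associations.json"],
    }
```
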
Phase 7A: Minimal graph from Stage 6
- Add helper functions in `pid_extractor.py` or a new module such as `backend/garnet/trace_graph_builder.py`.
- Build nodes and edges from Stage 6 only.
- Emit graph, summary, node mapping, review queue.
- Do not attempt advanced line grouping yet.

Phase 7B: Compatibility outputs
- Derive existing Stage 7 attachment artifacts from Stage 6 associations.
- Keep downstream `stage7b_graph_export()` working.
- Confirm `graph_export_adapter.py` can consume the new `stage7_graph.json` fields.
Phase 7C: Provisional line groups
- Group by accepted line number first.
- Add unassigned connectivity groups second.
- Emit review items for missing/ambiguous line numbers.
Phase 7D: Overlay and QA refinement
- Draw graph nodes/edges with review colors.
- Add visible labels for line groups and problematic traces.
- Add counts by QA category.
Minimum acceptable first pass:
- Stage 7 runs on `Test-00001` using only Stage 6 input.
- `stage7_graph.json` contains physical edges from Stage 6 traces and no duplicate skipped edges.
- `stage7_graph_summary.json` reports node/edge/review counts.
- `stage7_review_queue.json` includes missing line numbers and dead ends.
- `stage7b_graph_export()` still runs or has a documented compatibility gap.

Full validation pass:
- Stage 7 runs on `Test-00001` through `Test-00009`.
- No runtime errors.
- Skipped/reused traces are not counted as physical edges.
- Known Stage 5b endpoints remain represented in graph nodes.
- Human review queue contains all Stage 6 unresolved items.
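
Part of these checks can be automated. The sketch below covers only the edge-related criteria (no duplicate edges, no edges from skipped or empty traces, no physical trace left without an edge); `check_graph_against_stage6` is a hypothetical helper, and the review-queue coverage check is left out because the shape of Stage 6 `unresolved` entries is not specified here.

```python
def check_graph_against_stage6(graph: dict, payload: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks pass."""
    problems = []
    physical_ids = {
        trace["trace_id"]
        for trace in payload.get("trace_edges", [])
        if trace.get("trace_id")
        and trace.get("segments")
        and trace.get("status") != "skipped_existing_trace"
    }
    seen = set()
    for edge in graph.get("edges", []):
        trace_id = edge.get("trace_id")
        if trace_id in seen:
            problems.append(f"duplicate edge for trace {trace_id}")
        seen.add(trace_id)
        if trace_id not in physical_ids:
            problems.append(f"edge {trace_id} is not a physical Stage 6 trace")
    for trace_id in sorted(physical_ids - seen):
        problems.append(f"physical trace {trace_id} has no edge")
    return problems
```
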
Implement Phase 7A only first.
Recommended first code slice:
- Add `backend/garnet/trace_graph_builder.py`.
- Implement `build_trace_graph_from_stage6(payload, image_id)`.
- Update `stage7_geometric_graph_assembly()` to use Stage 6 when `stage6_trace_associations.json` exists.
- Emit only:
  - `stage7_graph.json`
  - `stage7_graph_summary.json`
  - `stage7_trace_edge_nodes.json`
  - `stage7_review_queue.json`
  - `stage7_review_queue_summary.json`
- Run on `Test-00001` and inspect the graph payload before adding compatibility artifacts.
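
A deliberately naive starting point for `build_trace_graph_from_stage6(payload, image_id)` might look like this. It is a skeleton under assumptions, not the final builder: it creates one unmerged source node and one unmerged terminal node per trace, assumes `port` is a dict with `x`/`y` and `terminal_xy` is an `[x, y]` pair, and stores skipped records under a hypothetical `metadata["skipped_sources"]` key.

```python
def build_trace_graph_from_stage6(payload: dict, image_id: str) -> dict:
    """Build a minimal stage7_graph.json payload from Stage 6 trace edges."""
    nodes, edges, skipped = [], [], []
    for trace in payload.get("trace_edges", []):
        # Physical edge rule: skipped or segment-less records become metadata only.
        if trace.get("status") == "skipped_existing_trace" or not trace.get("segments"):
            skipped.append(trace)
            continue
        trace_id = trace["trace_id"]
        source_id = f"node::source::{trace_id}"
        target_id = f"node::terminal::{trace_id}"
        port = trace.get("port") or {}
        terminal_xy = trace.get("terminal_xy") or [None, None]
        nodes.append({
            "id": source_id,
            "type": "source",
            "position": {"x": port.get("x"), "y": port.get("y")},
            "trace_refs": [trace_id],
        })
        nodes.append({
            "id": target_id,
            "type": trace.get("terminal_type"),
            "position": {"x": terminal_xy[0], "y": terminal_xy[1]},
            "trace_refs": [trace_id],
        })
        edges.append({
            "id": f"edge::{trace_id}",
            "trace_id": trace_id,
            "source_node_id": source_id,
            "target_node_id": target_id,
            "polyline": trace.get("polyline", []),
        })
    return {
        "schema_version": "stage7_trace_graph_v1",
        "image_id": image_id,
        "trace_source": "stage6",
        "nodes": nodes,
        "edges": edges,
        "line_groups": [],
        "review_queue": [],
        "metadata": {"skipped_sources": skipped},
    }
```

Node merging, review items, and attachments are intentionally missing here, so inspecting this payload on `Test-00001` mainly confirms that the Stage 6 input is read correctly and that skipped traces never become edges.
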
These items are deferred to the first production-hardening pass; they are not part of the current Stage 6-10 implementation.
Add a new Stage 3 review step for human-created or human-corrected major equipment bounding boxes.
Purpose:
- Ensure vessels, columns, exchangers, pumps, compressors, and other major equipment have reliable physical extents before port detection and topology association.
- Allow reviewers to add missing major equipment boxes, resize bad boxes, merge duplicate boxes, and assign stable equipment IDs/tags.
Expected output candidates:
- `stage3_equipment_review.json`
- `stage3_equipment_overlay.png`
- downstream equipment objects promoted into Stage 4/5 context.
Add HITL review for object detection and semantic text classes.
Review scope:
- detected inline objects and fittings
- line numbers
- instrument tags
- equipment tags
- page/utility connections
- false positives and missing objects
Required reviewer actions:
- accept/reject object detections
- correct class names
- edit bounding boxes
- add missed objects
- correct OCR text for line numbers and tags
Expected output candidates:
- `stage4_object_review.json`
- `stage4_line_number_review.json`
- `stage4_instrument_tag_review.json`
- reviewed objects become the source for Stage 5/5b and later associations.
Add HITL review for pipe mask and object port geometry before path tracing.
Review scope:
- add/remove/edit object ports
- remove invalid detected pipe lines
- mark non-pipe drawing lines/text artifacts
- correct pipe gaps or bridge/crossing ambiguity where needed
- optionally force known connection directions for equipment ports
Required reviewer actions:
- add missing ports on equipment, inline objects, page connections, and instruments
- delete invalid ports
- correct port direction and snap point
- mask out invalid pipe segments/noise
- annotate explicit keep/remove regions for pipe geometry
Expected output candidates:
- `stage5_port_review.json`
- `stage5_pipe_geometry_review.json`
- `stage5_review_overlay.png`
- reviewed ports and pipe geometry become authoritative inputs for Stage 5b tracing.