
Commit 310dc3f

feature: add create_subgraph() (#2441)
Add the feature create_subgraph() for materialized induced-subgraph extraction.

Add ag_catalog.create_subgraph(new_graph, from_graph, node_filter, relationship_filter) which materializes a new, persistent, fully Cypher-queryable AGE graph as the induced subgraph of an existing graph.

Selection follows the graph-theory induced-subgraph definition as operationalized by Neo4j GDS gds.graph.filter():

* a vertex is kept iff node_filter holds ('*' keeps all);
* an edge is kept iff relationship_filter holds AND both of its endpoints were kept (no dangling edges).

Filters are arbitrary Cypher predicates bound to `n` (nodes) and `r` (relationships) and are evaluated by AGE's own Cypher engine against the source graph, so the full predicate language is available; label selection uses label(n)/label(r) since the match pattern is fixed.

Implementation notes:

* Result is a real, ACID, registered graph (create_graph + create_v/elabel), not a virtual view; it composes with cypher() and itself.
* Entity graphids are reassigned from the destination labels' own sequences (graphid encodes a per-graph label id), and edge endpoints are remapped through an old->new vertex map, enforcing the induced rule via inner joins.
* Source label tables are read with FROM ONLY to avoid double-copying children under PostgreSQL table inheritance.
* Properties of any agtype are preserved; self-loops and parallel edges (multigraph structure) are retained.
* SECURITY INVOKER: reads respect the caller's table privileges and RLS; the new graph is owned by the caller.
* Validates NULL/identical graph names, missing source, pre-existing destination, and a reserved dollar-quote token in predicates.

Wire-up:

* sql/age_subgraph.sql (new) registered in sql/sql_files after age_pg_upgrade; identical body added to age--1.7.0--y.y.y.sql so the upgrade-path catalog comparison matches.
* regress/sql/subgraph.sql + expected output (new), added to REGRESS. Covers full copy, vertex-induced, node+rel, label-only edge drop, bipartite, empty result, composability, self-loops/parallel edges, property fidelity, and error cases over a ~4500-vertex / 2000-edge source graph.

All 38 regression tests pass against PostgreSQL 18.

Co-authored-by: GitHub Copilot (Claude Opus 4.8) <[email protected]>

modified: Makefile
modified: age--1.7.0--y.y.y.sql
new file: regress/expected/subgraph.out
new file: regress/sql/subgraph.sql
new file: sql/age_subgraph.sql
modified: sql/sql_files
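
The selection rule in the commit message can be illustrated with a small in-memory model. This is only a sketch of the semantics, not the committed implementation: the `Edge` type, the `induced_subgraph` helper, and the sample data are hypothetical, and Python callables stand in for the Cypher predicates (with `None` playing the role of `'*'`).

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Set, Tuple


class Edge(NamedTuple):
    """Hypothetical stand-in for an AGE edge row."""
    eid: int
    start: int
    end: int
    label: str


NodePred = Callable[[int, dict], bool]
EdgePred = Callable[[Edge], bool]


def induced_subgraph(
    vertices: Dict[int, dict],
    edges: List[Edge],
    node_filter: Optional[NodePred] = None,
    rel_filter: Optional[EdgePred] = None,
) -> Tuple[Set[int], List[Edge]]:
    # A vertex is kept iff node_filter holds (None keeps all, like '*').
    kept_v = {
        vid for vid, props in vertices.items()
        if node_filter is None or node_filter(vid, props)
    }
    # An edge is kept iff rel_filter holds AND both endpoints were kept,
    # so no dangling edges can appear in the result.
    kept_e = [
        e for e in edges
        if (rel_filter is None or rel_filter(e))
        and e.start in kept_v
        and e.end in kept_v
    ]
    return kept_v, kept_e


vertices = {1: {"age": 30}, 2: {"age": 17}, 3: {"age": 45}}
edges = [
    Edge(10, 1, 2, "KNOWS"),  # dropped below: endpoint 2 is filtered out
    Edge(11, 1, 3, "KNOWS"),
    Edge(12, 3, 3, "LIKES"),  # self-loop, retained
    Edge(13, 1, 3, "KNOWS"),  # parallel to edge 11, retained
]

adults, adult_edges = induced_subgraph(
    vertices, edges, node_filter=lambda vid, props: props["age"] >= 18
)
_, adult_knows = induced_subgraph(
    vertices,
    edges,
    node_filter=lambda vid, props: props["age"] >= 18,
    rel_filter=lambda e: e.label == "KNOWS",
)
```

With the node predicate alone, edge 10 disappears even though no relationship filter was given, because one of its endpoints did not survive; the self-loop and the parallel edge are kept, matching the multigraph behaviour described in the implementation notes.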
1 parent a17bf45 commit 310dc3f

6 files changed

Lines changed: 1084 additions & 1 deletion

Makefile

Lines changed: 2 additions & 1 deletion
@@ -184,7 +184,8 @@ REGRESS = scan \
           security \
           reserved_keyword_alias \
           agtype_jsonb_cast \
-          containment_selectivity
+          containment_selectivity \
+          subgraph
 
 ifneq ($(EXTRA_TESTS),)
 REGRESS += $(EXTRA_TESTS)

age--1.7.0--y.y.y.sql

Lines changed: 257 additions & 0 deletions
@@ -800,3 +800,260 @@ ALTER OPERATOR ag_catalog.?&(agtype, text[])
     SET (RESTRICT = contsel, JOIN = contjoinsel);
 ALTER OPERATOR ag_catalog.?&(agtype, agtype)
     SET (RESTRICT = contsel, JOIN = contjoinsel);
+
+--
+-- create_subgraph(): materialized subgraph extraction (see sql/age_subgraph.sql).
+-- Induced-subgraph semantics matching Neo4j GDS gds.graph.filter(): a vertex is
+-- kept iff node_filter holds ('*' = all); an edge is kept iff relationship_filter
+-- holds AND both endpoints are kept. Produces a persistent, Cypher-queryable graph.
+--
+CREATE FUNCTION ag_catalog.create_subgraph(new_graph name,
+                                           from_graph name,
+                                           node_filter text DEFAULT '*',
+                                           relationship_filter text DEFAULT '*')
+    RETURNS TABLE(node_count bigint, relationship_count bigint)
+    LANGUAGE plpgsql
+    VOLATILE
+    SET search_path = ag_catalog, pg_catalog
+AS $function$
+DECLARE
+    from_oid oid;
+    new_oid oid;
+    v_node_count bigint := 0;
+    v_rel_count bigint := 0;
+    rec RECORD;
+    cypher_q text;
+    where_clause text;
+    dst_label_id int;
+    dst_seq_fqn text;
+    dst_relation text;
+    inserted bigint;
+    has_rows boolean;
+BEGIN
833+
-- Argument validation.
834+
IF new_graph IS NULL THEN
835+
RAISE EXCEPTION 'new graph name must not be NULL';
836+
END IF;
837+
IF from_graph IS NULL THEN
838+
RAISE EXCEPTION 'source graph name must not be NULL';
839+
END IF;
840+
IF new_graph = from_graph THEN
841+
RAISE EXCEPTION 'cannot extract a subgraph of "%" into itself', from_graph;
842+
END IF;
843+
844+
-- NULL predicate is treated as the '*' wildcard (keep all).
845+
IF node_filter IS NULL THEN
846+
node_filter := '*';
847+
END IF;
848+
IF relationship_filter IS NULL THEN
849+
relationship_filter := '*';
850+
END IF;
851+
852+
-- The predicates are embedded into a dollar-quoted cypher() query using the
853+
-- $age_subgraph$ tag; reject predicates that contain the tag to keep the
854+
-- quoting unambiguous.
855+
IF position('$age_subgraph$' IN node_filter) > 0
856+
OR position('$age_subgraph$' IN relationship_filter) > 0 THEN
857+
RAISE EXCEPTION 'filter predicate must not contain the reserved token $age_subgraph$';
858+
END IF;
859+
860+
-- Validate source graph exists.
861+
SELECT graphid INTO from_oid
862+
FROM ag_catalog.ag_graph WHERE name = from_graph;
863+
IF from_oid IS NULL THEN
864+
RAISE EXCEPTION 'graph "%" does not exist', from_graph;
865+
END IF;
866+
867+
-- Validate destination graph does not exist (create_graph also enforces
868+
-- naming rules and uniqueness, but we give a clear early error).
869+
IF EXISTS (SELECT 1 FROM ag_catalog.ag_graph WHERE name = new_graph) THEN
870+
RAISE EXCEPTION 'graph "%" already exists', new_graph;
871+
END IF;
872+
873+
-- Create the destination graph (default labels are created automatically).
874+
PERFORM ag_catalog.create_graph(new_graph);
875+
876+
SELECT graphid INTO new_oid
877+
FROM ag_catalog.ag_graph WHERE name = new_graph;
878+
879+
-- Working sets / mapping (uniquely named to avoid colliding with user temps).
880+
DROP TABLE IF EXISTS _ag_sg_kept_v;
881+
DROP TABLE IF EXISTS _ag_sg_kept_e;
882+
DROP TABLE IF EXISTS _ag_sg_vmap;
883+
DROP TABLE IF EXISTS _ag_sg_vstage;
884+
DROP TABLE IF EXISTS _ag_sg_estage;
885+
+    --
+    -- Kept vertices: evaluate node_filter with AGE's Cypher engine. The node
+    -- variable `n` is bound exactly as in the spec; '*' selects all vertices.
+    --
+    IF node_filter IS NULL OR btrim(node_filter) = '*' THEN
+        where_clause := '';
+    ELSE
+        where_clause := ' WHERE ' || node_filter;
+    END IF;
+    cypher_q := 'MATCH (n)' || where_clause || ' RETURN id(n)';
+
+    EXECUTE format(
+        'CREATE TEMP TABLE _ag_sg_kept_v ON COMMIT DROP AS '
+        'SELECT DISTINCT ag_catalog.agtype_to_graphid(vid) AS gid '
+        'FROM ag_catalog.cypher(%L, $age_subgraph$%s$age_subgraph$) AS (vid agtype)',
+        from_graph, cypher_q);
+    CREATE INDEX ON _ag_sg_kept_v (gid);
+
+    --
+    -- Kept edges: evaluate relationship_filter with AGE's Cypher engine. The
+    -- relationship variable `r` is bound exactly as in the spec.
+    --
+    IF relationship_filter IS NULL OR btrim(relationship_filter) = '*' THEN
+        where_clause := '';
+    ELSE
+        where_clause := ' WHERE ' || relationship_filter;
+    END IF;
+    cypher_q := 'MATCH ()-[r]->()' || where_clause || ' RETURN id(r)';
+
+    EXECUTE format(
+        'CREATE TEMP TABLE _ag_sg_kept_e ON COMMIT DROP AS '
+        'SELECT DISTINCT ag_catalog.agtype_to_graphid(eid) AS gid '
+        'FROM ag_catalog.cypher(%L, $age_subgraph$%s$age_subgraph$) AS (eid agtype)',
+        from_graph, cypher_q);
+    CREATE INDEX ON _ag_sg_kept_e (gid);
+
+    -- old -> new vertex id mapping (graphid is unique within a graph).
+    CREATE TEMP TABLE _ag_sg_vmap (old_id graphid PRIMARY KEY,
+                                   new_id graphid NOT NULL) ON COMMIT DROP;
+
+    --
+    -- PASS 1: copy kept vertices, label by label, assigning new graphids and
+    -- recording the old->new mapping for edge remapping.
+    --
+    FOR rec IN
+        SELECT name, id, relation, seq_name
+        FROM ag_catalog.ag_label
+        WHERE graph = from_oid AND kind = 'v'
+        ORDER BY id
+    LOOP
+        -- Skip labels with no surviving vertices. Read ONLY this label's own
+        -- rows: AGE label tables use table inheritance (custom labels inherit
+        -- from _ag_label_vertex), so a plain scan of a parent would also return
+        -- its children and copy them twice.
+        EXECUTE format(
+            'SELECT EXISTS (SELECT 1 FROM ONLY %s t '
+            'WHERE EXISTS (SELECT 1 FROM _ag_sg_kept_v k WHERE k.gid = t.id))',
+            rec.relation::regclass::text)
+        INTO has_rows;
+        IF NOT has_rows THEN
+            CONTINUE;
+        END IF;
+
+        -- Ensure the label exists in the destination graph.
+        IF rec.name <> '_ag_label_vertex' THEN
+            PERFORM 1 FROM ag_catalog.ag_label
+            WHERE graph = new_oid AND name = rec.name;
+            IF NOT FOUND THEN
+                EXECUTE format('SELECT ag_catalog.create_vlabel(%L, %L)',
+                               new_graph, rec.name);
+            END IF;
+        END IF;
+
+        SELECT id, seq_name, relation::regclass::text
+        INTO dst_label_id, dst_seq_fqn, dst_relation
+        FROM ag_catalog.ag_label
+        WHERE graph = new_oid AND name = rec.name;
+        dst_seq_fqn := format('%I.%I', new_graph, dst_seq_fqn);
+
+        -- Stage surviving vertices with freshly generated ids in a real temp
+        -- table (single evaluation), then copy to the label table and record
+        -- the old->new mapping. A materialized stage avoids any ambiguity from
+        -- referencing a nextval-bearing CTE more than once.
+        DROP TABLE IF EXISTS _ag_sg_vstage;
+        EXECUTE format(
+            'CREATE TEMP TABLE _ag_sg_vstage ON COMMIT DROP AS '
+            'SELECT t.id AS old_id, '
+            ' ag_catalog._graphid(%s, nextval(%L)) AS new_id, '
+            ' t.properties AS props '
+            'FROM ONLY %s t '
+            'WHERE EXISTS (SELECT 1 FROM _ag_sg_kept_v k WHERE k.gid = t.id)',
+            dst_label_id, dst_seq_fqn, rec.relation::regclass::text);
+
+        EXECUTE format('INSERT INTO %s (id, properties) '
+                       'SELECT new_id, props FROM _ag_sg_vstage', dst_relation);
+
+        INSERT INTO _ag_sg_vmap (old_id, new_id)
+        SELECT old_id, new_id FROM _ag_sg_vstage;
+
+        DROP TABLE _ag_sg_vstage;
+    END LOOP;
+
+    SELECT count(*) INTO v_node_count FROM _ag_sg_vmap;
+
+    --
+    -- PASS 2: copy kept edges, remapping endpoints. The joins to _ag_sg_vmap
+    -- enforce the induced rule (an edge survives only if BOTH endpoints were
+    -- kept); membership in _ag_sg_kept_e applies relationship_filter.
+    --
+    FOR rec IN
+        SELECT name, id, relation, seq_name
+        FROM ag_catalog.ag_label
+        WHERE graph = from_oid AND kind = 'e'
+        ORDER BY id
+    LOOP
+        -- Skip labels with no surviving edges. Read ONLY this label's own rows
+        -- (see the vertex pass for why inheritance requires ONLY).
+        EXECUTE format(
+            'SELECT EXISTS ('
+            ' SELECT 1 FROM ONLY %s x '
+            ' JOIN _ag_sg_vmap vs ON vs.old_id = x.start_id '
+            ' JOIN _ag_sg_vmap ve ON ve.old_id = x.end_id '
+            ' WHERE EXISTS (SELECT 1 FROM _ag_sg_kept_e k WHERE k.gid = x.id))',
+            rec.relation::regclass::text)
+        INTO has_rows;
+        IF NOT has_rows THEN
+            CONTINUE;
+        END IF;
+
+        IF rec.name <> '_ag_label_edge' THEN
+            PERFORM 1 FROM ag_catalog.ag_label
+            WHERE graph = new_oid AND name = rec.name;
+            IF NOT FOUND THEN
+                EXECUTE format('SELECT ag_catalog.create_elabel(%L, %L)',
+                               new_graph, rec.name);
+            END IF;
+        END IF;
+
+        SELECT id, seq_name, relation::regclass::text
+        INTO dst_label_id, dst_seq_fqn, dst_relation
+        FROM ag_catalog.ag_label
+        WHERE graph = new_oid AND name = rec.name;
+        dst_seq_fqn := format('%I.%I', new_graph, dst_seq_fqn);
+
+        -- Stage surviving edges, remapping endpoints through _ag_sg_vmap. The
+        -- joins enforce the induced rule (both endpoints kept); membership in
+        -- _ag_sg_kept_e applies relationship_filter.
+        DROP TABLE IF EXISTS _ag_sg_estage;
+        EXECUTE format(
+            'CREATE TEMP TABLE _ag_sg_estage ON COMMIT DROP AS '
+            'SELECT ag_catalog._graphid(%s, nextval(%L)) AS new_id, '
+            ' vs.new_id AS new_start, ve.new_id AS new_end, '
+            ' x.properties AS props '
+            'FROM ONLY %s x '
+            'JOIN _ag_sg_vmap vs ON vs.old_id = x.start_id '
+            'JOIN _ag_sg_vmap ve ON ve.old_id = x.end_id '
+            'WHERE EXISTS (SELECT 1 FROM _ag_sg_kept_e k WHERE k.gid = x.id)',
+            dst_label_id, dst_seq_fqn, rec.relation::regclass::text);
+
+        EXECUTE format('INSERT INTO %s (id, start_id, end_id, properties) '
+                       'SELECT new_id, new_start, new_end, props '
+                       'FROM _ag_sg_estage', dst_relation);
+        GET DIAGNOSTICS inserted = ROW_COUNT;
+        v_rel_count := v_rel_count + inserted;
+
+        DROP TABLE _ag_sg_estage;
+    END LOOP;
+
+    RETURN QUERY SELECT v_node_count, v_rel_count;
+END;
+$function$;
+
+COMMENT ON FUNCTION ag_catalog.create_subgraph(name, name, text, text) IS
+'Materializes a new persistent graph as the induced subgraph of from_graph selected by a Cypher node predicate (on n) and relationship predicate (on r); ''*'' keeps all. An edge is kept only if its predicate holds and both endpoints are kept. Returns (node_count, relationship_count).';
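
The two-pass copy in the function above (PASS 1 builds an old->new vertex map, PASS 2 remaps edge endpoints through it) can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not the committed code: `make_graphid`, `copy_induced`, the label ids, and the sample data are hypothetical; the 16-bit label id / 48-bit entry id layout is an assumption about how AGE packs a graphid; and the model walks rows directly rather than label table by label table.

```python
from itertools import count

# Assumption: a graphid packs the label id into the high bits and a
# per-label sequence value into the low 48 bits.
ENTRY_ID_BITS = 48


def make_graphid(label_id: int, entry_id: int) -> int:
    """Hypothetical stand-in for ag_catalog._graphid(label_id, entry_id)."""
    return (label_id << ENTRY_ID_BITS) | entry_id


def copy_induced(src_vertices, src_edges, kept_v, kept_e, dst_label_ids):
    """src_vertices: {old_id: (label, props)}
    src_edges: {old_id: (label, start_id, end_id, props)}
    kept_v / kept_e: ids that passed node_filter / relationship_filter.
    dst_label_ids: {label: label id in the destination graph}.
    """
    # One counter per destination label, standing in for the label sequence.
    seqs = {label: count(1) for label in dst_label_ids}

    # PASS 1: copy kept vertices and record the old -> new id mapping.
    vmap, new_vertices = {}, {}
    for old_id, (label, props) in src_vertices.items():
        if old_id not in kept_v:
            continue
        new_id = make_graphid(dst_label_ids[label], next(seqs[label]))
        vmap[old_id] = new_id
        new_vertices[new_id] = (label, props)

    # PASS 2: copy kept edges; requiring both endpoints in vmap plays the
    # role of the two inner joins and enforces the induced rule.
    new_edges = {}
    for old_id, (label, start, end, props) in src_edges.items():
        if old_id not in kept_e or start not in vmap or end not in vmap:
            continue
        new_id = make_graphid(dst_label_ids[label], next(seqs[label]))
        new_edges[new_id] = (label, vmap[start], vmap[end], props)

    return new_vertices, new_edges, vmap


# Source graph: Person has label id 5 and KNOWS has label id 6 (made up).
a, b, c = make_graphid(5, 1), make_graphid(5, 2), make_graphid(5, 7)
src_vertices = {a: ("Person", {"name": "a"}),
                b: ("Person", {"name": "b"}),
                c: ("Person", {"name": "c"})}
src_edges = {make_graphid(6, 1): ("KNOWS", a, b, {}),
             make_graphid(6, 2): ("KNOWS", a, c, {"since": 2020}),
             make_graphid(6, 3): ("KNOWS", c, c, {})}

new_vertices, new_edges, vmap = copy_induced(
    src_vertices, src_edges,
    kept_v={a, c}, kept_e=set(src_edges),
    dst_label_ids={"Person": 3, "KNOWS": 4},
)
```

Because the new ids come from the destination's own label ids and counters, a source vertex with entry id 7 ends up with entry id 2 in the copy, which is why edge endpoints cannot be copied verbatim and must go through the map.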
