Commit 8fe2df2

Merge pull request #874 from graphistry/feat/collections-support
feat: add collections validation and GFQL support
2 parents d52b310 + 3a4f494 commit 8fe2df2

11 files changed

Lines changed: 1084 additions & 3 deletions

CHANGELOG.md

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,13 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
5252
- **GFQL / Cypher**: Extracted `ASTNormalizer` into `graphistry/compute/gfql/cypher/ast_normalizer.py` and moved shortestPath + WHERE-pattern-predicate rewrite ownership out of `lowering.py`, with parity-preserving wiring in compile/lowering flows and focused regression coverage for rewrite behavior and invocation order (#1117).
5353
- **GFQL / Cypher compiler**: Lowering now functionally consumes `BoundIR` metadata for the M1 integration slice: binder-provided params are merged into effective lowering params (runtime overrides preserved) with binder metadata keys filtered out of runtime-param resolution, scope membership narrowing uses the active scope frame for WITH-boundary correctness, semantic-table entity kinds inform alias table routing, and nullable alias metadata is wired into optional-only alias detection. `_StageScope` duplicated table bookkeeping was reduced, binder now runs pre- and post-normalization in compile flow, and binder-path regression tests were added for these code paths (#1116).
5454

55+
### Changed
56+
- **Collections**: Autofix validation now drops invalid collections (e.g., invalid GFQL ops) and non-string collection color fields instead of string-coercing them; warnings still emit when `warn=True`.
57+
- **Collections**: `collections(...)` now always canonicalizes to URL-encoded JSON (string inputs are parsed + re-encoded); the `encode` parameter was removed to avoid ambiguous behavior.
58+
- **Collections**: Set collections now require an `id` field (server requires it for subgraph storage); missing IDs are warned and dropped in autofix mode rather than auto-generated.
59+
- **Collections**: Intersection collections now cross-validate that referenced set IDs exist; dangling references are warned and dropped in autofix mode.
60+
- **Collections**: GFQL parsing consolidated to use `_wrap_gfql_expr` from `collections.py` as the canonical implementation with precise exception handling.
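The `id` requirement and the intersection cross-check described in the entries above can be sketched with plain dictionaries. This is a minimal illustration under stated assumptions, not the library's implementation: `validate_collections_sketch` and its `mode` argument are hypothetical names, every collection is assumed to carry an explicit `type`, and only these two rules are modeled.

```python
import warnings
from typing import Any, Dict, List


def validate_collections_sketch(
    collections: List[Dict[str, Any]], mode: str = "autofix"
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: keep only collections that pass two checks."""

    def reject(message: str) -> None:
        # 'strict' raises; any other mode warns and the caller drops the item.
        if mode == "strict":
            raise ValueError(message)
        warnings.warn(message)

    # Pass 1: set collections must carry an 'id'.
    kept: List[Dict[str, Any]] = []
    for c in collections:
        if c.get("type") == "set" and not c.get("id"):
            reject("set collection is missing 'id'")
            continue
        kept.append(c)

    # Pass 2: intersections may only reference IDs of surviving sets.
    set_ids = {c["id"] for c in kept if c.get("type") == "set"}
    result: List[Dict[str, Any]] = []
    for c in kept:
        if c.get("type") == "intersection":
            missing = [s for s in c["expr"]["sets"] if s not in set_ids]
            if missing:
                reject(f"intersection references unknown sets: {missing}")
                continue
        result.append(c)
    return result
```

Running the ID check first means an intersection that points at a set dropped for a missing `id` is also dropped, which matches the "dangling references" wording above.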
61+
5562
### Tests
5663
- **GFQL / Cypher binder**: Added PR-4 white-box binder semantic conformance coverage for name resolution success/failure (including unresolved alias errors), WITH scope-reset visibility, OPTIONAL MATCH `null_extended_from` lineage as `frozenset` clause ids, label narrowing from MATCH labels + conjunctive `WHERE alias:Label` checks, and SchemaConfidence rules (min-rule propagation, operand inheritance, and strong literal/`COUNT` behavior). Parser/lowering regression lanes remain green (#1114).
5764
- **Plugins / cuDF**: 14 GPU tests in `TestCpuOnlyPluginsCudfRoundTrip` (`test_call_operations_gpu.py`) verifying real cuDF→pandas→cuDF round-trip for `compute_igraph` (pagerank, spanning_tree Graph-returning path, articulation_points list-return path, edge-attribute merge path), `layout_igraph`, `layout_graphviz`, `render_graphviz`, `execute_call`, `ensure_pandas` nullable dtype preservation, and `restore_engine` conversion. Requires `TEST_CUDF=1` and RAPIDS.
@@ -413,6 +420,17 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
413420
### Infra
414421
- **CI**: Added pandas 2.2.3/3.0.0 compatibility jobs and minimal suite coverage.
415422

423+
### Added
424+
- **Collections**: New `g.collections(...)` API for defining subsets via GFQL expressions with priority-based visual encodings. Includes helper constructors `graphistry.collection_set(...)` and `graphistry.collection_intersection(...)`, support for `showCollections`, `collectionsGlobalNodeColor`, and `collectionsGlobalEdgeColor` URL params, and automatic JSON encoding. Accepts GFQL AST, Chain objects, or wire-protocol dicts (#874).
425+
- **Docs / Collections**: Added collections usage guide in visualization/layout/settings, tutorial notebook (`demos/more_examples/graphistry_features/collections.ipynb`), and cross-references in 10-minute guides, cheatsheet, and GFQL docs (#875).
426+
427+
### Changed
428+
- **Collections**: Autofix validation now drops invalid collections (e.g., invalid GFQL ops) and non-string collection color fields instead of string-coercing them; warnings still emit when `warn=True`.
429+
- **Collections**: `collections(...)` now always canonicalizes to URL-encoded JSON (string inputs are parsed + re-encoded); the `encode` parameter was removed to avoid ambiguous behavior.
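The "parse + re-encode" canonicalization described above can be sketched with the standard library alone. This is a minimal sketch, assuming compact JSON and full percent-encoding; `canonicalize_collections_sketch` is a hypothetical name, and the library's own encoder may differ in separators or in which characters it leaves unescaped.

```python
import json
from typing import Any, Union
from urllib.parse import quote, unquote


def canonicalize_collections_sketch(value: Union[str, dict, list]) -> str:
    """Hypothetical helper: return URL-encoded compact JSON for any input form."""
    if isinstance(value, str):
        # Accept raw JSON or already URL-encoded JSON; unquote leaves text
        # without percent-escapes unchanged.
        parsed: Any = json.loads(unquote(value))
    else:
        parsed = value
    compact = json.dumps(parsed, separators=(",", ":"))
    # safe="" percent-encodes every reserved character, including '/'.
    return quote(compact, safe="")


raw = '[{"type": "set", "id": "a"}]'
encoded = canonicalize_collections_sketch(raw)
# Feeding the encoded form back in yields the same string.
assert canonicalize_collections_sketch(encoded) == encoded
```

Because every input form passes through the same parse and re-encode step, the stored value does not depend on whether the caller supplied a list, a JSON string, or an already encoded string.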
430+
431+
### Tests
432+
- **Collections**: Added `test_collections.py` covering encoding, GFQL Chain/AST normalization, wire-protocol acceptance, validation modes, and helper constructors.
433+
416434
## [0.50.4 - 2026-01-15]
417435

418436
### Fixed

ai/prompts/PLAN.md

Lines changed: 1 addition & 1 deletion
@@ -125,7 +125,7 @@ git log --oneline -n 10
125125
- Source: `graphistry/`
126126
- Tests: `graphistry/tests/` (mirrors source structure: `graphistry/foo/bar.py` → `graphistry/tests/foo/test_bar.py`)
127127
- Docs: `docs/`
128-
- Plans: `plans/` (gitignored - safe for auxiliary files, temp secrets, working data)
128+
- Plans: `plans/` (gitignored - safe for auxiliary files, temp secrets, working data; Codex: avoid `~/.codex/plans`; if used, copy here then delete)
129129
- AI prompts: `ai/prompts/`
130130
- AI docs: `ai/docs/`
131131

graphistry/Plottable.py

Lines changed: 12 additions & 0 deletions
@@ -17,6 +17,7 @@
1717
from graphistry.Engine import EngineAbstractType
1818
from graphistry.utils.json import JSONVal
1919
from graphistry.client_session import ClientSession, AuthManagerProtocol
20+
from graphistry.models.collections import CollectionsInput
2021
from graphistry.models.types import ValidationParam
2122

2223
if TYPE_CHECKING:
@@ -783,6 +784,17 @@ def settings(self,
783784
) -> 'Plottable':
784785
...
785786

787+
def collections(
788+
self,
789+
collections: Optional[CollectionsInput] = None,
790+
show_collections: Optional[bool] = None,
791+
collections_global_node_color: Optional[str] = None,
792+
collections_global_edge_color: Optional[str] = None,
793+
validate: ValidationParam = 'autofix',
794+
warn: bool = True
795+
) -> 'Plottable':
796+
...
797+
786798
def privacy(self, mode: Optional[PrivacyMode] = None, notify: Optional[bool] = None, invited_users: Optional[List[str]] = None, message: Optional[str] = None, mode_action: Optional[ModeAction] = None) -> 'Plottable':
787799
...
788800

graphistry/PlotterBase.py

Lines changed: 53 additions & 2 deletions
@@ -2,6 +2,7 @@
22
from typing import Any, Callable, Dict, List, Optional, Union, Tuple, cast, overload, TYPE_CHECKING
33
from typing_extensions import Literal
44
from graphistry.io.types import ComplexEncodingsDict
5+
from graphistry.models.collections import CollectionsInput
56
from graphistry.models.types import ValidationMode, ValidationParam
67
from graphistry.plugins_types.hypergraph import HypergraphResult
78
from graphistry.render.resolve_render_mode import resolve_render_mode
@@ -1872,7 +1873,8 @@ def graph(self, ig: Any) -> Plottable:
18721873
def settings(self, height=None, url_params={}, render=None):
18731874
"""Specify iframe height and add URL parameter dictionary.
18741875
1875-
The library takes care of URI component encoding for the dictionary.
1876+
Collections URL params are normalized and URL-encoded at plot time; other
1877+
params should already be URL-safe.
18761878
18771879
:param height: Height in pixels.
18781880
:type height: int
@@ -1892,6 +1894,51 @@ def settings(self, height=None, url_params={}, render=None):
18921894
return res
18931895

18941896

1897+
def collections(
1898+
self,
1899+
collections: Optional[CollectionsInput] = None,
1900+
show_collections: Optional[bool] = None,
1901+
collections_global_node_color: Optional[str] = None,
1902+
collections_global_edge_color: Optional[str] = None,
1903+
validate: ValidationParam = 'autofix',
1904+
warn: bool = True
1905+
) -> 'Plottable':
1906+
"""Set collections URL parameters. Additive over previous settings.
1907+
1908+
:param collections: List/dict of collections or JSON/URL-encoded JSON string (stored as URL-encoded JSON).
1909+
:param show_collections: Toggle collections panel display.
1910+
:param collections_global_node_color: Hex color for non-collection nodes (leading # stripped).
1911+
:param collections_global_edge_color: Hex color for non-collection edges (leading # stripped).
1912+
:param validate: Validation mode. 'autofix' (default) drops invalid collections and color fields with warnings, 'strict' raises on issues.
1913+
:param warn: Whether to emit warnings when validate='autofix'. validate=False forces warn=False.
1914+
"""
1915+
from graphistry.validate.validate_collections import (
1916+
encode_collections,
1917+
normalize_collections,
1918+
normalize_collections_url_params,
1919+
)
1920+
1921+
settings: Dict[str, Any] = {}
1922+
if collections is not None:
1923+
normalized = normalize_collections(collections, validate=validate, warn=warn)
1924+
settings['collections'] = encode_collections(normalized)
1925+
extras: Dict[str, Any] = {}
1926+
if show_collections is not None:
1927+
extras['showCollections'] = show_collections
1928+
if collections_global_node_color is not None:
1929+
extras['collectionsGlobalNodeColor'] = collections_global_node_color
1930+
if collections_global_edge_color is not None:
1931+
extras['collectionsGlobalEdgeColor'] = collections_global_edge_color
1932+
if extras:
1933+
extras = normalize_collections_url_params(extras, validate=validate, warn=warn)
1934+
settings.update(extras)
1935+
1936+
if len(settings.keys()) > 0:
1937+
return self.settings(url_params={**self._url_params, **settings})
1938+
else:
1939+
return self
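The "additive over previous settings" behavior comes from the dict merge `{**self._url_params, **settings}`: existing URL params survive, and only the keys passed in the new call are overwritten. A minimal sketch of that merge, where `merge_url_params` is a hypothetical helper and the parameter values are illustrative:

```python
from typing import Any, Dict


def merge_url_params(existing: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict; keys in `new` override matching keys in `existing`."""
    return {**existing, **new}


params: Dict[str, Any] = {"play": 0, "showCollections": True}
# A later call that sets only a color keeps the earlier keys.
params = merge_url_params(params, {"collectionsGlobalNodeColor": "cccccc"})
# A later call that repeats a key replaces just that key.
params = merge_url_params(params, {"showCollections": False})
```

Since the merge builds a new dictionary, the earlier plotter's `_url_params` is left unmodified, which fits the immutable chaining style used by `settings(...)`.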
1940+
1941+
18951942
def privacy(
18961943
self,
18971944
mode: Optional[Mode] = None,
@@ -2239,7 +2286,11 @@ def plot(
22392286
'viztoken': str(uuid.uuid4())
22402287
}
22412288

2242-
viz_url = self._pygraphistry._viz_url(info, self._url_params)
2289+
# Validate collections in url_params (catches bypass of .collections() method)
2290+
from graphistry.validate.validate_collections import normalize_collections_url_params
2291+
url_params = normalize_collections_url_params(self._url_params, validate=validate_mode, warn=warn)
2292+
2293+
viz_url = self._pygraphistry._viz_url(info, url_params)
22432294
cfg_client_protocol_hostname = self.session.client_protocol_hostname
22442295
full_url = ('%s:%s' % (self.session.protocol, viz_url)) if cfg_client_protocol_hostname is None else viz_url
22452296

graphistry/__init__.py

Lines changed: 8 additions & 0 deletions
@@ -24,6 +24,7 @@
2424
nodes,
2525
graph,
2626
settings,
27+
collections,
2728
encode_point_color,
2829
encode_point_size,
2930
encode_point_icon,
@@ -65,6 +66,13 @@
6566
from_cugraph
6667
)
6768

69+
from graphistry.collections import (
70+
collection_set,
71+
collection_intersection,
72+
CollectionSet,
73+
CollectionIntersection,
74+
)
75+
6876
from graphistry.compute import (
6977
n, e, e_forward, e_reverse, e_undirected,
7078
let, ref,

graphistry/collections.py

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
1+
from typing import Optional, Sequence, TypeVar
2+
3+
from graphistry.models.collections import (
4+
CollectionIntersection,
5+
CollectionExprInput,
6+
CollectionSet,
7+
)
8+
9+
CollectionDict = TypeVar("CollectionDict", CollectionSet, CollectionIntersection)
10+
11+
12+
def _apply_collection_metadata(collection: CollectionDict, **metadata: Optional[str]) -> CollectionDict:
13+
value = metadata.get("id")
14+
if value is not None:
15+
collection["id"] = value
16+
value = metadata.get("name")
17+
if value is not None:
18+
collection["name"] = value
19+
value = metadata.get("description")
20+
if value is not None:
21+
collection["description"] = value
22+
value = metadata.get("node_color")
23+
if value is not None:
24+
collection["node_color"] = value
25+
value = metadata.get("edge_color")
26+
if value is not None:
27+
collection["edge_color"] = value
28+
return collection
29+
30+
31+
def collection_set(
32+
*,
33+
expr: CollectionExprInput,
34+
id: Optional[str] = None,
35+
name: Optional[str] = None,
36+
description: Optional[str] = None,
37+
node_color: Optional[str] = None,
38+
edge_color: Optional[str] = None,
39+
) -> CollectionSet:
40+
"""Build a collection dict for a GFQL-defined set."""
41+
from graphistry.compute.ast import normalize_gfql_to_wire
42+
collection: CollectionSet = {"type": "set", "expr": {"type": "gfql_chain", "gfql": normalize_gfql_to_wire(expr)}}
43+
return _apply_collection_metadata(
44+
collection,
45+
id=id,
46+
name=name,
47+
description=description,
48+
node_color=node_color,
49+
edge_color=edge_color,
50+
)
51+
52+
53+
def collection_intersection(
54+
*,
55+
sets: Sequence[str],
56+
id: Optional[str] = None,
57+
name: Optional[str] = None,
58+
description: Optional[str] = None,
59+
node_color: Optional[str] = None,
60+
edge_color: Optional[str] = None,
61+
) -> CollectionIntersection:
62+
"""Build a collection dict for an intersection of set IDs."""
63+
collection: CollectionIntersection = {
64+
"type": "intersection",
65+
"expr": {
66+
"type": "intersection",
67+
"sets": list(sets),
68+
},
69+
}
70+
return _apply_collection_metadata(
71+
collection,
72+
id=id,
73+
name=name,
74+
description=description,
75+
node_color=node_color,
76+
edge_color=edge_color,
77+
)
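The helpers above return plain dictionaries, and `_apply_collection_metadata` copies only the metadata values that are not `None`. A standalone sketch of that behavior follows; `apply_metadata_sketch` is a hypothetical loop-based equivalent of the unrolled helper, and the `{"type": "Node"}` entry is only a placeholder for whatever `normalize_gfql_to_wire` would emit.

```python
from typing import Any, Dict, Optional

METADATA_KEYS = ("id", "name", "description", "node_color", "edge_color")


def apply_metadata_sketch(
    collection: Dict[str, Any], **metadata: Optional[str]
) -> Dict[str, Any]:
    """Copy known metadata keys onto the dict, skipping None values."""
    for key in METADATA_KEYS:
        value = metadata.get(key)
        if value is not None:
            collection[key] = value
    return collection


# Shape produced for a GFQL-defined set (placeholder op list).
purchases = apply_metadata_sketch(
    {"type": "set", "expr": {"type": "gfql_chain", "gfql": [{"type": "Node"}]}},
    id="purchases",
    name="Purchases",
    description=None,  # omitted from the result
    node_color="#00FF00",
)

# Shape produced for an intersection of set IDs.
overlap = apply_metadata_sketch(
    {"type": "intersection", "expr": {"type": "intersection", "sets": ["purchases", "vip"]}},
    id="vip_purchases",
)
```

Leaving unset fields out, rather than writing `None`, keeps the serialized JSON short.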

graphistry/compute/ast.py

Lines changed: 54 additions & 0 deletions
@@ -1588,6 +1588,60 @@ def from_json(o: JSONVal, validate: bool = True) -> Union[ASTNode, ASTEdge, ASTL
15881588
return out
15891589

15901590

1591+
def normalize_gfql_to_wire(expr: Any) -> List[Dict[str, JSONVal]]:
1592+
"""
1593+
Normalize GFQL expression to wire format (list of JSON-serializable dicts).
1594+
1595+
Accepts:
1596+
- Chain object
1597+
- Single ASTObject
1598+
- List of ASTObjects
1599+
- Dict with 'type': 'Chain' and 'chain' key
1600+
- Dict with 'type': 'gfql_chain' and 'gfql' key
1601+
- Dict with just 'chain' or 'gfql' key
1602+
- Single dict (parsed as AST op)
1603+
1604+
Returns:
1605+
- List of JSON-serializable dicts ready for wire protocol
1606+
1607+
Raises:
1608+
- TypeError: if expr type is not supported
1609+
- ValueError: if expr is empty
1610+
- GFQLSyntaxError: if dict cannot be parsed as valid AST
1611+
"""
1612+
from graphistry.compute.chain import Chain
1613+
1614+
def _normalize_op(op: object) -> Dict[str, JSONVal]:
1615+
if isinstance(op, ASTObject):
1616+
return op.to_json()
1617+
if isinstance(op, dict):
1618+
return from_json(op, validate=True).to_json()
1619+
raise TypeError("GFQL operations must be AST objects or dictionaries")
1620+
1621+
def _normalize_ops(raw: object) -> List[Dict[str, JSONVal]]:
1622+
if isinstance(raw, Chain):
1623+
return _normalize_ops(raw.to_json().get("chain", []))
1624+
if isinstance(raw, ASTObject):
1625+
return [raw.to_json()]
1626+
if isinstance(raw, list):
1627+
if len(raw) == 0:
1628+
raise ValueError("GFQL operations list cannot be empty")
1629+
return [_normalize_op(op) for op in raw]
1630+
if isinstance(raw, dict):
1631+
if raw.get("type") == "Chain" and "chain" in raw:
1632+
return _normalize_ops(raw.get("chain"))
1633+
if raw.get("type") == "gfql_chain" and "gfql" in raw:
1634+
return _normalize_ops(raw.get("gfql"))
1635+
if "chain" in raw:
1636+
return _normalize_ops(raw.get("chain"))
1637+
if "gfql" in raw:
1638+
return _normalize_ops(raw.get("gfql"))
1639+
return [_normalize_op(raw)]
1640+
raise TypeError("GFQL expr must be Chain, ASTObject, list, or dict")
1641+
1642+
return _normalize_ops(expr)
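The envelope handling in `_normalize_ops` can be illustrated without the AST classes. The sketch below is a simplified stand-in: it mirrors the order of the dict checks above but skips `Chain`/`ASTObject` inputs and does no `from_json` validation, so `unwrap_gfql_ops_sketch` is a hypothetical name and not a replacement for `normalize_gfql_to_wire`.

```python
from typing import Any, Dict, List


def unwrap_gfql_ops_sketch(raw: Any) -> List[Dict[str, Any]]:
    """Simplified stand-in: unwrap envelopes only, no AST parsing or validation."""
    if isinstance(raw, list):
        if len(raw) == 0:
            raise ValueError("GFQL operations list cannot be empty")
        for op in raw:
            if not isinstance(op, dict):
                raise TypeError("each operation must be a dict in this sketch")
        return list(raw)
    if isinstance(raw, dict):
        # Typed envelopes are checked first, then bare 'chain' / 'gfql' keys.
        if raw.get("type") == "Chain" and "chain" in raw:
            return unwrap_gfql_ops_sketch(raw["chain"])
        if raw.get("type") == "gfql_chain" and "gfql" in raw:
            return unwrap_gfql_ops_sketch(raw["gfql"])
        if "chain" in raw:
            return unwrap_gfql_ops_sketch(raw["chain"])
        if "gfql" in raw:
            return unwrap_gfql_ops_sketch(raw["gfql"])
        # Anything else is treated as a single operation.
        return [raw]
    raise TypeError("expr must be a list or dict in this sketch")
```

Because the function recurses on the unwrapped value, nested envelopes such as a `gfql_chain` dict whose `gfql` value is itself a `chain` dict reduce to the same flat list of operations.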
1643+
1644+
15911645
###############################################################################
15921646
# User-friendly aliases for public API
15931647

graphistry/models/collections.py

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
from __future__ import annotations
2+
3+
from typing import Dict, List, TYPE_CHECKING, Union
4+
from typing_extensions import Literal, NotRequired, Required, TypedDict
5+
6+
from graphistry.utils.json import JSONVal
7+
8+
if TYPE_CHECKING:
9+
from graphistry.compute.ast import ASTObject
10+
from graphistry.compute.chain import Chain
11+
12+
13+
CollectionExprInput = Union[
14+
"Chain",
15+
"ASTObject",
16+
List["ASTObject"],
17+
Dict[str, JSONVal],
18+
List[Dict[str, JSONVal]],
19+
]
20+
21+
22+
class IntersectionExpr(TypedDict):
23+
type: Literal["intersection"]
24+
sets: List[str]
25+
26+
27+
class CollectionBase(TypedDict, total=False):
28+
id: str
29+
name: str
30+
description: str
31+
node_color: str
32+
edge_color: str
33+
34+
35+
class CollectionSet(CollectionBase):
36+
type: NotRequired[Literal["set"]]
37+
expr: Required[CollectionExprInput]
38+
39+
40+
class CollectionIntersection(CollectionBase):
41+
type: NotRequired[Literal["intersection"]]
42+
expr: Required[IntersectionExpr]
43+
44+
45+
Collection = Union[CollectionSet, CollectionIntersection]
46+
CollectionsInput = Union[str, Collection, List[Collection]]
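The split between an all-optional base and a required `expr` can be inspected at runtime. The sketch below uses only the standard library and swaps in a different technique: instead of `Required`/`NotRequired` from `typing_extensions`, it relies on `total=False` inheritance, which gives the same required/optional split for these keys. The `*Sketch` class names are hypothetical, and Python 3.9+ is assumed for `__required_keys__`.

```python
from typing import Any, Dict, TypedDict


class CollectionBaseSketch(TypedDict, total=False):
    # Every key declared here is optional.
    id: str
    name: str
    description: str
    node_color: str
    edge_color: str
    type: str


class CollectionSetSketch(CollectionBaseSketch):
    # Declared in a total=True subclass, so this key is required.
    expr: Dict[str, Any]


example: CollectionSetSketch = {"expr": {"type": "gfql_chain", "gfql": []}, "id": "a"}

required = CollectionSetSketch.__required_keys__
optional = CollectionSetSketch.__optional_keys__
```

TypedDicts are ordinary `dict` objects at runtime, so these declarations guide static type checkers only; missing or malformed fields still have to be caught by the runtime validation added in this commit.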

graphistry/pygraphistry.py

Lines changed: 21 additions & 0 deletions
@@ -6,6 +6,8 @@
66
from graphistry.plugins_types.gexf_types import GexfEdgeViz, GexfNodeViz, GexfParseEngine
77
from graphistry.client_session import ClientSession, ApiVersion, ENV_GRAPHISTRY_API_KEY, DatasetInfo, AuthManagerProtocol, strtobool
88
from graphistry.Engine import EngineAbstractType
9+
from graphistry.models.collections import CollectionsInput
10+
from graphistry.models.types import ValidationParam
911
from graphistry.otel import inject_trace_headers, otel as otel_config
1012

1113
"""Top-level import of class PyGraphistry as "Graphistry". Used to connect to the Graphistry server and then create a base plotter."""
@@ -2376,6 +2378,24 @@ def settings(self, height=None, url_params={}, render=None):
23762378

23772379
return self._plotter().settings(height, url_params, render)
23782380

2381+
def collections(
2382+
self,
2383+
collections: Optional[CollectionsInput] = None,
2384+
show_collections: Optional[bool] = None,
2385+
collections_global_node_color: Optional[str] = None,
2386+
collections_global_edge_color: Optional[str] = None,
2387+
validate: ValidationParam = 'autofix',
2388+
warn: bool = True
2389+
):
2390+
return self._plotter().collections(
2391+
collections=collections,
2392+
show_collections=show_collections,
2393+
collections_global_node_color=collections_global_node_color,
2394+
collections_global_edge_color=collections_global_edge_color,
2395+
validate=validate,
2396+
warn=warn
2397+
)
2398+
23792399
def _viz_url(self, info: DatasetInfo, url_params: Dict[str, Any]) -> str:
23802400
splash_time = int(calendar.timegm(time.gmtime())) + 15
23812401
extra = "&".join([k + "=" + str(v) for k, v in list(url_params.items())])
@@ -2604,6 +2624,7 @@ def _handle_api_response(self, response):
26042624
pipe = PyGraphistry.pipe
26052625
graph = PyGraphistry.graph
26062626
settings = PyGraphistry.settings
2627+
collections = PyGraphistry.collections
26072628
hypergraph = PyGraphistry.hypergraph
26082629
bolt = PyGraphistry.bolt
26092630
cypher = PyGraphistry.cypher
