-
Notifications
You must be signed in to change notification settings - Fork 2
Syntax
Each graph rule is semi-colon separated, where the last match shall not terminate with a semicolon. The grammar requires at least one graph rule (centralmatch)
Each graph grammar comes with a rule name (OTHERS) and is introduced by an equivalence sign (EQ). It requires providing a node match with a name, potential ingoing or outgoing edges (many_edges) and some joining edges not necessarily referring to the ego-net of the matching node (edge_joining). Data matching conditions on the nodes or the edges being considered are introduced by a where non-terminal. The rewriting is optional: if not specified, both the node and the descendant nodes are unchanged. Otherwise, we can selectively update the nodes in the graph being matched (rewrite_to). As the matching node might be replaced by a newly created node specified in the rewriting phase, this last one must also be specified explicitly at the end of the rewriting.
This statement refers to a generic node match. When appearing at the beginning of a pattern match query, it refers to the entry-point of the graph query. Node declarations are bounded by rounded parentheses (L/RPAR), and might be associated to one or more variables ||-separated across the patterns (multiple_labels), so to determine the order of the graph grammar's rules application depending on the relationship between such nodes.
The following parts of this declaration are only expressed by the query language at this stage but are left to future work for implementation:
-
STARsemantics: expressing the optionality/multi-node match without necessarily nesting the morphism. - specifying an optional match referring to the node's label (
COL (:) OTHERS). At these stage, these are only supported via subsequent data conditions (seeWHERE), and not by pruning the retrieved edges within the morphism.
This matches many optional single edges arriving or departing from the query entry point. These edges can refer to an outgoing edge (#1), an ongoing edge (#2) or a hook (#3). When the node is morphism-aggregated, this refers to an edge connecting all the nodes being grouped by, but not necessarily forming a clique.
We now discuss the edgelabel not-terminal component: edges are introduced by squared brackets (Q/PPAR): the question mark remarks an edge being optionally matched (QM), while the ∀ symbol (FORALL) indicates that the target node of the current directed edge will be nested within a nested morphism. Edges can be optionally referred to by a variable name (OTHERS) that should always be followed by a colon (:). We can there express the possible options allowed for the edge's label via multiple_labels.
Last, we can specify free edges to connect the already-described nodes through multiple optional edge_joining edges within the same pattern.
The data conditions are expressed as a further refinement of the current graph pattern and are optionally introduced by a where token. We can then exploit straightforward equality or inequality conditions (=,≠,<,≤) between expressions identified in rewrite_expr, express a conjunction (∧) or disjunction (∨) against the previous boolean predicates, to use an advanced data predicate being expressed in Script v3.0 syntax in double-quotes. We can also further determine whether to keep an entire morphism from the current pattern as valid if one of the nodes or edges associated to a variable OTHER₁ do (not) appear as part of a morphism of the pattern OTHER₂ associated to a variable OTHER₃.
The data condition can be either a simple non double-quoted string (OTHER) or some specific data retrieved through the morphism (test_expr_side), which can be the following:
- The
OTHERS-th value (𝜉) associated to an object or variablerewrite_expr - The
OTHERS-th label (ℓ) associated to an object or variablerewrite_expr - A key or property
rewrite_expr₁belonging to an object or variablerewrite_expr₂ - A node-id accessible through an outgoing edge (or GSM containment) of
rewrite_expr₂with label (or containment name)rewrite_expr₁ - The edge label for an edge associated with a variable
rewrite_expr - The source (
src) or an edge associated with a variablerewrite_expr - The target (
dst) of an edge associated with a variablerewrite_expr - An if-then-else condition, where the alternative condition is optional
- Yet again a boolean interpretation of a Script v3.0 expression within double quotes.
The graph rewriting operations are optional and are introduced by a ↪ sign (REWRITE_TO). Please observe that matching a graph without necessarily restructuring the graph will, in this query language, return the same graph as initially loaded and will not return a set of subgraphs of the original graph; to obtain the latter, you also have to specify which edges or nodes have to be deleted (del). new will temporally instantiate a variable OTHERS holding a newly created object, while set will update the values in rewrite_expr₁ with the ones determined in rewrite_expr₂. If the current entry-point query has been removed and we want to replace such a node with a newly created one, we can then express our intent to return a specific node associated with a given variable with an optional final node statement.
The full Syntax Diagram generated from this ANTLR4 grammar provides the full specification of this language.
[TODO]
The complete Syntax Diagram generated from this ANTLR4 grammar provides the full specification of this language.
The most recent description of this scripting language acting as a MetaModel for GSM is described in our seminal paper;
G. Bergami, W. Zegadło. 2023. "Towards a Generalised Semistructured Data Model and Query Language" SIGWEB Newsl. 2023, Summer, Article 4 (Summer 2023), 22 pages.
Still, the current syntax has been improved to easily be parsed with an LL(1) grammar, thus allowing a more efficient code analysis at run time. Therefore, the specific syntax from that paper was used in this, another version of our project.