Commit 8ac6e1f

more subtypes
1 parent 054e574 commit 8ac6e1f

5 files changed

Lines changed: 193 additions & 15 deletions

docs/manual/notation.md

Lines changed: 53 additions & 3 deletions
@@ -92,6 +92,14 @@ Specifications can be other relations, for example in "Maria plays chess when it
 (plays/P.sox maria/C chess/C (when/T (rains/P.o it/C)))
 ```

+**Every argument filling the specification role (`x`) must be of type specifier (`S`).** Specifiers are produced by triggers (`(T [CR])` → `S`), so a specification is normally introduced by a trigger word, as in `(in/Tt ...)` or `(when/Tt ...)` above. When a specification has no natural trigger word in the surface text (for example, a bare indirect object), the argument must still be turned into an `S` by enclosing it in the appropriate special trigger atom `_/T*/.` (see [Special atoms](#special-atoms)). For instance, "Maria gave Peter a book" has Peter as a recipient with no preposition:
+
+```
+(gave/P.sox maria/Cp book/Cc (_/Ti/. peter/Cp))
+```
+
+Here `(_/Ti/. peter/Cp)` is a specifier of indirect-object subtype, satisfying the requirement that the `x` argument be of type `S`.
+
 ### Builders

 In builders, the two argument roles are used to identify the main concept and the auxiliary concept
@@ -125,31 +133,48 @@ The following tables present the subtypes that SH semantic parsers are expected
 | Cq | quantitative | 27/Cn |
 | Ca | adjective | 27/Ca |
 | Cd | determinant | some/Cd |
+| Cw | interrogative / wh (nominal) | who/Cw, what/Cw, which/Cw |
+| Ce | demonstrative pronoun | this/Ce, that/Ce |
+| Cg | nominalized verb / gerund | swimming/Cg |
 | Cx | unclassified | |

 ### Predicate

+`Pv` is the default declarative verbal predicate. The mood/kind subtypes below take precedence when they apply; a predicate carries exactly one single-character subtype.
+
 | Code | Subtype | Example |
 |------|---------|---------|
-| Pv | verbal | is/Pv |
+| Pv | verbal (declarative, default) | is/Pv |
+| Pi | interrogative | is/Pi (Is the sky blue?) |
+| Pj | imperative / jussive | close/Pj (Close the door) |
+| Pn | nominal / copular | non-verbal / copula-drop predicate |
+| Pe | existential | there is/are |
 | Px | unclassified | |

 ### Builder

 | Code | Subtype | Example |
 |------|---------|---------|
+| Bp | genitive / relational | of/Bp.ma (capital of France) |
+| Bm | partitive / measure | cup of coffee, slice of bread |
 | Bx | unclassified | |

+Appositives ("Obama, the president") are *not* builders: express them with the generic conjunction `:/J/.`, e.g. `(:/J/. obama/Cp (the/Md president/Cc))`.
+
 ### Modifier

 | Code | Subtype | Example |
 |------|---------|---------|
 | Md | determinant | the/Md |
 | Ma | adjective | green/Ma |
 | Mq | quantitative | 100/Mq |
-| Mm | modal | will/Mv |
+| Mm | modal / tense / auxiliary | will/Mm, was/Mm |
+| Mb | adverbial / manner | quickly/Mb, carefully/Mb |
+| Mg | degree / intensifier | very/Mg, more/Mg, too/Mg |
 | Mn | negation | not/Mn |
 | Mp | possessive | my/Mp |
+| Me | demonstrative determiner | this/Me, that/Me |
+| Mw | interrogative determiner | which/Mw, whose/Mw |
 | Mx | unclassified | |

 ### Trigger
@@ -184,4 +209,29 @@ Special atoms are annotated with the reserved `.` namespace.
 |------|---------|---------|
 | +/B/. | Define compound nouns | (+/B.am/. alan/Cp turing/Cp) |
 | :/J/. | Generic conjunction | |
-| _/Ti/. | Indirect object | |
+
+### Special trigger atoms
+
+There is one special trigger atom per trigger subtype. Use them to turn a bare concept or relation into a specifier (`S`) when a specification argument (the `x` role of a predicate) has no natural trigger word in the surface text. The subtype is chosen to match the semantic role the specification plays.
+
+| Atom | Subtype | Example |
+|------|---------|---------|
+| _/Tt/. | temporal | (_/Tt/. monday/Cc) |
+| _/Tl/. | locative | (_/Tl/. berlin/Cp) |
+| _/Ti/. | indirect object | (_/Ti/. peter/Cp) |
+| _/Ta/. | passive actor / agent | (_/Ta/. dog/Cc) |
+| _/Tb/. | beneficiary | (_/Tb/. him/Ci) |
+| _/Ts/. | source / ablative | (_/Ts/. paris/Cp) |
+| _/Tn/. | manner / means / instrument | (_/Tn/. hammer/Cc) |
+| _/Tw/. | comitative | (_/Tw/. john/Cp) |
+| _/Tr/. | reference / topic | (_/Tr/. politics/Cc) |
+| _/Tq/. | quantitative / measure | (_/Tq/. (5/Mq percent/Cc)) |
+| _/Tv/. | privative / negative | (_/Tv/. money/Cc) |
+| _/Tf/. | conditional | (_/Tf/. (rains/Pv.s it/Ci)) |
+| _/Tc/. | causal | (_/Tc/. rain/Cc) |
+| _/Tp/. | purpose / final | (_/Tp/. safety/Cc) |
+| _/To/. | result / consecutive | (_/To/. victory/Cc) |
+| _/Tg/. | concessive | (_/Tg/. rain/Cc) |
+| _/Te/. | comparative | (_/Te/. lion/Cc) |
+| _/Td/. | declarative complementizer | (_/Td/. (won/Pv.s she/Ci)) |
+| _/Tx/. | unclassified | (_/Tx/. thing/Cc) |
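
The specifier requirement documented in the hunks above can be illustrated without the library. What follows is a minimal, self-contained sketch and not hyperbase code: `parse`, `role`, `main_type` and `bare_specifications` are hypothetical helpers, and the type inference is deliberately simplified (a trigger connector yields `S`, a predicate `R`, a builder `C`, anything else takes the type of its first argument). It also assumes the connector of the checked edge is a plain atom.

```python
def parse(text):
    """Parse a hyperedge string into nested tuples of atom strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # an atom such as 'gave/P.sox'
        items = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1  # consume the closing ')'
        return tuple(items)

    return read()


def role(atom):
    """Return the role part of an atom, e.g. 'P.sox' for 'gave/P.sox'."""
    return atom.split("/")[1]


def main_type(edge):
    """Simplified type inference (assumption: only these rules are needed)."""
    if isinstance(edge, str):
        return role(edge)[0]
    connector = main_type(edge[0])
    if connector == "T":
        return "S"  # a trigger produces a specifier
    if connector == "P":
        return "R"  # a predicate produces a relation
    if connector == "B":
        return "C"  # a builder produces a concept
    return main_type(edge[1])  # modifiers, conjunctions: type of first argument


def bare_specifications(edge):
    """Return the x-role arguments of a predicate edge that are not specifiers."""
    parts = role(edge[0]).split(".")
    argroles = parts[1] if len(parts) > 1 else ""
    return [
        arg
        for argrole, arg in zip(argroles, edge[1:])
        if argrole == "x" and main_type(arg) != "S"
    ]


bare = parse("(gave/P.sox maria/Cp book/Cc peter/Cp)")
wrapped = parse("(gave/P.sox maria/Cp book/Cc (_/Ti/. peter/Cp))")
print(bare_specifications(bare))     # -> ['peter/Cp']
print(bare_specifications(wrapped))  # -> []
```

Wrapping the recipient in `_/Ti/.` changes the inferred type of the third argument from `C` to `S`, which is what makes the second edge pass.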

src/hyperbase/correctness.py

Lines changed: 62 additions & 9 deletions
@@ -10,15 +10,21 @@
 from hyperbase.hyperedge import Atom, Hyperedge


-def check_correctness(edge: Hyperedge) -> dict[Hyperedge, list[tuple[str, str, int]]]:
+def check_correctness(
+    edge: Hyperedge, strict: bool = False
+) -> dict[Hyperedge, list[tuple[str, str, int]]]:
     """Check correctness of a hyperedge, returning errors keyed by subedge.

     Each error is ``(code, message, severity)``. Correctness failures are hard
     grammar violations, so they all carry severity ``0`` (the most serious).
+
+    When ``strict`` is ``True``, additional grammar rules are enforced: every
+    argument filling a predicate's specification role (``x``) must be of type
+    specifier (``S``). Default (``strict=False``) behaviour is unchanged.
     """
     if edge.atom:
         return _check_atom(edge)  # type: ignore[arg-type]
-    return _check_edge(edge)
+    return _check_edge(edge, strict)


 def _check_atom(atom: Atom) -> dict[Hyperedge, list[tuple[str, str, int]]]:
@@ -42,7 +48,9 @@ def _check_atom(atom: Atom) -> dict[Hyperedge, list[tuple[str, str, int]]]:
     return output


-def _check_edge(edge: Hyperedge) -> dict[Hyperedge, list[tuple[str, str, int]]]:
+def _check_edge(
+    edge: Hyperedge, strict: bool = False
+) -> dict[Hyperedge, list[tuple[str, str, int]]]:
     output: dict[Hyperedge, list[tuple[str, str, int]]] = {}
     errors: list[tuple[str, str]] = []

@@ -96,6 +104,27 @@ def _check_edge(edge: Hyperedge) -> dict[Hyperedge, list[tuple[str, str, int]]]:
             if at not in {EdgeType.CONCEPT, EdgeType.RELATION, EdgeType.SPECIFIER}:
                 e = f"predicate argument '{arg}' of '{edge}' has incorrect type: {at}"
                 errors.append(("pred-arg-bad-type", e))
+        # strict: every specification-role (x) argument must be a specifier (S)
+        if strict:
+            try:
+                ars = edge.argroles()
+            except RuntimeError:
+                ars = ""
+            for i, arg in enumerate(edge[1:]):
+                if (
+                    i < len(ars)
+                    and ars[i] == const.ArgRole.SPECIFICATION
+                    and arg.mtype() != EdgeType.SPECIFIER
+                ):
+                    errors.append(
+                        (
+                            "spec-arg-not-specifier",
+                            f"specification argument '{arg}' of '{edge}' must "
+                            f"be a specifier (type 'S'), but has type "
+                            f"{arg.mtype()}. Wrap it in a trigger (e.g. a "
+                            f"special trigger atom like _/Tt/.).",
+                        )
+                    )
     # check if conjunction structure is correct
     elif ct == EdgeType.CONJUNCTION and len(edge) < 3:
         errors.append(
@@ -165,7 +194,7 @@ def _check_edge(edge: Hyperedge) -> dict[Hyperedge, list[tuple[str, str, int]]]:
         output[edge] = [(code, msg, 0) for code, msg in errors]

     for subedge in edge:
-        output.update(check_correctness(subedge))
+        output.update(check_correctness(subedge, strict))

     return output

@@ -212,7 +241,10 @@ def _visit(current_edge: Hyperedge) -> None:
             connector_type = current_edge[0].type()
             if len(current_edge) >= 2:
                 target_mt = current_edge[1].mt
-                if connector_type in {"Ma", "Md", "Mq", "Mp"} and target_mt != "C":
+                if (
+                    connector_type in {"Ma", "Md", "Mq", "Mp", "Me", "Mw"}
+                    and target_mt != "C"
+                ):
                     current_errors.append(
                         (
                             f"bad-{connector_type.lower()}-target",
@@ -225,10 +257,31 @@ def _visit(current_edge: Hyperedge) -> None:
                 elif connector_type == "Mm" and target_mt != "P":
                     current_errors.append(
                         (
-                            "bad-mm-target",
-                            f"Modifier '{current_edge}' of type 'Mm' should only be "
-                            "applied to predicates (type 'P'), but its target "
-                            f"'{current_edge[1]}' has type '{target_mt}'.",
+                            f"bad-{connector_type.lower()}-target",
+                            f"Modifier '{current_edge}' of type '{connector_type}' "
+                            "should only be applied to predicates (type 'P'), but its "
+                            f"target '{current_edge[1]}' has type '{target_mt}'.",
+                            3,
+                        )
+                    )
+                elif connector_type == "Mb" and target_mt not in {"P", "T"}:
+                    current_errors.append(
+                        (
+                            f"bad-{connector_type.lower()}-target",
+                            f"Modifier '{current_edge}' of type '{connector_type}' "
+                            "should only be applied to predicates/triggers "
+                            f"(type 'P' or 'T'), but its target '{current_edge[1]}' "
+                            f"has type '{target_mt}'.",
+                            3,
+                        )
+                    )
+                elif connector_type == "Mg" and target_mt not in {"C", "M"}:
+                    current_errors.append(
+                        (
+                            "bad-mg-target",
+                            f"Modifier '{current_edge}' of type 'Mg' should only be "
+                            "applied to adjectives/adverbs (type 'C' or 'M'), but its "
+                            f"target '{current_edge[1]}' has type '{target_mt}'.",
                             3,
                         )
                     )
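
The modifier-target rules in these hunks follow one pattern: a modifier subtype, a set of allowed target main types, and a `bad-<subtype>-target` code. As a hedged illustration only, the same rules can be written as a lookup table. `ALLOWED_TARGETS` and `modifier_target_error` are hypothetical names that do not appear in the commit; the subtype-to-target mapping is transcribed from the diff.

```python
# Allowed target main types per modifier subtype, transcribed from the diff.
# Subtypes without an entry (e.g. Mn, Mx) are not checked at all.
ALLOWED_TARGETS = {
    "Ma": {"C"},
    "Md": {"C"},
    "Mq": {"C"},
    "Mp": {"C"},
    "Me": {"C"},
    "Mw": {"C"},
    "Mm": {"P"},
    "Mb": {"P", "T"},
    "Mg": {"C", "M"},
}


def modifier_target_error(connector_type, target_mt):
    """Return a 'bad-<subtype>-target' code, or None if the target is allowed."""
    allowed = ALLOWED_TARGETS.get(connector_type)
    if allowed is None or target_mt in allowed:
        return None
    return f"bad-{connector_type.lower()}-target"


print(modifier_target_error("Mw", "P"))  # -> bad-mw-target
print(modifier_target_error("Me", "C"))  # -> None
print(modifier_target_error("Mb", "C"))  # -> bad-mb-target
```

A table keeps the error code and the rule in one place, at the cost of the per-subtype wording that the committed `elif` chain puts in each message.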

src/hyperbase/hyperedge.py

Lines changed: 4 additions & 2 deletions
@@ -428,10 +428,12 @@ def arguments_with_role(self, argrole: str) -> list[Hyperedge]:
                     edges.append(self[pos + 1])
         return edges

-    def check_correctness(self) -> dict[Hyperedge, list[tuple[str, str, int]]]:
+    def check_correctness(
+        self, strict: bool = False
+    ) -> dict[Hyperedge, list[tuple[str, str, int]]]:
         from hyperbase.correctness import check_correctness

-        return check_correctness(self)
+        return check_correctness(self, strict=strict)

     def normalise(self) -> Hyperedge:
         from hyperbase.transforms import _propagate_root_text, normalise

src/hyperbase/parsers/correctness.py

Lines changed: 7 additions & 1 deletion
@@ -8,6 +8,11 @@
 original tokens. The third value in each error tuple is a severity (lower is
 worse): ``0`` for hard correctness failures, ``1`` for token-mismatch issues,
 ``2`` for argrole problems, ``3`` for junction issues.
+
+When ``strict`` is ``True``, the underlying :func:`check_correctness` also
+enforces that every predicate specification-role (``x``) argument is a
+specifier (``S``), emitting a ``spec-arg-not-specifier`` failure otherwise.
+Default (``strict=False``) behaviour is unchanged.
 """

 from hyperbase.correctness import check_structural_quality
@@ -18,11 +23,12 @@
 def check_parse_correctness(
     edge: Hyperedge,
     tokens: list[str],
+    strict: bool = False,
 ) -> dict[str | Hyperedge, list[tuple[str, str, int]]]:

     # Hard grammar failures (severity 0), keyed by subedge.
     errors: dict[str | Hyperedge, list[tuple[str, str, int]]] = {
-        k: list(v) for k, v in edge.check_correctness().items()
+        k: list(v) for k, v in edge.check_correctness(strict=strict).items()
     }

     structural_errors = check_structural_quality(edge)
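
The module docstring above defines severity as "lower is worse". A caller consuming the returned dictionary might summarise it as in the sketch below. This is an assumption about usage, not part of the commit: `worst_severity` and `codes_at` are hypothetical helpers, and the sample `errors` dictionary is hand-written in the `(code, message, severity)` shape the docstring describes, with strings standing in for hyperedge keys.

```python
def worst_severity(errors):
    """Return the lowest (most serious) severity present, or None if no errors."""
    severities = [sev for errs in errors.values() for _code, _msg, sev in errs]
    return min(severities) if severities else None


def codes_at(errors, severity):
    """Return the set of error codes reported at exactly the given severity."""
    return {
        code
        for errs in errors.values()
        for code, _msg, sev in errs
        if sev == severity
    }


# Hand-written sample in the documented shape; keys are plain strings here.
errors = {
    "(gave/Pv.sox maria/Cp book/Cc peter/Cp)": [
        ("spec-arg-not-specifier", "specification argument must be a specifier", 0)
    ],
    "(quickly/Mb sky/Cc)": [
        ("bad-mb-target", "modifier applied to a concept", 3)
    ],
}

print(worst_severity(errors))  # -> 0
print(codes_at(errors, 3))     # -> {'bad-mb-target'}
```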

tests/test_parse_correctness.py

Lines changed: 67 additions & 0 deletions
@@ -324,3 +324,70 @@ def test_check_correctness_severity(self):
                     assert err[2] == 0
                     found = True
         assert found, "Should have build-2-args with severity 0"
+
+
+def _codes(errors):
+    return {code for v in errors.values() for code, _msg, _sev in v}
+
+
+class TestStrictMode:
+    """Strict mode enforces that x-role arguments are specifiers (S)."""
+
+    def test_strict_flags_bare_specification_argument(self):
+        # "peter" fills the x slot but is a bare concept, not a specifier
+        edge = hedge("(gave/Pv.sox maria/Cp book/Cc peter/Cp)")
+        assert edge
+        errors = check_parse_correctness(edge, [], strict=True)
+        assert "spec-arg-not-specifier" in _codes(errors)
+
+    def test_strict_error_has_severity_zero(self):
+        edge = hedge("(gave/Pv.sox maria/Cp book/Cc peter/Cp)")
+        assert edge
+        errors = check_parse_correctness(edge, [], strict=True)
+        severities = [
+            sev
+            for v in errors.values()
+            for code, _msg, sev in v
+            if code == "spec-arg-not-specifier"
+        ]
+        assert severities == [0]
+
+    def test_default_mode_does_not_flag_bare_specification(self):
+        # Same edge, default (non-strict) mode: behaviour unchanged.
+        edge = hedge("(gave/Pv.sox maria/Cp book/Cc peter/Cp)")
+        assert edge
+        errors = check_parse_correctness(edge, [])
+        assert "spec-arg-not-specifier" not in _codes(errors)
+
+    def test_strict_passes_wrapped_specification(self):
+        # Wrapping the recipient in a special trigger atom makes it an S.
+        edge = hedge("(gave/Pv.sox maria/Cp book/Cc (_/Ti/. peter/Cp))")
+        assert edge
+        errors = check_parse_correctness(edge, [], strict=True)
+        assert "spec-arg-not-specifier" not in _codes(errors)
+
+    def test_strict_allows_relation_specification(self):
+        # A trigger applied to a relation is an S and is accepted in the x slot.
+        edge = hedge("(plays/Pv.sox maria/Cp chess/Cc (when/Tt (rains/Pv.s it/Ci)))")
+        assert edge
+        errors = check_parse_correctness(edge, [], strict=True)
+        assert "spec-arg-not-specifier" not in _codes(errors)
+
+
+class TestNewSubtypeModifierRules:
+    """Soft modifier-target checks cover the newly added modifier subtypes."""
+
+    def test_interrogative_determiner_on_predicate_flagged(self):
+        edge = hedge("(which/Mw (ran/Pv.sox he/Ci))")
+        assert edge
+        assert "bad-mw-target" in _codes(check_parse_correctness(edge, []))
+
+    def test_demonstrative_determiner_on_concept_ok(self):
+        edge = hedge("(this/Me book/Cc)")
+        assert edge
+        assert "bad-me-target" not in _codes(check_parse_correctness(edge, []))
+
+    def test_manner_adverb_on_concept_flagged(self):
+        edge = hedge("(quickly/Mb sky/Cc)")
+        assert edge
+        assert "bad-mb-target" in _codes(check_parse_correctness(edge, []))
