For A.aggr(B, ...):
- Every attribute in PK(A) must have a homologous namesake in B
- This ensures non-overlapping aggregation groups
- Result PK = PK(A) always
- Same constraint applies for left join aggregation

`docs/SPEC-semantic-matching.md` (42 additions, 0 deletions)
@@ -73,6 +73,7 @@ Semantic matching applies to all binary operations that match attributes between
|`A * B`| Join | Matches on homologous namesakes |
|`A & B`| Restriction | Matches on homologous namesakes |
|`A - B`| Anti-restriction | Matches on homologous namesakes |
|`A.aggr(B, ...)`| Aggregation | Matches on homologous namesakes |

Note: `A - B` is the negated form of restriction (equivalent to `A & ~B`), not a true set difference.

@@ -139,6 +140,36 @@ PK(A * B) =
- Both PK(A) ⊆ J and PK(B) ⊆ J
- Result PK = PK(A) = {a} (left operand wins)

### Aggregation Rules

For `A.aggr(B, ...)`, semantic matching applies with an additional constraint.

**Primary key**: PK(result) = PK(A), always the left operand's primary key.

**Constraint**: Every attribute in PK(A) must have a homologous namesake in B.

This ensures that each B tuple belongs to exactly one A entity, so aggregation groups are non-overlapping. If B is missing any part of A's primary key, a B tuple could match multiple A tuples and be counted in multiple aggregates.
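
A small pure-Python illustration of the double-counting problem (not DataJoint code). It mimics `A.aggr(B, count="count(*)")` by matching each B row against every A row on their shared attribute names, which stands in for homologous-namesake matching here; the helper `naive_aggr_count` and the sample rows are hypothetical.

```python
# Hypothetical sketch: rows are plain dicts; "matching" means equal values on
# every attribute name the two rows share (a stand-in for homologous namesakes).

def naive_aggr_count(a_rows, a_pk, b_rows):
    """For each A row (keyed by its PK values), count the B rows that match it."""
    counts = {}
    for a in a_rows:
        key = tuple(a[k] for k in a_pk)
        counts[key] = 0
        for b in b_rows:
            shared = set(a) & set(b)
            if all(a[k] == b[k] for k in shared):
                counts[key] += 1
    return counts

sessions = [
    {"subject_id": 1, "session_id": 1},
    {"subject_id": 1, "session_id": 2},
]
session_pk = ["subject_id", "session_id"]

# B lacks session_id: the single subject row matches BOTH sessions.
subjects = [{"subject_id": 1, "species": "mouse"}]
print(naive_aggr_count(sessions, session_pk, subjects))
# {(1, 1): 1, (1, 2): 1}  -> one B tuple counted in two aggregates

# B carries all of Session's PK: each spike lands in exactly one group.
spikes = [
    {"subject_id": 1, "session_id": 1, "spike_id": 1},
    {"subject_id": 1, "session_id": 1, "spike_id": 2},
    {"subject_id": 1, "session_id": 2, "spike_id": 1},
]
print(naive_aggr_count(sessions, session_pk, spikes))
# {(1, 1): 2, (1, 2): 1}  -> group sizes sum to len(spikes)
```

When the constraint holds, the per-group counts add up to the number of B tuples; when it is violated, they can exceed it.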

**Left join aggregation**: The same constraint applies when using `A.aggr(B, ..., left=True)`. The left join allows A tuples with no matching B tuples to appear in the result (with NULL aggregates), but the grouping constraint remains: B must contain A's complete primary key.
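
A minimal sketch of the grouping semantics with and without `left=True`, assuming rows are plain dicts and using a sum aggregate (SQL `SUM` over zero rows yields NULL, shown here as `None`). The helper `aggr_sum`, its `left` flag, and the data are hypothetical illustrations, not DataJoint's implementation.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]

def aggr_sum(a_rows: List[Row], a_pk: List[str], b_rows: List[Row],
             field: str, left: bool = False) -> List[Row]:
    """Hypothetical A.aggr(B, total="sum(field)") over in-memory rows."""
    # Aggregation constraint: every B row must carry all of PK(A).
    for b in b_rows:
        if not all(k in b for k in a_pk):
            raise ValueError("B must contain every attribute of PK(A)")
    # Route each B row to exactly one group, keyed by its PK(A) values.
    groups: Dict[tuple, list] = defaultdict(list)
    for b in b_rows:
        groups[tuple(b[k] for k in a_pk)].append(b[field])
    result = []
    for a in a_rows:
        key = tuple(a[k] for k in a_pk)
        if key in groups:
            result.append({**a, "total": sum(groups[key])})
        elif left:
            # Left join: keep the A row; its aggregate is NULL (None).
            result.append({**a, "total": None})
    return result

sessions = [{"subject_id": 1, "session_id": 1}, {"subject_id": 1, "session_id": 2}]
spikes = [
    {"subject_id": 1, "session_id": 1, "spike_id": 1, "amplitude": 3},
    {"subject_id": 1, "session_id": 1, "spike_id": 2, "amplitude": 4},
]
pk = ["subject_id", "session_id"]
print(aggr_sum(sessions, pk, spikes, "amplitude"))
# [{'subject_id': 1, 'session_id': 1, 'total': 7}]
print(aggr_sum(sessions, pk, spikes, "amplitude", left=True))
# [{'subject_id': 1, 'session_id': 1, 'total': 7}, {'subject_id': 1, 'session_id': 2, 'total': None}]
```

The constraint check runs before grouping in both modes, mirroring the rule that `left=True` changes which A tuples survive, not how B tuples are grouped.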

**Example**:
```python
# Valid: Session.aggr(Spike, count="count(*)")
# Session PK = {subject_id, session_id}
# Spike has {subject_id, session_id, spike_id, ...}
# Spike contains every attribute of Session's PK ✓

# Invalid: Session.aggr(Subject, ...)
# Session PK = {subject_id, session_id}
# Subject has {subject_id, ...}
# Subject is missing session_id from Session's PK ✗
```
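
The valid and invalid cases above can be checked mechanically. A minimal sketch, assuming each heading is modeled as a mapping from attribute name to a lineage string; the lineage values (e.g. `"subject.subject_id"`) and the function names `missing_for_aggr` and `check_aggr_constraint` are hypothetical, not DataJoint API. An attribute of PK(A) counts as covered only if B has an attribute with the same name *and* the same lineage.

```python
from typing import Dict, List

# Hypothetical model: attribute name -> lineage (where the attribute originates).
Heading = Dict[str, str]

def missing_for_aggr(pk_a: Heading, heading_b: Heading) -> List[str]:
    """PK(A) attributes with no homologous namesake (same name AND lineage) in B."""
    return [name for name, lineage in pk_a.items()
            if heading_b.get(name) != lineage]

def check_aggr_constraint(pk_a: Heading, heading_b: Heading) -> None:
    """Raise if B cannot route each of its tuples to exactly one A tuple."""
    missing = missing_for_aggr(pk_a, heading_b)
    if missing:
        raise ValueError(f"aggr: B lacks homologous namesakes for PK(A) attributes {missing}")

session_pk = {"subject_id": "subject.subject_id", "session_id": "session.session_id"}
spike = {"subject_id": "subject.subject_id", "session_id": "session.session_id",
         "spike_id": "spike.spike_id"}
subject = {"subject_id": "subject.subject_id", "species": "subject.species"}
# Same name as Session's session_id, but a different (hypothetical) origin.
unrelated = {"subject_id": "subject.subject_id", "session_id": "recording.session_id"}

print(missing_for_aggr(session_pk, spike))      # []
print(missing_for_aggr(session_pk, subject))    # ['session_id']
print(missing_for_aggr(session_pk, unrelated))  # ['session_id']
```

The last case shows why the check compares lineage and not just names: a namesake with a different origin does not identify a unique A tuple.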

## Current Implementation Analysis

### Attribute Representation (`heading.py:48`)
@@ -655,6 +686,16 @@ WHERE c.contype = 'f'

**Rationale**: Based on Armstrong's axioms. If PK(B) ⊆ J, then PK(A) → J → PK(B) by transitivity, so PK(A) alone determines all result attributes. The union rule is only needed when neither PK is fully covered by the join.
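
The join-PK rule that this rationale justifies can be sketched as a small helper. Assumptions: PKs are ordered lists of attribute names, `J` is the set of homologous namesakes the join matches on, the helper name `join_pk` is hypothetical, and the branch where only PK(A) ⊆ J is inferred by symmetry (PK(B) then determines everything) rather than quoted from this spec.

```python
from typing import List, Set

def join_pk(pk_a: List[str], pk_b: List[str], j: Set[str]) -> List[str]:
    """Hypothetical sketch of PK(A * B) given join attributes J."""
    if set(pk_b) <= j:
        # PK(A) -> J -> PK(B): PK(A) alone determines every result attribute.
        # Also covers PK(A) ⊆ J and PK(B) ⊆ J together (left operand wins).
        return list(pk_a)
    if set(pk_a) <= j:
        # Symmetric case (assumed): PK(B) -> J -> PK(A).
        return list(pk_b)
    # Neither PK is fully covered by the join: fall back to the union.
    return list(pk_a) + [k for k in pk_b if k not in pk_a]

print(join_pk(["a"], ["a"], {"a"}))  # ['a']
print(join_pk(["subject_id", "session_id"],
              ["subject_id", "session_id", "spike_id"],
              {"subject_id", "session_id"}))  # ['subject_id', 'session_id', 'spike_id']
print(join_pk(["a"], ["b"], set()))  # ['a', 'b']
```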

### D9: Aggregation Constraint

**Decision**: For `A.aggr(B, ...)`, require that every attribute in PK(A) has a homologous namesake in B.

**Primary key**: PK(result) = PK(A), always.

**Rationale**: This ensures non-overlapping aggregation groups. Each B tuple belongs to exactly one A entity, preventing double-counting.

**Left join**: The same constraint applies for `A.aggr(B, ..., left=True)`. A tuples with no matching B tuples appear with NULL aggregates, but the grouping constraint remains.

## Testing Strategy

1. **Unit tests** for lineage propagation through all query operations
@@ -700,6 +741,7 @@ Semantic matching is a significant change to DataJoint's join semantics that imp
|**D6**: `@` operator | Deprecated - use `.join(semantic_check=False)`|
|**D7**: Migration | Utility function + automatic fallback computation |