Commit 5432141

Add D9: Aggregation constraint and rules
For A.aggr(B, ...):

- Every attribute in PK(A) must have a homologous namesake in B
- This ensures non-overlapping aggregation groups
- Result PK = PK(A) always
- Same constraint applies for left join aggregation
1 parent 3235bcb commit 5432141

1 file changed

Lines changed: 42 additions & 0 deletions

docs/SPEC-semantic-matching.md

@@ -73,6 +73,7 @@ Semantic matching applies to all binary operations that match attributes between
| `A * B` | Join | Matches on homologous namesakes |
| `A & B` | Restriction | Matches on homologous namesakes |
| `A - B` | Anti-restriction | Matches on homologous namesakes |
| `A.aggr(B, ...)` | Aggregation | Matches on homologous namesakes |

Note: `A - B` is the negated form of restriction (equivalent to `A & ~B`), not a true set difference.
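
A minimal sketch of this distinction, modeling relations as lists of dicts and passing the matching attributes explicitly as `on` (under semantic matching they would be derived from homologous namesakes). `restrict` and `anti_restrict` are hypothetical helpers, not DataJoint API.

```python
def _match_key(row, on):
    # Project a row onto the attributes used for matching
    return tuple(row[attr] for attr in on)


def restrict(a, b, on):
    # A & B: keep the A tuples that match at least one B tuple
    keys = {_match_key(row, on) for row in b}
    return [row for row in a if _match_key(row, on) in keys]


def anti_restrict(a, b, on):
    # A - B: keep the A tuples that match no B tuple (i.e., A & ~B)
    keys = {_match_key(row, on) for row in b}
    return [row for row in a if _match_key(row, on) not in keys]


session = [
    {"subject_id": 1, "session_id": 1},
    {"subject_id": 1, "session_id": 2},
    {"subject_id": 2, "session_id": 1},
]
# B's heading differs from A's; only subject_id is shared
excluded = [{"subject_id": 1, "note": "excluded"}]

kept = anti_restrict(session, excluded, on=["subject_id"])
# kept == [{"subject_id": 2, "session_id": 1}]; the result keeps A's heading
```

Every A tuple lands in exactly one of `A & B` and `A - B`, and both results keep A's heading. A true set difference would instead require both operands to share the same heading.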

@@ -139,6 +140,36 @@ PK(A * B) =
- Both PK(A) ⊆ J and PK(B) ⊆ J
- Result PK = PK(A) = {a} (left operand wins)

### Aggregation Rules

For `A.aggr(B, ...)`, semantic matching applies with an additional constraint.

**Primary key**: PK(result) = PK(A), always the left operand's primary key.

**Constraint**: Every attribute in PK(A) must have a homologous namesake in B.

This ensures that each B tuple matches at most one A tuple (because A's primary key values are unique), so aggregation groups are non-overlapping. If B is missing any part of A's primary key, a B tuple could match multiple A tuples and be counted in multiple aggregates.
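
The double-counting risk shows up in a toy model (plain Python, not DataJoint): each A tuple counts the B tuples that agree with it on the PK(A) attributes that B also carries. `aggr_count` is a hypothetical helper, and it matches by name only, ignoring lineage.

```python
def aggr_count(a, pk_a, b):
    # Toy A.aggr(B, n="count(*)"): match on the PK(A) attributes that B also has.
    # Matching is by name only; lineage is ignored in this sketch.
    shared = [attr for attr in pk_a if b and attr in b[0]]
    result = {}
    for a_row in a:
        key = tuple(a_row[attr] for attr in pk_a)  # result is keyed by PK(A)
        result[key] = sum(
            all(b_row[attr] == a_row[attr] for attr in shared) for b_row in b
        )
    return result


subject = [{"subject_id": 1}, {"subject_id": 2}]
session = [
    {"subject_id": 1, "session_id": 1},
    {"subject_id": 1, "session_id": 2},
    {"subject_id": 2, "session_id": 1},
]

# Valid: Session carries all of Subject's PK, so each Session tuple is counted once
valid = aggr_count(subject, ["subject_id"], session)
# valid == {(1,): 2, (2,): 1}; counts sum to len(session) == 3

# Invalid: Subject lacks session_id, so subject 1 is counted under two sessions
invalid = aggr_count(session, ["subject_id", "session_id"], subject)
# invalid == {(1, 1): 1, (1, 2): 1, (2, 1): 1}; counts sum to 3 > len(subject) == 2
```

When the constraint holds, the per-group counts add up to the size of B; when it is violated, the total exceeds it because some B tuples fall into several groups.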

**Left join aggregation**: The same constraint applies when using `A.aggr(B, ..., left=True)`. The left join allows A tuples with no matching B tuples to appear in the result (with NULL aggregates), but the grouping constraint remains: B must contain A's complete primary key.
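
A sketch of the left-join case in the same toy model. `aggr_max` and the `amplitude` attribute are hypothetical, and `None` stands in for SQL NULL. The grouping logic is identical with and without `left`; only the handling of unmatched A tuples differs, which is why the constraint carries over unchanged.

```python
def aggr_max(a, pk_a, b, attr, left=False):
    # Toy A.aggr(B, m="max(attr)"), assuming B carries every PK(A) attribute
    result = {}
    for a_row in a:
        key = tuple(a_row[k] for k in pk_a)  # result is keyed by PK(A)
        values = [
            b_row[attr]
            for b_row in b
            if all(b_row[k] == a_row[k] for k in pk_a)
        ]
        if values:
            result[key] = max(values)
        elif left:
            result[key] = None  # no matching B tuples: NULL aggregate
        # with left=False, an A tuple without matches is dropped
    return result


pk = ["subject_id", "session_id"]
session = [{"subject_id": 1, "session_id": 1}, {"subject_id": 1, "session_id": 2}]
spike = [
    {"subject_id": 1, "session_id": 1, "spike_id": 1, "amplitude": 0.4},
    {"subject_id": 1, "session_id": 1, "spike_id": 2, "amplitude": 0.9},
]

print(aggr_max(session, pk, spike, "amplitude", left=True))
# {(1, 1): 0.9, (1, 2): None}
print(aggr_max(session, pk, spike, "amplitude"))
# {(1, 1): 0.9}
```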

**Example**:
```python
# Valid: Session.aggr(Spike, count="count(*)")
# Session PK = {subject_id, session_id}
# Spike has {subject_id, session_id, spike_id, ...}
# Spike contains Session's entire PK ✓

# Valid: Subject.aggr(Session, count="count(*)")
# Subject PK = {subject_id}
# Session has {subject_id, session_id, ...}
# Session contains Subject's entire PK ✓

# Invalid: Session.aggr(Subject, ...)
# Session PK = {subject_id, session_id}
# Subject has {subject_id, ...}
# Subject is missing session_id from Session's PK ✗
```
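
The three examples can be checked mechanically. A minimal sketch, assuming each heading is modeled as a mapping from attribute name to its lineage (the table attribute it originates from), so that a homologous namesake means same name and same lineage. `missing_homologs` and the lineage strings are hypothetical, not DataJoint API.

```python
def missing_homologs(pk_a, heading_b):
    """Return the PK(A) attributes that lack a homologous namesake in B.

    pk_a and heading_b map attribute name -> lineage (originating table.attribute).
    An empty result means A.aggr(B, ...) satisfies the constraint.
    """
    return [
        name for name, lineage in pk_a.items()
        if heading_b.get(name) != lineage  # absent, or a namesake with other lineage
    ]


subject_pk = {"subject_id": "subject.subject_id"}
session_pk = {"subject_id": "subject.subject_id", "session_id": "session.session_id"}
subject_heading = dict(subject_pk, sex="subject.sex")
session_heading = dict(session_pk, session_date="session.session_date")
spike_heading = dict(session_pk, spike_id="spike.spike_id")

print(missing_homologs(session_pk, spike_heading))    # Session.aggr(Spike): []
print(missing_homologs(subject_pk, session_heading))  # Subject.aggr(Session): []
print(missing_homologs(session_pk, subject_heading))  # Session.aggr(Subject): ['session_id']
```

Comparing lineage as well as name means that an attribute which merely shares a name with part of PK(A), but originates elsewhere, does not satisfy the constraint.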

## Current Implementation Analysis

### Attribute Representation (`heading.py:48`)
@@ -655,6 +686,16 @@ WHERE c.contype = 'f'

**Rationale**: Based on Armstrong's axioms. If PK(B) ⊆ J, then PK(A) → J → PK(B) by transitivity, so PK(A) alone determines all result attributes. The union rule is only needed when neither PK is fully covered by the join.
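
A sketch of the case analysis this rationale describes, with primary keys as ordered lists of attribute names and J as the set of join attributes. `join_pk` is hypothetical. The middle branch (PK(A) ⊆ J but not PK(B)) is an assumption, the mirror image of the stated argument; the rationale itself spells out only the other two cases. Returning the union with the left operand's attributes first is also an assumption.

```python
def join_pk(pk_a, pk_b, join_attrs):
    """Sketch of PK(A * B) given ordered PK lists and the join attributes J."""
    j = set(join_attrs)
    if set(pk_b) <= j:
        # PK(A) -> J -> PK(B): PK(A) determines everything (also the tie case, left wins)
        return list(pk_a)
    if set(pk_a) <= j:
        # Assumed mirror case: PK(B) -> J -> PK(A)
        return list(pk_b)
    # Neither PK is covered by J: fall back to the union (left attributes first)
    return list(pk_a) + [attr for attr in pk_b if attr not in pk_a]


# Tie: both PKs are covered by J, so the left operand wins
print(join_pk(["a"], ["b"], {"a", "b"}))  # ['a']
# Session * Subject: PK(Subject) is covered by J, so PK(Session) suffices
print(join_pk(["subject_id", "session_id"], ["subject_id"], {"subject_id"}))
# Neither PK is covered by J: union
print(join_pk(["a"], ["b"], {"c"}))  # ['a', 'b']
```

Because the first branch is checked before the second, swapping the operands can change the result, which matches the non-commutativity recorded for D8.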

### D9: Aggregation Constraint

**Decision**: For `A.aggr(B, ...)`, require that every attribute in PK(A) has a homologous namesake in B.

**Primary key**: PK(result) = PK(A), always.

**Rationale**: This ensures non-overlapping aggregation groups. Each B tuple matches at most one A tuple, which prevents double-counting.

**Left join**: The same constraint applies for `A.aggr(B, ..., left=True)`. A tuples with no matching B tuples appear with NULL aggregates, but the grouping constraint remains.

## Testing Strategy

1. **Unit tests** for lineage propagation through all query operations
@@ -700,6 +741,7 @@ Semantic matching is a significant change to DataJoint's join semantics that imp
| **D6**: `@` operator | Deprecated - use `.join(semantic_check=False)` |
| **D7**: Migration | Utility function + automatic fallback computation |
| **D8**: PK formation | Functional dependency analysis; left operand wins ties; non-commutative |
| **D9**: Aggregation | B must contain A's entire PK; result PK = PK(A); applies to left join too |

### Compatibility