Fix #86: normalize promoted dependent attributes for age realism#93
Fix #86: normalize promoted dependent attributes for age realism#93RandomOscillations wants to merge 1 commit intocodex/issue-85-dependent-namesfrom
Conversation
DeveshParagiri
left a comment
There was a problem hiding this comment.
Code Review
Verdict: ✅ Ready to merge
Summary
Comprehensive fix that normalizes age-inappropriate attributes for promoted minors using spec-driven heuristics.
Normalization Applied
| Attribute | Logic |
|---|---|
| Education | Capped by age (≤10: elementary, ≤13: middle, ≤17: high school) |
| Employment | Restricted to student/part_time/intern/none; <16 prefers student/none |
| Occupation | Forced to student/none for minors |
| Income | personal_income=0, household_income preserved from parent |
Edge Cases
| Age | Max Education | Employment |
|---|---|---|
| 5-10 | elementary | student/none |
| 11-13 | middle school | student/none |
| 14-15 | high school | student/none |
| 16-17 | high school | student/part_time/intern |
| 18+ | Not normalized | Not normalized |
Minor Suggestion (non-blocking)
Consider adding a boundary-age test (e.g., 16-year-old can have part_time but 10-year-old cannot).
No blocking changes required.
DeveshParagiri
left a comment
There was a problem hiding this comment.
Code Review
Verdict: ❌ Needs changes - fragile keyword matching
Problem
The attribute type detection uses substring matching against hardcoded keywords:
_INCOME_KEYWORDS = ("income", "salary", "wage", "earnings", "compensation")
_EDUCATION_KEYWORDS = ("education", "degree")
_EMPLOYMENT_KEYWORDS = ("employment", "work_status", "labor_status")
if any(token in key for token in _INCOME_KEYWORDS):
# normalize income for minorsThis silently fails if the LLM names attributes differently:
| Attribute Name | Matches? | Result |
|---|---|---|
household_income |
✅ | Normalized correctly |
annual_earnings |
✅ | Normalized correctly |
earning |
❌ | 10-year-old keeps $150k income |
pay |
❌ | 10-year-old keeps adult pay |
annual_pay |
❌ | Silent failure |
diploma |
❌ | 10-year-old keeps "masters" |
schooling |
❌ | Silent failure |
work |
❌ | 10-year-old keeps "full_time" |
Suggested Fix
Add a semantic_type field to AttributeSpec that explicitly declares the attribute's category:
# population.v1.yaml
attributes:
- name: annual_pay
type: numeric
semantic_type: income # explicit declaration
- name: schooling_level
type: categorical
semantic_type: education
options: ["none", "elementary", "high_school", "bachelors", "masters"]Then in the sampler:
for attr in spec.attributes:
if attr.semantic_type == "income":
agent[attr.name] = _coerce_minor_income(agent[attr.name], age)
elif attr.semantic_type == "education":
agent[attr.name] = _coerce_minor_education(agent[attr.name], age, attr.options)
# etc.Fallback: Keep the keyword heuristics as a fallback for specs without semantic_type, but log a warning so users know to add explicit types.
This approach:
- Explicit > heuristic
- No silent failures
- Spec authors control the behavior
- Backward compatible with warning
DeveshParagiri
left a comment
There was a problem hiding this comment.
Additional Issue: _education_stage() also uses fragile keyword matching
The _education_stage() function at line 434-450 has the same problem for education values:
stage_map = [
(0, ("none", "home", "preschool", "pre_k", "kindergarten")),
(1, ("elementary", "primary", "grade_school")),
(2, ("middle_school", "middle", "junior_high")),
(3, ("high_school", "secondary", "ged")),
(5, ("bachelor", "college", "undergraduate", "university")),
# ...
]
if any(marker in v for marker in markers):
return stageFailure cases:
| Education Value | Expected Stage | Actual | Why |
|---|---|---|---|
high_school |
3 | 3 | ✅ |
12th_grade |
3 | None | ❌ No keywords |
completed_12 |
3 | None | ❌ No keywords |
bachelors |
5 | 5 | ✅ "bachelor" matches |
4_year_degree |
5 | None | ❌ No keywords |
BA |
5 | None | ❌ No keywords |
BS |
5 | None | ❌ No keywords |
When _education_stage() returns None, the coercion logic may not cap education properly.
Suggested fix: Use the spec's options list order as the canonical stage mapping, or add education_stage: int to each option in the spec.
Summary
Validation
Closes #86