Skip to content

Fix #86: normalize promoted dependent attributes for age realism#93

Closed
RandomOscillations wants to merge 1 commit intocodex/issue-85-dependent-namesfrom
codex/issue-86-dependent-attributes
Closed

Fix #86: normalize promoted dependent attributes for age realism#93
RandomOscillations wants to merge 1 commit intocodex/issue-85-dependent-namesfrom
codex/issue-86-dependent-attributes

Conversation

@RandomOscillations
Copy link
Copy Markdown
Collaborator

Summary

  • seed dependent promotion sampling with fixed age/gender so downstream sampling evaluates with child context
  • add age-aware normalization for promoted minors (education/employment/occupation/income) using spec-driven attribute/options heuristics
  • keep household-scoped economics inherited from household context
  • add focused regression tests for promoted minor realism

Validation

  • pytest -q tests/test_agent_focus.py::TestPromotedDependentQuality

Closes #86

Copy link
Copy Markdown
Collaborator

@DeveshParagiri DeveshParagiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Verdict: ✅ Ready to merge

Summary

Comprehensive fix that normalizes age-inappropriate attributes for promoted minors using spec-driven heuristics.

Normalization Applied

Attribute Logic
Education Capped by age (≤10: elementary, ≤13: middle, ≤17: high school)
Employment Restricted to student/part_time/intern/none; <16 prefers student/none
Occupation Forced to student/none for minors
Income personal_income=0, household_income preserved from parent

Edge Cases

Age Max Education Employment
5-10 elementary student/none
11-13 middle school student/none
14-15 high school student/none
16-17 high school student/part_time/intern
18+ Not normalized Not normalized

Minor Suggestion (non-blocking)

Consider adding a boundary-age test (e.g., 16-year-old can have part_time but 10-year-old cannot).

No blocking changes required.

Copy link
Copy Markdown
Collaborator

@DeveshParagiri DeveshParagiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Verdict: ❌ Needs changes - fragile keyword matching

Problem

The attribute type detection uses substring matching against hardcoded keywords:

_INCOME_KEYWORDS = ("income", "salary", "wage", "earnings", "compensation")
_EDUCATION_KEYWORDS = ("education", "degree")
_EMPLOYMENT_KEYWORDS = ("employment", "work_status", "labor_status")

if any(token in key for token in _INCOME_KEYWORDS):
    # normalize income for minors

This silently fails if the LLM names attributes differently:

Attribute Name Matches? Result
household_income Normalized correctly
annual_earnings Normalized correctly
earning 10-year-old keeps $150k income
pay 10-year-old keeps adult pay
annual_pay Silent failure
diploma 10-year-old keeps "masters"
schooling Silent failure
work 10-year-old keeps "full_time"

Suggested Fix

Add a semantic_type field to AttributeSpec that explicitly declares the attribute's category:

# population.v1.yaml
attributes:
  - name: annual_pay
    type: numeric
    semantic_type: income  # explicit declaration
    
  - name: schooling_level
    type: categorical
    semantic_type: education
    options: ["none", "elementary", "high_school", "bachelors", "masters"]

Then in the sampler:

for attr in spec.attributes:
    if attr.semantic_type == "income":
        agent[attr.name] = _coerce_minor_income(agent[attr.name], age)
    elif attr.semantic_type == "education":
        agent[attr.name] = _coerce_minor_education(agent[attr.name], age, attr.options)
    # etc.

Fallback: Keep the keyword heuristics as a fallback for specs without semantic_type, but log a warning so users know to add explicit types.

This approach:

  • Explicit > heuristic
  • No silent failures
  • Spec authors control the behavior
  • Backward compatible with warning

Copy link
Copy Markdown
Collaborator

@DeveshParagiri DeveshParagiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Issue: _education_stage() also uses fragile keyword matching

The _education_stage() function at line 434-450 has the same problem for education values:

stage_map = [
    (0, ("none", "home", "preschool", "pre_k", "kindergarten")),
    (1, ("elementary", "primary", "grade_school")),
    (2, ("middle_school", "middle", "junior_high")),
    (3, ("high_school", "secondary", "ged")),
    (5, ("bachelor", "college", "undergraduate", "university")),
    # ...
]
if any(marker in v for marker in markers):
    return stage

Failure cases:

Education Value Expected Stage Actual Why
high_school 3 3
12th_grade 3 None ❌ No keywords
completed_12 3 None ❌ No keywords
bachelors 5 5 ✅ "bachelor" matches
4_year_degree 5 None ❌ No keywords
BA 5 None ❌ No keywords
BS 5 None ❌ No keywords

When _education_stage() returns None, the coercion logic may not cap education properly.

Suggested fix: Use the spec's options list order as the canonical stage mapping, or add education_stage: int to each option in the spec.

@DeveshParagiri DeveshParagiri deleted the branch codex/issue-85-dependent-names February 17, 2026 18:56
@DeveshParagiri DeveshParagiri deleted the codex/issue-86-dependent-attributes branch February 23, 2026 01:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants