Tier A Phase 2: Schema Enforcement + Guardrails #62

@BPMSoftwareSolutions

Overview

Add JSON schema validation and guardrails to prevent hallucination and ensure consistent, high-quality resume output.

Problem

Current LLM output has issues:

  • ~5% hallucination rate (fabricated employers/dates)
  • ~70% schema compliance (missing fields, inconsistent format)
  • No validation of output structure
  • No guardrails to prevent fabrication

Solution

Implement 3-layer validation:

  1. Schema Validation: Enforce JSON structure and field types
  2. Guardrails Prompt: Add system prompt rules to prevent hallucination
  3. Post-Processing: Validate output, fill defaults, trace bullets to sources

Deliverables

1. JSON Schema Definition

  • File: n8n/schemas/resume_output.json
  • Fields:
    • professional_summary (string, required)
    • top_skills (array of strings, required)
    • tailored_bullets (array of objects with text + source_id, required)
    • ats_keywords (array of strings, required)
    • notes (string, optional)
  • Validation: Type checking, required fields, format validation
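As a starting point, the field list above could be written down roughly as follows. This is only a sketch: the field names and their required/optional status come from the list, while the draft-07 dialect, the `minLength` constraints, and `source_id` being a string are assumptions. The dict is kept in Python here so it can be dumped straight into n8n/schemas/resume_output.json.

```python
import json

# Sketch of the resume output schema in JSON Schema vocabulary.
# Assumptions (not stated in the issue): draft-07 dialect, non-empty
# strings via minLength, and source_id typed as a string.
RESUME_OUTPUT_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": [
        "professional_summary",
        "top_skills",
        "tailored_bullets",
        "ats_keywords",
    ],
    "properties": {
        "professional_summary": {"type": "string", "minLength": 1},
        "top_skills": {"type": "array", "items": {"type": "string"}},
        "tailored_bullets": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["text", "source_id"],
                "properties": {
                    "text": {"type": "string", "minLength": 1},
                    "source_id": {"type": "string"},
                },
            },
        },
        "ats_keywords": {"type": "array", "items": {"type": "string"}},
        "notes": {"type": "string"},  # optional, so not listed in "required"
    },
}


def schema_as_json() -> str:
    """Serialize the schema so it can be written to the schema file."""
    return json.dumps(RESUME_OUTPUT_SCHEMA, indent=2)
```

Keeping notes out of "required" mirrors the optional marking in the field list.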

2. Validator Script

  • File: n8n/scripts/validate_resume_output.py
  • Functions:
    • validate_schema() - Check JSON structure
    • detect_hallucination() - Check that every employer in the output exists in experiences.json
    • fill_defaults() - Add missing fields with safe defaults
    • trace_bullets() - Verify bullets link to source IDs
  • Output: Validated JSON + validation report
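The four functions listed above might look roughly like this, using only the standard library. It is a sketch under assumptions, not the final script: the shape of experiences.json (a list of objects with hypothetical "id" and "employer" keys), the choice of empty values as "safe defaults", and passing employer mentions in as a pre-extracted list are all illustrative. How employer names get extracted from free text is left open.

```python
# Required top-level fields and their expected Python types.
REQUIRED_TYPES = {
    "professional_summary": str,
    "top_skills": list,
    "tailored_bullets": list,
    "ats_keywords": list,
}


def validate_schema(output: dict) -> list:
    """Return a list of human-readable schema errors (empty if valid)."""
    errors = []
    for field, expected in REQUIRED_TYPES.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    bullets = output.get("tailored_bullets")
    if isinstance(bullets, list):
        for i, bullet in enumerate(bullets):
            ok = (
                isinstance(bullet, dict)
                and isinstance(bullet.get("text"), str)
                and "source_id" in bullet
            )
            if not ok:
                errors.append(f"tailored_bullets[{i}] needs text and source_id")
    if "notes" in output and not isinstance(output["notes"], str):
        errors.append("wrong type for notes: expected str")
    return errors


def fill_defaults(output: dict) -> dict:
    """Return a copy with missing required fields set to empty values.

    Assumption: "safe default" means an empty string or empty list.
    """
    filled = dict(output)
    for field, expected in REQUIRED_TYPES.items():
        if field not in filled:
            filled[field] = expected()  # str() -> "", list() -> []
    return filled


def detect_hallucination(mentioned_employers: list, experiences: list) -> list:
    """Return mentioned employers that do not appear in the experiences.

    Assumption: each experience is a dict with an "employer" key.
    Comparison is case-insensitive and ignores surrounding whitespace.
    """
    known = {exp["employer"].strip().lower() for exp in experiences}
    return [name for name in mentioned_employers if name.strip().lower() not in known]


def trace_bullets(output: dict, experiences: list) -> list:
    """Return the text of bullets whose source_id matches no experience.

    Assumption: each experience is a dict with an "id" key.
    """
    known_ids = {exp["id"] for exp in experiences}
    return [
        bullet.get("text", "")
        for bullet in output.get("tailored_bullets", [])
        if isinstance(bullet, dict) and bullet.get("source_id") not in known_ids
    ]
```

Each function returns a list of problems rather than raising, which makes it straightforward to collect everything into a single validation report alongside the validated JSON.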

3. Guardrails Prompt

  • File: .agent/prompts/system.tailor.guardrails.md
  • Rules:
    • No fabrication of employers/dates
    • Every bullet must have source_id
    • Action-first, past-tense language
    • No generic/template language
    • Specific metrics and achievements
  • Integration: Update n8n tailor workflow system prompt

4. Testing

  • Test on 20 sample LLM outputs:
    • Valid output → Should pass validation
    • Missing fields → Should fill with defaults
    • Hallucinated employer → Should be caught
    • Invalid JSON → Should be handled gracefully
    • Low-quality output → Should be flagged
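Two of the cases above, invalid JSON and low-quality output, can be sketched as follows. The function names, the generic-phrase list, and the "no digit means no metric" rule are hypothetical heuristics chosen for illustration; the issue does not define how low quality is detected.

```python
import json
import re

# Assumed examples of generic/template language; tune to real outputs.
GENERIC_PHRASES = ("responsible for", "team player", "hard-working", "results-driven")


def parse_llm_output(raw: str):
    """Parse raw LLM text as JSON; return (data, error) without raising."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg} at position {exc.pos}"
    if not isinstance(data, dict):
        return None, "invalid JSON: top-level value must be an object"
    return data, None


def flag_low_quality(bullet_text: str) -> list:
    """Return quality flags for one bullet (empty list if none apply)."""
    flags = []
    lowered = bullet_text.lower()
    for phrase in GENERIC_PHRASES:
        if phrase in lowered:
            flags.append(f"generic phrase: {phrase}")
    # Crude proxy for "specific metrics": require at least one digit.
    if not re.search(r"\d", bullet_text):
        flags.append("no metric")
    return flags
```

Returning an error string instead of raising lets the n8n workflow route malformed outputs to a retry or review branch rather than failing the run.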

Success Criteria

  • ✅ JSON schema created and validated
  • ✅ Validator script created and tested
  • ✅ Guardrails prompt created and integrated
  • ✅ 100% schema compliance on test outputs
  • ✅ Zero hallucinated employers in 20 test runs
  • ✅ All missing fields filled with safe defaults
  • ✅ All bullets traceable to source IDs
  • ✅ Validation latency < 100ms per output
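The latency criterion can be checked with a small timing helper. This is a minimal sketch: `measure_latency_ms` is a hypothetical name, and the `validate` argument stands for whatever callable ends up wrapping the validator.

```python
import time


def measure_latency_ms(validate, output, runs=20):
    """Return the mean wall-clock time of validate(output) in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        validate(output)
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / runs
```

Averaging over several runs smooths out one-off scheduling noise; comparing the result against the 100 ms budget is then a single assertion in the test suite.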

Demonstrable Improvements

  1. Consistency: All outputs follow same structure
  2. Reliability: No hallucinated employers/dates
  3. Traceability: Every bullet linked to source
  4. Completeness: All required fields present
  5. Quality: Guardrails enforce high-quality language

Implementation Guide

See n8n/docs/TIER_A_PHASE_2_SCHEMA.md for detailed instructions.

Estimated Effort

  • Time: 2-3 hours
  • Difficulty: Medium
  • Dependencies: Phase 1 (FAISS integration)

Files to Create

  • n8n/schemas/resume_output.json (~50 lines)
  • n8n/scripts/validate_resume_output.py (~150 lines)
  • .agent/prompts/system.tailor.guardrails.md (~50 lines)

Files to Modify

  • n8n/n8n/workflows/tailor.json (add validation node + update prompt)

Related Issues

Acceptance Criteria

  • JSON schema created and validated
  • Validator script created and tested
  • Guardrails prompt created
  • n8n workflow updated with validation
  • All 20 test outputs produce the expected validation result (pass, defaults filled, or flagged)
  • Metrics documented in test_results_phase_a2.md
  • Code reviewed and merged

Labels

  • enhancement
  • rag
  • n8n
  • phase-2
  • tier-a
  • quality
