Skip to content

Tier B: LoRA Fine-tuning for Llama 3.1 #64

@BPMSoftwareSolutions

Description

@BPMSoftwareSolutions

Overview

Fine-tune a LoRA adapter on Llama 3.1-8B using past tailored resumes to improve quality and speed.

Problem

After Tier A (40-50% improvement), further improvements require:

  • Model-specific knowledge of resume tailoring patterns
  • Consistent "Sidney voice" across all resumes
  • Faster inference (2x speedup)
  • Better handling of edge cases

Solution

Train a LoRA adapter using Supervised Fine-Tuning (SFT):

  1. Collect 100-200 SFT pairs from past tailored resumes
  2. Train LoRA adapter on Llama 3.1-8B
  3. Deploy via vLLM or Ollama
  4. A/B test against base model

Deliverables

1. Data Collection

  • Source: Past tailored resumes + job descriptions
  • Format: JSONL with instruction, input, output
  • Target: 100-200 pairs across 5 task types:
    • Technical role tailoring
    • Leadership role tailoring
    • Career transition tailoring
    • Skill emphasis tailoring
    • Industry-specific tailoring
  • Quality: Manual review and curation

2. Training Configuration

  • Model: Llama 3.1-8B
  • Framework: Unsloth or Axolotl
  • LoRA Config:
    • Rank (r): 16
    • Alpha: 32
    • Target modules: q_proj, v_proj
  • Training Params:
    • Epochs: 2
    • Learning rate: 2e-4
    • Batch size: 4 (per GPU)
    • Max seq length: 2048
    • Precision: bf16

3. Training Pipeline

  • File: scripts/train_lora_adapter.py
  • Features:
    • Data loading and preprocessing
    • Training loop with validation
    • Checkpoint saving
    • Loss tracking and logging
    • Hardware detection (GPU/CPU)

4. Serving Setup

  • Option A: vLLM with LoRA

    • Endpoint: http://localhost:8000/v1/chat/completions
    • Loads base model + LoRA adapter
    • Latency: < 5s per request
  • Option B: Ollama with LoRA

    • Endpoint: http://localhost:11434/api/generate
    • Custom model with LoRA weights
    • Latency: < 5s per request

5. A/B Testing

  • Test Set: 20 diverse job descriptions
  • Metrics:
    • Quality rating (manual eval)
    • Latency (inference time)
    • Consistency ("Sidney voice")
    • Hallucination rate
  • Success: LoRA model > base model on all metrics

6. Documentation

  • File: docs/TIER_B_LORA_TRAINING.md
  • Contents:
    • Data collection process
    • Training procedure
    • Serving setup
    • A/B test results
    • Deployment guide

Success Criteria

  • ✅ 100-200 SFT pairs collected and curated
  • ✅ LoRA adapter trained successfully
  • ✅ Training loss converges
  • ✅ Serving endpoint responds < 5s
  • ✅ A/B test shows improvement on all metrics
  • ✅ Quality rating: 4.5/5 or higher
  • ✅ Latency: < 15s end-to-end
  • ✅ Hallucination rate: < 0.5%
  • ✅ Documentation complete

Demonstrable Improvements

  1. Quality: 20-30% additional improvement over Tier A
  2. Speed: 2x faster inference (< 15s vs < 30s)
  3. Consistency: Consistent "Sidney voice" across all resumes
  4. Reliability: Lower hallucination rate (< 0.5%)
  5. Customization: Model learns your specific tailoring style

Implementation Guide

See docs/TIER_B_LORA_TRAINING.md for detailed instructions.

Estimated Effort

  • Time: 3-4 weeks
  • Difficulty: High
  • Dependencies: Tier A complete
  • Hardware: Single GPU (24GB+) or cloud rental

Files to Create

  • data/sft_pairs.jsonl - Training data
  • scripts/train_lora_adapter.py - Training script
  • scripts/serve_lora_adapter.py - Serving script
  • docs/TIER_B_LORA_TRAINING.md - Documentation

Files to Modify

  • n8n/n8n/workflows/tailor.json - Update LLM endpoint (optional)

Related Issues

Acceptance Criteria

  • SFT pairs collected and curated
  • LoRA adapter trained
  • Serving endpoint deployed
  • A/B test completed
  • All success criteria met
  • Documentation complete
  • Code reviewed and merged
  • Ready for Tier C (optional)

Notes

  • This is optional and can be done after Tier A
  • Requires GPU with 24GB+ VRAM or cloud rental
  • Training time: 2-4 hours on single GPU
  • Can be parallelized across multiple GPUs

Labels

  • enhancement
  • rag
  • n8n
  • tier-b
  • fine-tuning
  • lora
  • optional

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions