Tier B: LoRA Fine-tuning for Llama 3.1

## Overview

Fine-tune a LoRA adapter on Llama 3.1-8B using past tailored resumes to improve quality and speed.

## Problem

After Tier A (40-50% improvement), further improvements require:
- Model-specific knowledge of resume tailoring patterns
- Consistent "Sidney voice" across all resumes
- Faster inference (2x speedup)
- Better handling of edge cases

## Solution

Train a LoRA adapter using Supervised Fine-Tuning (SFT):
1. Collect 100-200 SFT pairs from past tailored resumes
2. Train LoRA adapter on Llama 3.1-8B
3. Deploy via vLLM or Ollama
4. A/B test against base model

## Deliverables

### 1. Data Collection
- **Source**: Past tailored resumes + job descriptions
- **Format**: JSONL with instruction, input, output
- **Target**: 100-200 pairs across 5 task types:
  - Technical role tailoring
  - Leadership role tailoring
  - Career transition tailoring
  - Skill emphasis tailoring
  - Industry-specific tailoring
- **Quality**: Manual review and curation

### 2. Training Configuration
- **Model**: Llama 3.1-8B
- **Framework**: Unsloth or Axolotl
- **LoRA Config**:
  - Rank (r): 16
  - Alpha: 32
  - Target modules: q_proj, v_proj
- **Training Params**:
  - Epochs: 2
  - Learning rate: 2e-4
  - Batch size: 4 (per GPU)
  - Max seq length: 2048
  - Precision: bf16

### 3. Training Pipeline
- **File**: `scripts/train_lora_adapter.py`
- **Features**:
  - Data loading and preprocessing
  - Training loop with validation
  - Checkpoint saving
  - Loss tracking and logging
  - Hardware detection (GPU/CPU)

### 4. Serving Setup
- **Option A**: vLLM with LoRA
  - Endpoint: `http://localhost:8000/v1/chat/completions`
  - Loads base model + LoRA adapter
  - Latency: < 5s per request

- **Option B**: Ollama with LoRA
  - Endpoint: `http://localhost:11434/api/generate`
  - Custom model with LoRA weights
  - Latency: < 5s per request

### 5. A/B Testing
- **Test Set**: 20 diverse job descriptions
- **Metrics**:
  - Quality rating (manual eval)
  - Latency (inference time)
  - Consistency ("Sidney voice")
  - Hallucination rate
- **Success**: LoRA model > base model on all metrics

### 6. Documentation
- **File**: `docs/TIER_B_LORA_TRAINING.md`
- **Contents**:
  - Data collection process
  - Training procedure
  - Serving setup
  - A/B test results
  - Deployment guide

## Success Criteria

- ✅ 100-200 SFT pairs collected and curated
- ✅ LoRA adapter trained successfully
- ✅ Training loss converges
- ✅ Serving endpoint responds < 5s
- ✅ A/B test shows improvement on all metrics
- ✅ Quality rating: 4.5/5 or higher
- ✅ Latency: < 15s end-to-end
- ✅ Hallucination rate: < 0.5%
- ✅ Documentation complete

## Demonstrable Improvements

1. **Quality**: 20-30% additional improvement over Tier A
2. **Speed**: 2x faster inference (< 15s vs < 30s)
3. **Consistency**: Consistent "Sidney voice" across all resumes
4. **Reliability**: Lower hallucination rate (< 0.5%)
5. **Customization**: Model learns your specific tailoring style

## Implementation Guide

See `docs/TIER_B_LORA_TRAINING.md` for detailed instructions.

## Estimated Effort

- **Time**: 3-4 weeks
- **Difficulty**: High
- **Dependencies**: Tier A complete
- **Hardware**: Single GPU (24GB+) or cloud rental

## Files to Create

- `data/sft_pairs.jsonl` - Training data
- `scripts/train_lora_adapter.py` - Training script
- `scripts/serve_lora_adapter.py` - Serving script
- `docs/TIER_B_LORA_TRAINING.md` - Documentation

## Files to Modify

- `n8n/n8n/workflows/tailor.json` - Update LLM endpoint (optional)

## Related Issues

- Parent: #60 (Main enhancement issue)
- Previous: #63 (Phase 3 - End-to-End Testing)
- Optional Next: #64 (Tier C - Advanced Features)

## Acceptance Criteria

- [ ] SFT pairs collected and curated
- [ ] LoRA adapter trained
- [ ] Serving endpoint deployed
- [ ] A/B test completed
- [ ] All success criteria met
- [ ] Documentation complete
- [ ] Code reviewed and merged
- [ ] Ready for Tier C (optional)

## Notes

- This is optional and can be done after Tier A
- Requires GPU with 24GB+ VRAM or cloud rental
- Training time: 2-4 hours on single GPU
- Can be parallelized across multiple GPUs

## Labels

- enhancement
- rag
- n8n
- tier-b
- fine-tuning
- lora
- optional

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tier B: LoRA Fine-tuning for Llama 3.1 #64

Overview

Problem

Solution

Deliverables

1. Data Collection

2. Training Configuration

3. Training Pipeline

4. Serving Setup

5. A/B Testing

6. Documentation

Success Criteria

Demonstrable Improvements

Implementation Guide

Estimated Effort

Files to Create

Files to Modify

Related Issues

Acceptance Criteria

Notes

Labels

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Tier B: LoRA Fine-tuning for Llama 3.1 #64

Description

Overview

Problem

Solution

Deliverables

1. Data Collection

2. Training Configuration

3. Training Pipeline

4. Serving Setup

5. A/B Testing

6. Documentation

Success Criteria

Demonstrable Improvements

Implementation Guide

Estimated Effort

Files to Create

Files to Modify

Related Issues

Acceptance Criteria

Notes

Labels

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions