Commit 7cad615

fix tldr lines
1 parent 9727e9e commit 7cad615

6 files changed

Lines changed: 6 additions & 12 deletions

docs/immune/finetuning_evaluation.md

Lines changed: 1 addition & 2 deletions
@@ -1,8 +1,7 @@
 ### Fine-Tuning Evaluation Methodology
 
-**Keywords:** LLM-as-Judge, SFT Evaluation, Win Rate, Comparative Ranking
 
-**TL;DR:** Fine-tuned models are evaluated using the same LLM-as-judge framework used for baseline comparison, extended to include the finetuned model as a fifth competitor. The methodology — blind comparative ranking, three metrics, two breakdown dimensions — is identical across tasks.
+**Summary:** Fine-tuned models are evaluated using the same LLM-as-judge framework used for baseline comparison, extended to include the finetuned model as a fifth competitor. The methodology — blind comparative ranking, three metrics, two breakdown dimensions — is identical across tasks.
 
 ---
 

docs/immune/finetuning_frameworks_rpi_5.md

Lines changed: 1 addition & 2 deletions
@@ -1,8 +1,7 @@
 ### Fine-Tuning Frameworks for GGUF Deployment on Raspberry Pi 5
 
-**Keywords:** Local Inference, GGUF Format, Raspberry Pi 5
 
-**TL;DR:** Among the evaluated frameworks, Unsloth stands out as the best fit due to its integrated GGUF export capabilities, minimal workflow complexity, and hardware-optimized quantization support, aligning perfectly with the IMMUNE project's goals and the Raspberry Pi 5’s limitations.
+**Summary:** Among the evaluated frameworks, Unsloth stands out as the best fit due to its integrated GGUF export capabilities, minimal workflow complexity, and hardware-optimized quantization support, aligning perfectly with the IMMUNE project's goals and the Raspberry Pi 5’s limitations.
 
 
 ### Index

docs/immune/finetuning_procedure.md

Lines changed: 1 addition & 2 deletions
@@ -1,8 +1,7 @@
 ### Fine-Tuning Approach for Slips Immune
 
-**Keywords:** SFT, LoRA, Unsloth, GGUF, Raspberry Pi 5, Qwen2.5
 
-**TL;DR:** Task-specific fine-tuning of compact models (1.5B parameters) using LoRA + Unsloth, exported to GGUF for CPU inference on the Raspberry Pi 5. The same training pipeline applies across tasks; only the dataset and system prompt are task-specific.
+**Summary:** Task-specific fine-tuning of compact models (1.5B parameters) using LoRA + Unsloth, exported to GGUF for CPU inference on the Raspberry Pi 5. The same training pipeline applies across tasks; only the dataset and system prompt are task-specific.
 
 ---
 

docs/immune/finetuning_quantization.md

Lines changed: 1 addition & 2 deletions
@@ -1,8 +1,7 @@
 ### Quantization and Deployment for Finetuned Models
 
-**Keywords:** GGUF, Quantization, Ollama, imatrix, Raspberry Pi 5, Deployment
 
-**TL;DR:** Finetuned models are converted to GGUF and published to Ollama in three quantization variants (q4_k_m, q5_k_m, q8_0). Quality degrades gracefully: ~19% loss at q8_0, ~25% at q5_k_m, ~33% at q4_k_m. q5_k_m offers the best quality/size trade-off for CPU/RPi deployment; 16-bit is recommended when a GPU is available.
+**Summary:** Finetuned models are converted to GGUF and published to Ollama in three quantization variants (q4_k_m, q5_k_m, q8_0). Quality degrades gracefully: ~19% loss at q8_0, ~25% at q5_k_m, ~33% at q4_k_m. q5_k_m offers the best quality/size trade-off for CPU/RPi deployment; 16-bit is recommended when a GPU is available.
 
 > **Evaluation basis:** performance numbers in this document were measured on the [finetuned summarization model](finetuning_results.md) (47 held-out incidents, judge: gpt-oss-120b). The conversion and publication methodology applies to any finetuned model in this pipeline.
 
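
The recommendation in the summary of `finetuning_quantization.md` can be read as a small lookup. The sketch below is only an illustration: the function names `recommend_variant` and `retained_quality` are hypothetical, the loss figures are the approximate ones quoted in the summary, and the reference point those losses are measured against is assumed to be whatever unquantized model the document compares with.

```python
# Approximate quality loss per quantization variant, copied from the summary.
# Assumption: each loss is relative to the document's unquantized reference.
QUALITY_LOSS = {"q8_0": 0.19, "q5_k_m": 0.25, "q4_k_m": 0.33}


def recommend_variant(has_gpu: bool) -> str:
    """Return the variant the summary recommends for the given hardware."""
    # 16-bit when a GPU is available, q5_k_m for CPU / Raspberry Pi targets.
    return "16-bit" if has_gpu else "q5_k_m"


def retained_quality(variant: str) -> float:
    """Fraction of reference quality kept by a quantized variant."""
    return 1.0 - QUALITY_LOSS[variant]


if __name__ == "__main__":
    variant = recommend_variant(has_gpu=False)
    print(variant, round(retained_quality(variant), 2))
```

On a CPU-only target this selects `q5_k_m`, which by the quoted figures keeps roughly three quarters of the reference quality.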

docs/immune/finetuning_results.md

Lines changed: 1 addition & 2 deletions
@@ -1,8 +1,7 @@
 ### Summarization Fine-Tuned Model: Evaluation Results
 
-**Keywords:** Qwen2.5-1.5B, Incident Summarization, SFT, LLM-as-Judge, Win Rate
 
-**TL;DR:** The Qwen2.5-1.5B model fine-tuned for Slips incident summarization ranks 1st overall with a 7.73 avg score and 74.5% win rate — well above GPT-4o-mini — across simple and medium incidents. The primary weakness is a hard failure on very large incidents (>4000 events) caused by input truncation.
+**Summary:** The Qwen2.5-1.5B model fine-tuned for Slips incident summarization ranks 1st overall with a 7.73 avg score and 74.5% win rate — well above GPT-4o-mini — across simple and medium incidents. The primary weakness is a hard failure on very large incidents (>4000 events) caused by input truncation.
 
 **Model:** [stratosphere/qwen2.5-1.5b-slips-immune](https://huggingface.co/stratosphere/qwen2.5-1.5b-slips-immune)
 **Judge:** gpt-oss-120b | **Incidents evaluated:** 47 (44 scored, 3 missing) | **Date:** 2026-04-05
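
The summary of `finetuning_results.md` reports an average score and a win rate without showing how they are derived. The sketch below shows one plausible way to compute both from per-incident judge output. It rests on assumptions not confirmed by the diff: a "win" is taken to mean the model was ranked first for an incident, missing scores are represented as `None` and skipped, and the function names and sample values are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def win_rate(ranks: Sequence[int]) -> float:
    """Share of incidents where the model was ranked first (rank 1).

    Assumption: the denominator is every incident that has a rank.
    """
    return sum(1 for r in ranks if r == 1) / len(ranks)


def average_score(scores: Sequence[Optional[float]]) -> float:
    """Mean judge score over incidents that were actually scored."""
    scored = [s for s in scores if s is not None]  # drop missing incidents
    return mean(scored)


if __name__ == "__main__":
    # Hypothetical judge output for four incidents, not real results.
    print(win_rate([1, 2, 1, 3]))
    print(average_score([8.0, None, 6.0]))
```

Whether missing incidents count toward the win-rate denominator changes the reported percentage, so that choice should be checked against the evaluation methodology document.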

docs/immune/finetuning_summarization_procedure.md

Lines changed: 1 addition & 2 deletions
@@ -1,8 +1,7 @@
 ### Summarization Fine-Tuning: Dataset and Training Procedure
 
-**Keywords:** Incident Summarization, SFT, LoRA, Dataset Filtering, Qwen2.5-1.5B
 
-**TL;DR:** The summarization model is trained on a quality-filtered subset of the Slips summarization dataset, using the highest-scoring model response per incident as the training target. The same general LoRA+Unsloth pipeline applies; this document covers the summarization-specific dataset preparation and system prompt.
+**Summary:** The summarization model is trained on a quality-filtered subset of the Slips summarization dataset, using the highest-scoring model response per incident as the training target. The same general LoRA+Unsloth pipeline applies; this document covers the summarization-specific dataset preparation and system prompt.
 
 ---
 