Fine-tuning Large Language Models for C Vulnerability Detection and Classification (CWE) with Structured Reasoning
Caution: Ongoing experimental work — agentic RAG pipeline with a CWE knowledge base over the full MITRE corpus, vector retrieval, multi-pass inference (LangGraph), and an MCP server — lives on the agent branch. main is the frozen fine-tuning baseline.
This repository contains the code and experimental pipeline for my master's thesis on using fine-tuned LLMs for automated vulnerability detection and classification in C code. The approach generates structured JSON outputs containing both natural language security reasoning and CWE (Common Weakness Enumeration) classifications.
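As a rough illustration of what such a structured prediction could look like once parsed in Python — the field names (`reasoning`, `is_vulnerable`, `cwe`) are hypothetical placeholders, not the exact schema defined by the prompts in text_prompts/:

```python
import json

# Illustrative example of a structured model output combining
# natural-language security reasoning with a CWE label.
# Field names and values are placeholders, not the pipeline's actual schema.
raw_model_output = """
{
  "reasoning": "User-controlled input is copied into a fixed-size stack buffer with strcpy without a length check, so an oversized input overflows the buffer.",
  "is_vulnerable": true,
  "cwe": "CWE-787"
}
"""

prediction = json.loads(raw_model_output)
print(prediction["cwe"], "-", prediction["reasoning"][:60], "...")
```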
Key findings:
- Pessimistic assumptions combined with CWE guidance achieve a 4.3× higher F1-score than the Random Forest baseline
- Neither pessimistic assumptions alone nor CWE guidance alone yields a substantial improvement; only their combination is effective
- Training data quality (not prompt design) is the primary bottleneck, with a 15.5% recall ceiling across all configurations
- Diagnostic suite validation shows 91.7% accuracy on handcrafted test cases
VuLLM/
├── .clang-format # Clang-format config file
├── deepspeed # DeepSpeed files
├── DoneBot # Submodule for async notifications
├── LICENSE
├── pixi.toml # Dependency management
├── pyproject.toml
├── README.md
├── rusty # Rust implementation
│ ├── Cargo.toml
│ ├── pixi.toml
│ ├── tests
│ ├── src
│ │ ├── lib.rs
│ │ ├── main.rs
│ │ ├── mitre # MITRE db entries
│ │ └── processor_lib # Tree-sitter parsing, GCC repair, AST validation, CWE enrichment
├── src # Python
│ ├── core
│ │ ├── cot # CoT generation and Jury for quality assessment
│ │ ├── cot_training # Fine-tuning
│ │ └── random_forest # RFC baseline
│ ├── dataset # Dataset utilities
│ └── test_env_integrity # Environment validation
└── text_prompts # Prompts applied

Requirements:
- Preprocessing: Rust 1.89+
- Training/Evaluation: Python 3.12
- Hardware: NVIDIA L40s GPU with 48GB VRAM for training (see the sanity-check sketch below)
- Dependencies: Managed via pixi
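A quick way to confirm the GPU environment matches these requirements is a small PyTorch check like the one below — a minimal sketch only, independent of the repository's src/test_env_integrity module:

```python
import torch

# Minimal GPU sanity check: confirms CUDA is visible and reports VRAM.
# Illustrative stand-in, not the repository's test_env_integrity suite.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA device found; training requires a GPU (e.g. an L40S with 48 GB VRAM).")

props = torch.cuda.get_device_properties(0)
print(f"GPU: {props.name}, VRAM: {props.total_memory / 2**30:.1f} GiB")
```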
# Clone the repository with submodules
git clone --recurse-submodules https://github.com/MatteoGuglielmi-tech/VuLLM.git
cd VuLLM
# If already cloned without submodules, initialize them:
# git submodule update --init --recursive
# Install dependencies
pixi install
# Build Rust preprocessing pipeline
cd rusty && cargo build --release

Each component includes built-in argument parsing. Use --help for available options and usage examples.
| Component | Command |
|---|---|
| Preprocessing | cd rusty && cargo run --release -- --help |
| Training/Evaluation | pixi run python -m src.core.cot_training.main --help |
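For orientation, the fine-tuning component builds on Unsloth and a Qwen2.5-Coder checkpoint (see Acknowledgments). The sketch below shows a typical Unsloth LoRA setup; the model size, sequence length, and LoRA hyperparameters are assumptions for illustration, not the configuration used in the thesis:

```python
from unsloth import FastLanguageModel

# Illustrative Unsloth setup for a Qwen2.5-Coder checkpoint.
# Model size, max_seq_length, and LoRA hyperparameters are placeholders,
# not the values used in the thesis experiments.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-7B-Instruct",  # assumed checkpoint
    max_seq_length=4096,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```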
The thesis evaluates 6 configurations in a 3×2 factorial design:
| Config | Assumption Mode | CWE Guidance | F1-Score | Recall |
|---|---|---|---|---|
| 1 | Free | No | 15.1% | 8.7% |
| 2 | Free | Yes | 15.9% | 9.1% |
| 3 | Optimistic | No | 12.9% | 7.2% |
| 4 | Optimistic | Yes | 17.3% | 9.8% |
| 5 | Pessimistic | No | 15.2% | 8.7% |
| 6 | Pessimistic | Yes | 24.8% | 15.5% |
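For context, the precision implied by each row follows from the standard definition F1 = 2PR / (P + R). A quick back-of-the-envelope check for the best configuration, with the Random Forest baseline F1 inferred from the 4.3× claim above (assuming "4.3× higher" means 4.3× the baseline F1):

```python
# Derive the precision implied by a reported F1-score and recall:
# F1 = 2PR / (P + R)  =>  P = F1 * R / (2R - F1)
def implied_precision(f1: float, recall: float) -> float:
    return f1 * recall / (2 * recall - f1)

# Config 6 (pessimistic assumptions + CWE guidance): F1 = 24.8%, recall = 15.5%
print(f"Config 6 implied precision: {implied_precision(0.248, 0.155):.1%}")  # ~62%

# Random Forest baseline F1 inferred from the 4.3x improvement claim
print(f"Inferred RF baseline F1: {0.248 / 4.3:.1%}")  # ~5.8%
```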
This work uses the DiverseVul dataset. Due to licensing, we cannot redistribute the processed data. The preprocessing pipeline can be applied to the publicly available dataset to reproduce our results.
Final dataset: 5888 samples (4302 train / 743 val / 843 test)
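A minimal sketch of how a split with these exact counts could be reproduced with scikit-learn; the seed and any stratification by CWE label are assumptions, and the actual split logic lives in src/dataset:

```python
from sklearn.model_selection import train_test_split

# Illustrative 4302 / 743 / 843 split of 5888 preprocessed samples.
# Seed and stratification are assumptions, not necessarily the
# procedure implemented in src/dataset.
def split_dataset(samples, seed=42):
    train_val, test = train_test_split(samples, test_size=843, random_state=seed)
    train, val = train_test_split(train_val, test_size=743, random_state=seed)
    assert (len(train), len(val), len(test)) == (4302, 743, 843)
    return train, val, test

train, val, test = split_dataset(list(range(5888)))
```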
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- DiverseVul dataset authors for making vulnerability data publicly available
- Qwen team for the Qwen2.5-Coder model
- Unsloth for efficient fine-tuning infrastructure