Operations and deployment guide for running llm-batch-pipeline in production.

Prerequisites:

- Python 3.13+
- `uv` package manager
- For OpenAI backend: `OPENAI_API_KEY` environment variable
- For Ollama backend: one or more Ollama servers with models pulled

```bash
uv sync              # Install all dependencies
uv sync --group dev  # Include dev tools (pytest, ruff, pylint)
```

All dependencies are pinned in `uv.lock`. The project follows a strict supply chain policy:
- Versions and hashes are pinned
- Packages published < 48 hours ago are flagged
- Maintainer account changes are flagged
- Typosquatting names are checked against popular packages
- No GPL/AGPL dependencies are permitted
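
The lockfile pin can also be enforced in CI with uv's own flags; a minimal sketch (it relies on `uv sync --locked`, a standard uv option that fails if `uv.lock` no longer matches `pyproject.toml`):

```bash
# Fail fast if uv.lock has drifted from pyproject.toml, then run the test suite.
uv sync --locked --group dev
uv run pytest
```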

| Package | Purpose | License |
|---|---|---|
| `httpx` | Ollama HTTP client | BSD-3 |
| `openai` | OpenAI API client | Apache-2.0 |
| `openpyxl` | XLSX export | MIT |
| `prometheus-client` | Metrics (zero transitive deps) | Apache-2.0 |
| `pydantic` | Schema validation | MIT |
| `python-dotenv` | Environment file loading | BSD-3 |
| `rich` | Terminal UI | MIT |
| `selectolax` | HTML parsing | MIT |
| `charset-normalizer` | Charset fallback | MIT |
| `mail-parser-reply` | Reply chain stripping | Apache-2.0 |

To run against a local Ollama server:

```bash
# Pull model
ollama pull llama3.1:8b
# Run pipeline
uv run llm-batch-pipeline run \
--batch-dir batches/batch_001_test \
--plugin spam_detection \
--backend ollama \
--model llama3.1:8b \
--num-parallel-jobs 4 \
--auto-approve
```

For servers with multiple GPUs (each running an Ollama instance):

```bash
uv run llm-batch-pipeline run \
--batch-dir batches/batch_001_test \
--plugin spam_detection \
--backend ollama \
--base-url http://gpu1:11434 \
--base-url http://gpu2:11434 \
--base-url http://gpu3:11434 \
--num-parallel-jobs 4 \
--model llama3.1:70b \
--auto-approve
```

The pipeline automatically:
- Shards the JSONL across servers (round-robin)
- Creates per-shard thread pools
- Aggregates results back into a single output
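
Before launching a multi-server run, it can help to confirm that every Ollama instance is reachable and has the model pulled. A pre-flight sketch (the host names match the example above; `/api/tags` is Ollama's standard model-listing endpoint):

```bash
# Check each Ollama server and look for the model in its local model list.
for host in gpu1 gpu2 gpu3; do
  if curl -sf "http://${host}:11434/api/tags" | grep -q 'llama3.1:70b'; then
    echo "${host}: ok"
  else
    echo "${host}: unreachable or model not pulled"
  fi
done
```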

For the OpenAI backend:

```bash
export OPENAI_API_KEY="sk-..."
uv run llm-batch-pipeline run \
--batch-dir batches/batch_001_test \
--plugin spam_detection \
--backend openai \
--model gpt-4o-mini \
--poll-interval 30 \
--auto-approve
```

OpenAI batches have a 24h completion window. Use `--no-wait` to submit and check later:

```bash
# Submit without waiting
uv run llm-batch-pipeline submit \
--batch-dir batches/batch_001_test \
--backend openai \
--no-wait
# Resume monitoring later
uv run llm-batch-pipeline submit \
--batch-dir batches/batch_001_test \
--backend openai \
--resume-batch-id batch_abc123
```

Enable metrics collection with `--metrics-port`:

```bash
uv run llm-batch-pipeline run --metrics-port 9090 ...
```

Add to your `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'llm-batch-pipeline'
    static_configs:
      - targets: ['localhost:9090']
```

Available metrics:

| Metric | Type | Description |
|---|---|---|
| `pipeline_stage_duration_seconds` | Histogram | Duration per pipeline stage |
| `pipeline_requests_total` | Counter | Total processed requests |
| `pipeline_requests_failed_total` | Counter | Failed requests |
| `pipeline_active_requests` | Gauge | Currently in-flight requests |
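
To confirm the exporter is up, you can scrape it by hand; this assumes the standard `/metrics` path and the port passed to `--metrics-port`:

```bash
# List the pipeline_* series currently exposed by the running pipeline.
curl -s http://localhost:9090/metrics | grep '^pipeline_'
```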

All runs produce structured JSONL logs in the batch's `logs/` directory:

```
logs/
├── pipeline.jsonl   # Structured event log (every step)
└── metrics.json     # Aggregated timing and count metrics
```
Each log line contains:

```json
{
  "timestamp": "2026-04-08T12:34:56.789Z",
  "level": "info",
  "logger": "llm_batch_pipeline.stages",
  "step": "discover",
  "status": "ok",
  "duration_ms": 42.5,
  "message": "Discovered 500 files"
}
```
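
Because the log is line-delimited JSON, it is easy to slice with `jq`. A sketch that pulls out failed steps (it assumes `jq` is installed and that any status other than `ok` indicates a failure):

```bash
# Show the step, duration, and message for every non-ok event.
jq -c 'select(.status != "ok") | {step, duration_ms, message}' \
  batches/batch_001_test/logs/pipeline.jsonl
```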

By default, all HTTPS connections verify TLS certificates. Use `--insecure` / `-k` only for development:

```bash
# Development only — disables TLS verification
uv run llm-batch-pipeline submit --insecure ...
```

Store API keys in `.env` files (loaded via `python-dotenv`) or environment variables. Never commit `.env` files.
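
A minimal `.env` for the OpenAI backend might look like this (the variable name comes from the prerequisites above; the key value is a placeholder):

```bash
# .env is loaded via python-dotenv; never commit this file
OPENAI_API_KEY=sk-...
```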
Output directories are created with default permissions. On shared systems, consider restricting access:

```bash
chmod 700 batches/batch_001_sensitive/
```

Each batch directory moves through the following lifecycle:

1. init → Creates batch_NNN_name/ with input/, evaluation/, config.toml
2. populate → User copies input files into input/
3. run/render → Creates job/ with JSONL shards
4. submit → Creates output/ with LLM responses
5. validate → Creates results/ with validated JSON
6. evaluate → Creates export/evaluation.json
7. export → Creates export/*.xlsx
8. archive → User archives or deletes the batch directory
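
Put together, a typical end-to-end session looks roughly like this. The `init` invocation and its `--name` flag are assumptions for illustration; `run` and its flags are the ones documented earlier in this guide:

```bash
# 1. Create the batch skeleton (hypothetical init syntax).
uv run llm-batch-pipeline init --name test

# 2. Populate the input directory by hand (source path is a placeholder).
cp /data/emails/*.eml batches/batch_001_test/input/

# 3. Render, submit, validate, evaluate, and export in one run.
uv run llm-batch-pipeline run \
  --batch-dir batches/batch_001_test \
  --plugin spam_detection \
  --backend ollama \
  --model llama3.1:8b \
  --auto-approve

# 4. Archive (or delete) the self-contained batch directory when finished.
tar czf batch_001_test.tar.gz batches/batch_001_test/
```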
Batch directories are self-contained. To remove a completed batch:

```bash
rm -rf batches/batch_001_test/
```

Common issues:

"No plugins registered" — Ensure you installed the package (uv sync). Built-in plugins (spam_detection, gdpr_detection) are auto-registered on import.
"Input directory not found" — The input/ subdirectory must exist in the batch directory and contain files the plugin's reader can handle.
"Batch directory not found" — The --batch-dir argument is resolved relative to --batch-jobs-root (default: ./batches). You can also pass an absolute or relative path directly.
OpenAI rate limits — The Batch API has its own rate limits separate from the real-time API. Check your OpenAI dashboard for quota.
Ollama timeouts — Increase --request-timeout for large models or slow hardware. The default is 600 seconds.
Schema validation failures — Ensure the LLM model supports structured output. Check results/validated.json for per-row error details.