LLM Batch Pipeline

llm-batch-pipeline is a generic LLM batch processing pipeline. It discovers and parses input files via a plugin system, renders OpenAI Batch API (or Ollama) requests, validates structured JSON outputs with a Pydantic schema, evaluates predictions against ground truth, and exports results to XLSX/JSON.
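The structured-output check can be illustrated with a small stdlib-only sketch. The real pipeline validates with Pydantic models supplied by each plugin; the schema and field names below are hypothetical, chosen only to show the idea of parsing and range-checking one LLM answer.

```python
import json
from dataclasses import dataclass


# Hypothetical output schema for illustration; the actual pipeline defines
# its schemas as Pydantic models provided by each plugin.
@dataclass
class SpamVerdict:
    is_spam: bool
    confidence: float


def parse_llm_output(raw: str) -> SpamVerdict:
    """Parse one structured JSON answer and reject out-of-range values."""
    data = json.loads(raw)
    verdict = SpamVerdict(
        is_spam=bool(data["is_spam"]),
        confidence=float(data["confidence"]),
    )
    if not 0.0 <= verdict.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return verdict
```

Answers that fail to parse or violate the schema can then be reported per file instead of silently corrupting the exported results.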

  • Getting Started Guide (docs/getting-started.md): a tested end-to-end walkthrough with the OpenAI Batch API and a 3-way sharded Ollama setup.
  • User Guide (docs/user-guide.md): installation and CLI reference.
  • Admin Guide: installation and deployment.
  • Developer Guide: how to extend the pipeline with custom plugins, prompts, schemas, and evaluation.

Workflow

```mermaid
flowchart TD
    A[Input Files] --> B[1. Discover]
    B --> C[2. Filter - pre]
    C --> D[3. Transform]
    D --> E[4. Filter - post]
    E --> F[5. Render JSONL]
    F --> G{6. Human Review}
    G -->|Approved| H[7. Submit to Backend]
    G -->|--auto-approve| H
    G -->|Rejected| Z[Abort]
    H --> I[8. Validate Results]
    I --> J[9. Evaluate]
    J --> K[10. Export]

    subgraph Backends
        H --> H1[OpenAI Batch API]
        H --> H2[Ollama Local]
    end

    H1 --> I
    H2 --> I

    subgraph Outputs
        K --> K1[results.xlsx]
        K --> K2[evaluation.xlsx]
        K --> K3[evaluation.json]
        K --> K4[metrics.json]
    end
```
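Step 5 ("Render JSONL") produces one OpenAI Batch API request per input file, one JSON object per line. The sketch below shows only the documented Batch API line format; the prompt template and parameter names are hypothetical and in practice come from the plugin.

```python
import json


def render_request_line(custom_id: str, model: str, prompt: str, text: str) -> str:
    """Render one OpenAI Batch API request as a JSONL line (step 5)."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": prompt},
            {"role": "user", "content": text},
        ],
    }
    return json.dumps({
        "custom_id": custom_id,  # lets each result be matched back to its input file
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": body,
    })
```

The `custom_id` is what allows step 8 to join the backend's answers back to the discovered input files.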

Admin / Install

Requirements

  • OpenAI backend: set OPENAI_API_KEY.
  • Local LLM via Ollama: run an Ollama server (pull the model), then use --backend ollama --base-url http://HOST:11434 (repeat --base-url for multi-server sharding).
  • OpenAI-compatible local server: if your server supports the OpenAI API, use --backend openai and configure the OpenAI SDK base URL (commonly via OPENAI_BASE_URL).
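Multi-server sharding with repeated --base-url flags can be pictured as a simple round-robin split of the rendered requests. This is an illustrative sketch only; the pipeline's actual sharding strategy may differ.

```python
from itertools import cycle


def shard_requests(requests: list[str], base_urls: list[str]) -> dict[str, list[str]]:
    """Distribute rendered requests round-robin across Ollama servers,
    one shard per --base-url (illustrative, not the pipeline's internals)."""
    shards: dict[str, list[str]] = {url: [] for url in base_urls}
    for req, url in zip(requests, cycle(base_urls)):
        shards[url].append(req)
    return shards
```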

Getting Started

  • End-to-end walkthrough: docs/getting-started.md
  • The getting-started guide was tested against live OpenAI Batch and Ollama services.

Quick Test (offline)

Run the unit test suite (no external LLM services):

```shell
uv sync --group dev
uv run pytest -q
```

Plugins

List registered plugins:

```shell
uv run llm-batch-pipeline list
```

The built-in examples include spam_detection and gdpr_detection.
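A name-based registry is one common way such a plugin system can work, and explains what `list` enumerates. The decorator, names, and class structure below are assumptions for illustration, not the pipeline's actual API.

```python
# Registry mapping plugin names to their implementing classes.
PLUGINS: dict[str, type] = {}


def register(name: str):
    """Class decorator that registers a plugin under a unique name."""
    def wrap(cls: type) -> type:
        if name in PLUGINS:
            raise ValueError(f"duplicate plugin: {name}")
        PLUGINS[name] = cls
        return cls
    return wrap


@register("spam_detection")
class SpamDetection:
    """Hypothetical plugin: discovers input files, supplies prompt and schema."""


def list_plugins() -> list[str]:
    """What a `list` command would print: all registered plugin names."""
    return sorted(PLUGINS)
```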

Test / Benchmark

Architecture

Extend (plugins)

Monitor

License

About

A generic approach to submitting a folder of files and applying an LLM prompt (with structured JSON output checking) to each file via an LLM. Think "lambda functions".
