YourBench provides a rich command-line interface for generating evaluation datasets from your documents.
# Install with uv (recommended)
uv pip install yourbench
# Or run directly without installing
uvx --from yourbench yourbench --help

| Command | Description |
|---|---|
| `run` | Run the full pipeline with a config file |
| `validate` | Check a config file without running |
| `estimate` | Estimate token usage before running |
| `init` | Generate a starter config interactively |
| `stages` | List all available pipeline stages |
| `version` | Show YourBench version |
Run the YourBench pipeline with a configuration file.
yourbench run <config_path> [OPTIONS]

Arguments:
- `config_path` - Path to your YAML configuration file (required)

Options:
- `--debug`, `-d` - Enable debug logging (shows detailed progress)
- `--quiet`, `-q` - Minimal output (only errors)
- `--no-banner` - Hide the startup banner
Examples:
# Basic run
yourbench run config.yaml
# With debug output
yourbench run config.yaml --debug
# Quiet mode for scripts
yourbench run config.yaml --quiet

Output:
- Progress bars for each pipeline stage
- Token usage statistics per stage
- Final dataset location (Hub URL or local path)
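When the pipeline is launched from a Python script rather than a shell, the documented flags can be assembled programmatically. The following is a minimal sketch, not part of YourBench: the helper names `build_run_command` and `run_pipeline` are hypothetical, it assumes the `yourbench` executable is on `PATH`, and it assumes (without confirmation from this page) that a non-zero exit code signals failure.

```python
import subprocess
from typing import List


def build_run_command(config_path: str, debug: bool = False,
                      quiet: bool = False, banner: bool = True) -> List[str]:
    """Build the argument list for `yourbench run` from the documented flags."""
    if debug and quiet:
        # Design choice of this sketch, not a documented CLI rule.
        raise ValueError("choose either debug or quiet, not both")
    cmd = ["yourbench", "run", config_path]
    if debug:
        cmd.append("--debug")
    if quiet:
        cmd.append("--quiet")
    if not banner:
        cmd.append("--no-banner")
    return cmd


def run_pipeline(config_path: str, **flags: bool) -> int:
    """Run the pipeline and return the process exit code."""
    # Assumption: `yourbench` is installed and on PATH.
    completed = subprocess.run(build_run_command(config_path, **flags))
    return completed.returncode


if __name__ == "__main__":
    # Only prints the command; call run_pipeline(...) to execute it.
    print(build_run_command("config.yaml", quiet=True, banner=False))
```

Passing the command as a list avoids shell quoting problems when the config path contains spaces.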
Validate a configuration file without running the pipeline. Useful for catching errors before a long run.
yourbench validate <config_path>

Arguments:
- `config_path` - Path to YAML config file to validate (required)
Examples:
yourbench validate config.yaml

Output:
✓ Configuration is valid!
Configuration Summary
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Setting ┃ Value ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Dataset │ my-benchmark │
│ Push to Hub │ ✓ │
│ Private │ ✗ │
│ Models │ openai/gpt-4o-mini │
│ Stages │ ingestion, summarization, chunking, ... │
└─────────────┴────────────────────────────────────────────────────────────────┘
Enabled stages (5):
1. ingestion
2. summarization
3. chunking
4. single_hop_question_generation
5. prepare_lighteval
Checks performed:
- YAML syntax validity
- Required fields present
- Model configuration correct
- Stage dependencies satisfied
- Environment variables resolved
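The "Environment variables resolved" check can also be approximated before invoking the CLI. The sketch below is a standalone pre-check under stated assumptions, not how `yourbench validate` is implemented: it assumes references use the plain `$VAR_NAME` form, and the function name `find_unset_variables` is hypothetical.

```python
import os
import re
from typing import List, Mapping, Optional

# Matches $VAR_NAME references (assumed form; braces like ${VAR} are not handled).
ENV_REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def find_unset_variables(config_text: str,
                         env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the sorted names of referenced variables that are missing or empty."""
    env = os.environ if env is None else env
    names = sorted(set(ENV_REF.findall(config_text)))
    # An empty string is treated the same as an unset variable.
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    sample = (
        "model_list:\n"
        "  - model_name: $OPENAI_MODEL\n"
        "    api_key: $OPENAI_API_KEY\n"
    )
    print(find_unset_variables(sample))
```

Running this against a config before a long job gives an early list of variables to export or add to `.env`.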
Estimate token usage for a pipeline run before executing it. Helps with cost planning.
yourbench estimate <config_path>

Arguments:
- `config_path` - Path to YAML config file (required)
Examples:
yourbench estimate config.yaml

Output:
Source Documents:
Files: 3
Estimated tokens: 15.2K
Token Estimation by Stage
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Stage ┃ Input Tokens ┃ Output Tokens ┃ API Calls ┃ Notes ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ Ingestion │ - │ - │ - │ No LLM calls │
│ Summarization │ 4.5K │ 6.0K │ 3 │ │
│ Chunking │ - │ - │ - │ No LLM calls │
│ Single Hop QG │ 27.6K │ 4.5K │ 3 │ │
└─────────────────┴──────────────┴───────────────┴───────────┴─────────────────┘
╭─────── Summary ────────╮
│ Total Estimated Usage: │
│ Input tokens: 32.1K │
│ Output tokens: 10.5K │
│ Total: 42.6K │
╰────────────────────────╯
Notes:
- Estimates use tiktoken for accurate token counting
- Actual usage may vary based on model responses
- Stages without LLM calls (ingestion, chunking) show "-"
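For cost planning, the token totals from the summary can be converted into a rough price. The sketch below is illustrative only: the function names are hypothetical, the per-million-token prices are placeholders rather than real provider rates, and support for an `M` suffix is an assumption (the sample output only shows `K`).

```python
def parse_token_count(text: str) -> int:
    """Convert a count such as '15.2K' to an integer; '-' means no LLM calls."""
    text = text.strip().upper()
    if text in ("", "-"):
        return 0
    multipliers = {"K": 1_000, "M": 1_000_000}  # "M" is an assumed suffix
    if text[-1] in multipliers:
        return round(float(text[:-1]) * multipliers[text[-1]])
    return round(float(text))


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Return an approximate cost given per-million-token prices."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


if __name__ == "__main__":
    tokens_in = parse_token_count("32.1K")
    tokens_out = parse_token_count("10.5K")
    # Placeholder prices; substitute your provider's current rates.
    print(estimate_cost(tokens_in, tokens_out, 1.0, 2.0))
```

Because actual usage may vary with model responses, treat the result as a lower-confidence planning figure rather than a bill.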
Generate a starter configuration file interactively.
yourbench init [OPTIONS]

Options:
- `--output`, `-o` - Output file path (default: `config.yaml`)
- `--force`, `-f` - Overwrite existing file without prompting
Examples:
# Create config.yaml in current directory
yourbench init
# Create with custom name
yourbench init -o my-project/config.yaml
# Overwrite existing
yourbench init -o config.yaml --force

Interactive prompts:
- Dataset name for HuggingFace Hub
- Model provider (OpenAI, HuggingFace, local vLLM, custom)
- Source documents directory
- Pipeline stages to enable
- Output preferences (Hub push, local save)
Display all available pipeline stages with descriptions.
yourbench stages

Output:
Pipeline Stages
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ # ┃ Stage ┃ Description ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1 │ ingestion │ Process source documents │
│ 2 │ summarization │ Generate summaries │
│ 3 │ chunking │ Split into chunks │
│ 4 │ single_hop_question_generation │ Generate standalone Q&A pairs │
│ 5 │ multi_hop_question_generation │ Multi-chunk questions │
│ 6 │ cross_document_question_generation │ Cross-document questions │
│ 7 │ question_rewriting │ Rewrite for clarity │
│ 8 │ prepare_lighteval │ Format for LightEval │
│ 9 │ citation_score_filtering │ Filter by citation quality │
└─────┴────────────────────────────────────┴───────────────────────────────────┘
Stage details:
| Stage | LLM Required | Description |
|---|---|---|
| `ingestion` | No | Parse PDFs, Word docs, HTML into Markdown |
| `summarization` | Yes | Generate document summaries |
| `chunking` | No | Split documents into semantic chunks |
| `single_hop_question_generation` | Yes | Q&A pairs from individual chunks |
| `multi_hop_question_generation` | Yes | Questions requiring multiple chunks |
| `cross_document_question_generation` | Yes | Questions spanning documents |
| `question_rewriting` | Yes | Improve question clarity |
| `prepare_lighteval` | No | Format for evaluation framework |
| `citation_score_filtering` | No | Filter low-quality citations |
Show the installed YourBench version.
yourbench version

Output:
YourBench v0.9.0
The CLI respects these environment variables (can also be set in .env):
| Variable | Description |
|---|---|
| `HF_TOKEN` | HuggingFace token for Hub operations |
| `HF_ORGANIZATION` | Default organization for dataset uploads |
| `OPENAI_API_KEY` | OpenAI API key |
| `OPENAI_BASE_URL` | Custom OpenAI-compatible endpoint |
| `OPENAI_MODEL` | Default model name |
Use $VAR_NAME syntax in config files to reference environment variables:
model_list:
  - model_name: $OPENAI_MODEL
    api_key: $OPENAI_API_KEY
    base_url: $OPENAI_BASE_URL

Typical workflow for generating a benchmark:
# 1. Generate starter config
yourbench init -o my-benchmark/config.yaml
# 2. Edit config as needed
vim my-benchmark/config.yaml
# 3. Validate before running
yourbench validate my-benchmark/config.yaml
# 4. Estimate costs
yourbench estimate my-benchmark/config.yaml
# 5. Run the pipeline
yourbench run my-benchmark/config.yaml --debug

"Config validation failed"
- Run `yourbench validate config.yaml` for detailed error messages
- Check that all required environment variables are set
"No documents found"
- Verify `source_documents_dir` path exists
- Check file extensions are supported (.pdf, .md, .txt, .docx, .html)
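Both checks for "No documents found" can be done in one step before a run. This is a minimal sketch, assuming the extension list above is complete and that subdirectories should be searched; whether YourBench itself scans recursively is not stated here, and the function name `find_supported_documents` is hypothetical.

```python
from pathlib import Path
from typing import List

# Extensions listed in the troubleshooting note above.
SUPPORTED_EXTENSIONS = {".pdf", ".md", ".txt", ".docx", ".html"}


def find_supported_documents(source_dir: str) -> List[Path]:
    """Return supported files under source_dir, or raise if the directory is missing."""
    root = Path(source_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"source directory does not exist: {root}")
    # Recursive search is an assumption of this sketch.
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    for document in find_supported_documents("."):
        print(document)
```

An empty result means the directory exists but contains no files with a supported extension.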
"API rate limit exceeded"
- Reduce `max_concurrent_requests` in model config
- Add delays between runs
"Token limit exceeded"
- Use `yourbench estimate` to check token usage
- Reduce chunk size or number of questions per chunk
See FAQ for more troubleshooting tips.