Quick Start Guide

Get Started in 5 Minutes

Prerequisites

Python 3.8+
pip
API key (OpenAI or Anthropic)
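
To confirm the interpreter meets the 3.8+ requirement before installing anything, a small check like the one below can help. This is a minimal sketch; `meets_minimum` is a hypothetical helper name and is not part of ViPERSQL.

```python
import sys


def meets_minimum(version_info, minimum=(3, 8)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    ok = meets_minimum(sys.version_info)
    label = "OK" if ok else "too old, 3.8+ required"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {label}")
```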

Step 1: Clone & Install

# Clone repository
git clone https://github.com/hoadm-net/ViPERSQL.git
cd ViPERSQL

# Install dependencies
pip install -r requirements.txt

# Install the spaCy model (used for POS tagging)
python -m spacy download en_core_web_sm

Step 2: Configure API Keys

# Copy environment template
cp .env.example .env

# Edit .env and add API keys
nano .env

Minimum configuration in .env:

# Choose one (or both)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Optional: Choose default model
DEFAULT_MODEL=gpt-4o
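
To check which keys are actually set without echoing the secrets to the terminal, something like the following may be useful. It is a sketch that parses `.env` with the standard library only; `configured_keys` is a hypothetical helper, and it treats any non-empty value as configured, so a leftover placeholder such as `sk-...` would still count.

```python
from pathlib import Path

# Key names taken from the minimum configuration above.
KEY_NAMES = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def configured_keys(env_text, names=KEY_NAMES):
    """Return the names from `names` that have a non-empty value in .env-style text."""
    found = []
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not NAME=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        name = name.strip()
        value = value.strip().strip("'\"")
        if name in names and value:
            found.append(name)
    return found


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    keys = configured_keys(text)
    # Only the names are printed, never the secret values.
    print("Configured keys:", ", ".join(keys) if keys else "none")
```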

Step 3: Run First Evaluation

Option A: Zero-shot (Simplest)

python vipersql.py --samples 10

Expected output (scores are illustrative and will vary by model and run):

🚀 ViPERSQL - Vietnamese Text-to-SQL System
Strategy: ZERO-SHOT
Model: gpt-4o
Samples: 10

Processing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%

✅ Evaluation Complete!
Exact Match: 70.0%
Avg F1: 82.5%

Results saved to: results/zero-shot_10_20251201_103045/

Option B: Few-shot with ViR2 (Recommended)

python vipersql.py \
  --strategy few-shot \
  --example-selection-strategy vir2 \
  --samples 10

Expected output (scores are illustrative and will vary by model and run):

🚀 ViPERSQL - Vietnamese Text-to-SQL System
Strategy: FEW-SHOT
Selector: ViR2 (Two-Stage)
Model: gpt-4o
Samples: 10

[ViR2] Loading meaning pool... ✓
[ViR2] Loaded 1000 training examples

Processing sample 1/10...
[ViR2] Stage 1: Retrieved 50 candidates
[ViR2] Stage 2: Selected 3 examples
Generated SQL: SELECT...

...

✅ Evaluation Complete!
Exact Match: 85.0%
Avg F1: 91.2%

Results saved to: results/few-shot-vir2_10_20251201_103112/
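
This guide does not define how Exact Match and Avg F1 are computed. To give a feel for what such metrics measure, here is a sketch under a simple assumption: exact match as equality of normalized query strings, and F1 as token overlap between prediction and reference. The project's actual definitions may differ (for example, component-level matching), and `normalize`, `exact_match`, and `token_f1` are hypothetical names.

```python
from collections import Counter


def normalize(sql):
    """Lowercase, drop a trailing semicolon, and collapse whitespace."""
    return " ".join(sql.lower().strip().rstrip(";").split())


def exact_match(pred, gold):
    """True if the two queries are identical after normalization."""
    return normalize(pred) == normalize(gold)


def token_f1(pred, gold):
    """Token-overlap F1 between two normalized queries (multiset overlap)."""
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p or not g:
        # Two empty queries agree; one empty query shares nothing with the other.
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred, gold = "SELECT name FROM singer;", "select name  from singer"
    print(exact_match(pred, gold), round(token_f1(pred, gold), 3))
```

Under this assumption, a per-sample exact match is either 0 or 1, so the aggregate over N samples moves in steps of 1/N.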

Step 4: View Results

# Use the results directory printed at the end of your own run
cd results/few-shot-vir2_10_20251201_103112/

# View human-readable report
cat eval_report.txt

# View detailed metrics (JSON)
cat eval_results.json

# View all predictions
cat predictions.json
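
Because each run writes to a new timestamped directory, it can be handy to locate the latest one programmatically. The sketch below picks the most recently modified directory under `results/` and lists the top-level keys of `eval_results.json`. The JSON schema is not documented here, so the code makes no assumption about field names; `latest_run` and `load_results` are hypothetical helpers.

```python
import json
from pathlib import Path


def latest_run(results_root):
    """Return the most recently modified run directory under `results_root`, or None."""
    runs = [p for p in Path(results_root).iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime) if runs else None


def load_results(run_dir, filename="eval_results.json"):
    """Load a JSON results file from a run directory."""
    with open(Path(run_dir) / filename, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    root = Path("results")
    run = latest_run(root) if root.is_dir() else None
    if run is not None and (run / "eval_results.json").exists():
        data = load_results(run)
        # Inspect what is available rather than assuming a schema.
        keys = sorted(data) if isinstance(data, dict) else []
        print(f"{run.name}: top-level keys = {keys}")
    else:
        print("No completed run found under results/")
```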

Common Use Cases

1. Quick Test (10 samples)

python vipersql.py --samples 10

2. Full Evaluation (100+ samples)

python vipersql.py \
  --strategy few-shot \
  --example-selection-strategy vir2 \
  --samples 100

3. Different Models

# OpenAI GPT-4o
python vipersql.py --model gpt-4o --samples 10

# OpenAI GPT-4o-mini (cheaper)
python vipersql.py --model gpt-4o-mini --samples 10

# Anthropic Claude
python vipersql.py --model claude-3-5-sonnet-20241022 --samples 10

4. Different Dataset Levels

# Standard level (default)
python vipersql.py --level std --samples 10

# Syllable level
python vipersql.py --level syllable --samples 10

# Word level
python vipersql.py --level word --samples 10

5. Different Splits

# Dev split (default)
python vipersql.py --split dev --samples 10

# Test split
python vipersql.py --split test --samples 10
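
The use cases above can be combined into a small sweep script. The sketch below only builds commands from flags shown in this guide and prints them by default; it assumes those flags can be passed together in one invocation. `build_command` and `sweep` are hypothetical helpers, not part of ViPERSQL.

```python
import subprocess
import sys


def build_command(model="gpt-4o", level="std", split="dev", samples=10):
    """Build the argument list for one vipersql.py run using flags from this guide."""
    return [
        sys.executable, "vipersql.py",
        "--model", model,
        "--level", level,
        "--split", split,
        "--samples", str(samples),
    ]


def sweep(models, levels, split="dev", samples=10, dry_run=True):
    """Yield one command per model/level pair, running it unless dry_run is set."""
    for model in models:
        for level in levels:
            cmd = build_command(model, level, split, samples)
            if not dry_run:
                subprocess.run(cmd, check=True)
            yield cmd


if __name__ == "__main__":
    # Dry run: print the commands without calling any API.
    for cmd in sweep(["gpt-4o", "gpt-4o-mini"], ["std", "syllable", "word"]):
        print(" ".join(cmd))
```

Pass `dry_run=False` to execute each command. Every real run makes API calls, so start with a small `samples` value.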

Parameter Reference

Essential Parameters

| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `--samples` | Integer | all | Number of samples to process |
| `--strategy` | `zero-shot`, `few-shot`, `cot` | `zero-shot` | Generation strategy |
| `--model` | `gpt-4o`, `gpt-4o-mini`, `claude-3-5-sonnet-20241022` | `gpt-4o` | LLM model |

Few-shot Parameters

| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `--example-selection-strategy` | `random`, `dicl`, `astres`, `skill_knn`, `vir2` | `random` | Selector method |
| `--few-shot-examples` | Integer | 3 | Number of examples |

ViR2 Parameters

| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `--vir2-candidate-pool-size` | Integer | 50 | Stage 1 pool size (M) |
| `--vir2-beam-size` | Integer | 5 | Beam search width (B) |
| `--vir2-diversity-weight` | Float in [0, 1] | 0.3 | Diversity weight (λ) |
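
To build intuition for how M, B, and λ interact, here is a generic two-stage selection sketch. This guide names the parameters but does not spell out ViR2's scoring, so the code below is not the ViR2 algorithm: it assumes a relevance score per candidate and a pairwise similarity matrix, and uses an MMR-style (maximal marginal relevance) gain inside a beam search. All function names are hypothetical.

```python
def stage1_top_m(relevance, m):
    """Stage 1: indices of the m highest-relevance candidates."""
    order = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return order[:m]


def stage2_beam_select(pool, relevance, similarity, k, beam_size, diversity_weight):
    """Stage 2: pick k items from `pool` by beam search over an MMR-style score."""
    beams = [((), 0.0)]  # each beam is (selected indices, cumulative score)
    for _ in range(min(k, len(pool))):
        expanded = {}
        for selected, score in beams:
            for i in pool:
                if i in selected:
                    continue
                # Penalize similarity to the closest already-selected example.
                redundancy = max((similarity[i][j] for j in selected), default=0.0)
                gain = (1 - diversity_weight) * relevance[i] - diversity_weight * redundancy
                candidate = selected + (i,)
                key = frozenset(candidate)  # merge beams that hold the same set
                if key not in expanded or score + gain > expanded[key][1]:
                    expanded[key] = (candidate, score + gain)
        beams = sorted(expanded.values(), key=lambda b: b[1], reverse=True)[:beam_size]
    return list(beams[0][0])


if __name__ == "__main__":
    relevance = [0.9, 0.85, 0.5, 0.1]
    similarity = [  # candidates 0 and 1 are near-duplicates
        [1.0, 0.95, 0.1, 0.1],
        [0.95, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.1],
        [0.1, 0.1, 0.1, 1.0],
    ]
    pool = stage1_top_m(relevance, m=3)
    print(stage2_beam_select(pool, relevance, similarity, k=2, beam_size=2, diversity_weight=0.0))
    print(stage2_beam_select(pool, relevance, similarity, k=2, beam_size=2, diversity_weight=0.5))
```

In this toy setup, a diversity weight of 0 keeps both near-duplicate candidates, while 0.5 swaps one of them for a less similar example. A larger pool or beam widens the search at the cost of more computation.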

Dataset Parameters

| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `--level` | `std`, `syllable`, `word` | `std` | Text segmentation level |
| `--split` | `dev`, `test` | `dev` | Dataset split |

Troubleshooting

Error: API key not found

Solution:

# Check .env file exists
ls -la .env

# Verify an API key is set (this prints the matching lines, including the key values)
grep API_KEY .env

Error: Module not found

Solution:

# Reinstall dependencies
pip install -r requirements.txt

# Install the spaCy model
python -m spacy download en_core_web_sm

Error: Pre-computed embeddings not found

Solution:

# Build DICL candidates (for ViR2)
python scripts/build_dicl_candidates.py \
  --dataset-path dataset/ViText2SQL \
  --level std

Slow performance

Solutions:

Use fewer samples: --samples 10
Use faster model: --model gpt-4o-mini
Use simpler strategy: --strategy zero-shot
Use simpler selector: --example-selection-strategy dicl

High API costs

Solutions:

Use gpt-4o-mini instead of gpt-4o
Use zero-shot instead of few-shot
Reduce --samples
Lower --few-shot-examples
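
To see why these options reduce cost, note that prompt size grows with both the number of samples and the number of in-context examples. The arithmetic below is a rough sketch: the token counts are hypothetical placeholders rather than measurements from ViPERSQL, and output tokens and per-model pricing are ignored.

```python
def estimate_prompt_tokens(samples, base_tokens, example_tokens=0, few_shot_examples=0):
    """Rough total prompt tokens: each sample sends the base prompt plus k examples."""
    return samples * (base_tokens + few_shot_examples * example_tokens)


if __name__ == "__main__":
    # Hypothetical sizes: 800-token base prompt, 150 tokens per in-context example.
    zero_shot = estimate_prompt_tokens(samples=100, base_tokens=800)
    few_shot = estimate_prompt_tokens(
        samples=100, base_tokens=800, example_tokens=150, few_shot_examples=3
    )
    print(f"zero-shot: {zero_shot} tokens, few-shot (k=3): {few_shot} tokens")
```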

Next Steps

Learn More

Architecture - Understand system design
ViR2 Method - Deep dive into ViR2
Strategies - Compare strategies
Configuration - All parameters

Advanced Usage

Usage Examples - Real-world scenarios
Ablation Experiments - Test components
Extending System - Add new features

Development

API Reference - Complete API docs
Multi-Language Support - Vietnamese & English

Common Commands Cheat Sheet

# Quick test
python vipersql.py --samples 10

# Full ViR2 evaluation
python vipersql.py --strategy few-shot --example-selection-strategy vir2 --samples 100

# Budget-friendly
python vipersql.py --model gpt-4o-mini --samples 100

# Chain-of-thought
python vipersql.py --strategy cot --samples 50

# Custom ViR2 parameters
python vipersql.py \
  --strategy few-shot \
  --example-selection-strategy vir2 \
  --vir2-candidate-pool-size 100 \
  --vir2-beam-size 10 \
  --samples 100

# Ablation: ViR2 without POS
python vipersql.py --strategy few-shot --example-selection-strategy vir2-no-pos --samples 100

# BIRD dataset (if available)
python bird_vi_vir2_fewshot.py --samples 100

Need Help?

Check Configuration for all options
See Usage Examples for scenarios
Read the Troubleshooting section above

You're ready to go! 🚀
