Commit fd8ccf4

Release v0.2.1
1 parent e82c822 commit fd8ccf4

11 files changed

Lines changed: 71 additions & 213 deletions

.github/workflows/test.yml

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 name: Tests
 on:
   push:
-    branches: [main]
+    branches: [main, dev]
   pull_request:
-    branches: [main]
+    branches: [main, dev]

 jobs:
   test:

README.md

Lines changed: 20 additions & 98 deletions
@@ -4,11 +4,18 @@

 ---

-[![GitHub Stars](https://img.shields.io/github/stars/ChicagoHAI/AutoChecklist?style=flat-square)](https://github.com/ChicagoHAI/AutoChecklist)
-[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg?style=flat-square)](https://www.python.org/downloads/)
-[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg?style=flat-square)](LICENSE)
+<p align="center">
+<a href="https://github.com/ChicagoHAI/AutoChecklist"><img src="https://img.shields.io/github/stars/ChicagoHAI/AutoChecklist?style=flat-square" alt="GitHub Stars"></a>
+<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10+-blue.svg?style=flat-square" alt="Python 3.10+"></a>
+<a href="LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-green.svg?style=flat-square" alt="License"></a>
+<a href="https://autochecklist.github.io/"><img src="https://img.shields.io/badge/site-autochecklist.github.io-purple?style=flat-square" alt="Site"></a>
+</p>
+
+`AutoChecklist` is an open-source library that unifies LLM-based checklist evaluation into composable pipelines, in a `pip`-installable Python package (`autochecklist`) with CLI and UI features.

-`AutoChecklist` is an open-source library that unifies LLM-based checklist evaluation into composable pipelines, in a `pip`-installable Python package (`autochecklist`) with CLI and UI features.
+<p align="center">
+<img src="docs/frames.gif" alt="AutoChecklist demo" width="700">
+</p>


 ### Features
 - **Five checklist generator abstractions** that organize methods from research by their reasoning strategies for deriving evaluation criteria
@@ -114,124 +121,39 @@ Refiners are pipeline stages that clean up raw checklists before scoring. They'r



-## Using the Package
-
-### Custom Prompts
-
-Write a prompt template and generate a checklist:
-
-```python
-from autochecklist import DirectGenerator, ChecklistScorer
-
-gen = DirectGenerator(
-    custom_prompt="You are an expert evaluator. Generate yes/no checklist questions to score:\n\n{input}",
-    model="openai/gpt-5-mini",
-)
-checklist = gen.generate(input="Write a haiku about autumn.")
-
-scorer = ChecklistScorer(mode="batch", model="openai/gpt-5-mini")
-score = scorer.score(checklist, target="Leaves fall gently down...")
-print(f"Pass rate: {score.pass_rate:.0%}")
-```
-
-Scorers also take custom prompts. Prompts can also be loaded from `.md` files — see [Custom Prompts](docs/user-guide/custom-prompts.md) for the full guide (placeholders, custom scorers, registration).
-
-### Custom Pipelines
-
-Register a custom pipeline (generator + scorer + prompts) as a reusable unit:
-
-```python
-from autochecklist import register_custom_pipeline, pipeline
-
-# Register from config
-register_custom_pipeline(
-    "my_eval",
-    generator_prompt="Generate yes/no questions for:\n\n{input}",
-    scorer="weighted",
-)
-pipe = pipeline("my_eval", generator_model="openai/gpt-5-mini")
-
-# Or register from an existing pipeline instance
-register_custom_pipeline("my_eval_v2", pipe)
-
-# Save/load pipeline configs as JSON
-from autochecklist import save_pipeline_config, load_pipeline_config
-save_pipeline_config("my_eval", "my_eval.json")
-load_pipeline_config("my_eval.json")  # registers and returns the name
-```
-
-### Built-in Pipelines
-
-The library includes pipelines implementing methods from research papers. Use them via `method_name` or the `pipeline()` shorthand:
+## Quick Start

 ```python
 from autochecklist import pipeline

 pipe = pipeline("tick", generator_model="openai/gpt-5-mini", scorer_model="openai/gpt-5-mini")
-result = pipe(input="Write a haiku about autumn", target="Leaves fall gently...")
+result = pipe(input="Write a haiku about autumn.", target="Leaves fall gently down...")
 print(f"Pass rate: {result.pass_rate:.0%}")
 ```

-See [Supported Pipelines](docs/user-guide/supported-pipelines.md) for the full list of pipelines, paper details, and configuration options.
-
-### Batch Evaluation
-
-```python
-data = [
-    {"input": "Write a haiku", "target": "Leaves fall..."},
-    {"input": "Write a limerick", "target": "There once was..."},
-]
-result = pipe.run_batch(data, show_progress=True)
-print(f"Macro pass rate: {result.macro_pass_rate:.0%}")
-```
-
-For pipeline composition, provider configuration, and the full API, see the [Pipeline Guide](docs/user-guide/pipeline.md).
+See the [Quick Start guide](https://autochecklist.github.io/getting-started/quickstart/) for custom prompts, batch evaluation, and more.

-### Command-Line Interface
-
-Run evaluations directly from the terminal:
+### CLI

 ```bash
-# Full evaluation (generate + score)
 autochecklist run --pipeline tick --data eval_data.jsonl -o results.jsonl \
   --generator-model openai/gpt-4o-mini --scorer-model openai/gpt-4o-mini
-
-# Generate checklists only
-autochecklist generate --pipeline tick --data inputs.jsonl -o checklists.jsonl \
-  --generator-model openai/gpt-4o-mini
-
-# Score with existing checklist
-autochecklist score --data eval_data.jsonl --checklist checklist.json \
-  -o results.jsonl --scorer-model openai/gpt-4o-mini
-
-# List available pipelines
-autochecklist list
 ```

-API keys can be set via `--api-key`, environment variables (`OPENROUTER_API_KEY`), or a `.env` file. See the [CLI Guide](docs/user-guide/cli.md) for full details.
-
-### Examples
-
-Detailed examples with runnable code:
-
-- **[custom_components_tutorial.ipynb](examples/custom_components_tutorial.ipynb)** - Create your own generators, scorers, and refiners
-- **[pipeline_demo.ipynb](examples/pipeline_demo.ipynb)** - Pipeline API, registry, batch evaluation, export
-- **[instance_level_demo.ipynb](examples/instance_level_demo.ipynb)** - DirectGenerator, ContrastiveGenerator (per-input checklists)
-- **[corpus_level_demo.ipynb](examples/corpus_level_demo.ipynb)** - InductiveGenerator, DeductiveGenerator, InteractiveGenerator (per-dataset checklists)
+See the [CLI guide](https://autochecklist.github.io/user-guide/cli/) for all commands.


 ## UI

 A web interface for demonstrating `autochecklist` methods. See [ui/README.md](ui/README.md) for details.

-**Quick Start:**
 ```bash
-cd ui
-./launch_ui.sh
-# Frontend: http://localhost:7860
-# Backend: http://localhost:7861
+autochecklist ui          # or: cd ui && ./launch_ui.sh
+autochecklist ui --dev    # development mode (hot-reload)
 ```

+> The `ui` subcommand is only available from a source checkout.
+
 ## Testing

 > [!WARNING]
> [!WARNING]

README.pypi.md

Lines changed: 13 additions & 94 deletions
@@ -1,8 +1,11 @@
 # AutoChecklist

-[![GitHub Stars](https://img.shields.io/github/stars/ChicagoHAI/AutoChecklist?style=flat-square)](https://github.com/ChicagoHAI/AutoChecklist)
-[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg?style=flat-square)](https://www.python.org/downloads/)
-[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg?style=flat-square)](LICENSE)
+<p align="center">
+<a href="https://github.com/ChicagoHAI/AutoChecklist"><img src="https://img.shields.io/github/stars/ChicagoHAI/AutoChecklist?style=flat-square" alt="GitHub Stars"></a>
+<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10+-blue.svg?style=flat-square" alt="Python 3.10+"></a>
+<a href="LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-green.svg?style=flat-square" alt="License"></a>
+<a href="https://autochecklist.github.io/"><img src="https://img.shields.io/badge/site-autochecklist.github.io-purple?style=flat-square" alt="Site"></a>
+</p>

 A library of composable pipelines for generating and scoring checklist criteria.

@@ -33,7 +36,7 @@ Each generator is customizable via prompt templates (`.md` files with `{input}`,

 ### Built-in Pipelines

-The library includes built-in pipelines implementing methods from research papers ([TICK](https://arxiv.org/abs/2410.03608), [RocketEval](https://arxiv.org/abs/2503.05142), [RLCF](https://arxiv.org/abs/2507.18624), [CheckEval](https://arxiv.org/abs/2403.18771), [InteractEval](https://arxiv.org/abs/2409.07355), and more). See [Supported Pipelines](https://github.com/ChicagoHAI/AutoChecklist/blob/main/docs/user-guide/pipelines.md) for the full list and configuration details.
+The library includes built-in pipelines implementing methods from research papers ([TICK](https://arxiv.org/abs/2410.03608), [RocketEval](https://arxiv.org/abs/2503.05142), [RLCF](https://arxiv.org/abs/2507.18624), [OpenRubrics](https://arxiv.org/abs/2510.07743), [CheckEval](https://arxiv.org/abs/2403.18771), [InteractEval](https://arxiv.org/abs/2409.07355), and more). See [Supported Pipelines](https://autochecklist.github.io/user-guide/supported-pipelines/) for the full list and configuration details.

 ### Scoring

@@ -78,114 +81,30 @@ pip install "autochecklist[all]"

 For development installation from source, see the [GitHub repository](https://github.com/ChicagoHAI/AutoChecklist).

-## Using the Package
-
-### Custom Prompts
-
-Write a prompt template and generate a checklist:
-
-```python
-from autochecklist import DirectGenerator, ChecklistScorer
-
-gen = DirectGenerator(
-    custom_prompt="You are an expert evaluator. Generate yes/no checklist questions to score:\n\n{input}",
-    model="openai/gpt-5-mini",
-)
-checklist = gen.generate(input="Write a haiku about autumn.")
-
-scorer = ChecklistScorer(mode="batch", model="openai/gpt-5-mini")
-score = scorer.score(checklist, target="Leaves fall gently down...")
-print(f"Pass rate: {score.pass_rate:.0%}")
-```
-
-Scorers also take custom prompts. Prompts can also be loaded from `.md` files — see [Custom Prompts](https://github.com/ChicagoHAI/AutoChecklist/blob/main/docs/user-guide/custom-prompts.md) for the full guide (placeholders, custom scorers, registration).
-
-### Custom Pipelines
-
-Register a custom pipeline (generator + scorer + prompts) as a reusable unit:
-
-```python
-from autochecklist import register_custom_pipeline, pipeline
-
-# Register from config
-register_custom_pipeline(
-    "my_eval",
-    generator_prompt="Generate yes/no questions for:\n\n{input}",
-    scorer="weighted",
-)
-pipe = pipeline("my_eval", generator_model="openai/gpt-5-mini")
-
-# Or register from an existing pipeline instance
-register_custom_pipeline("my_eval_v2", pipe)
-
-# Save/load pipeline configs as JSON
-from autochecklist import save_pipeline_config, load_pipeline_config
-save_pipeline_config("my_eval", "my_eval.json")
-load_pipeline_config("my_eval.json")  # registers and returns the name
-```
-
-### Built-in Pipelines
-
-The library includes pipelines implementing methods from research papers. Use them via `method_name` or the `pipeline()` shorthand:
+## Quick Start

 ```python
 from autochecklist import pipeline

 pipe = pipeline("tick", generator_model="openai/gpt-5-mini", scorer_model="openai/gpt-5-mini")
-result = pipe(input="Write a haiku about autumn", target="Leaves fall gently...")
+result = pipe(input="Write a haiku about autumn.", target="Leaves fall gently down...")
 print(f"Pass rate: {result.pass_rate:.0%}")
 ```

-See [Supported Pipelines](https://github.com/ChicagoHAI/AutoChecklist/blob/main/docs/user-guide/pipelines.md) for the full list of pipelines, paper details, and configuration options.
-
-### Batch Evaluation
-
-```python
-data = [
-    {"input": "Write a haiku", "target": "Leaves fall..."},
-    {"input": "Write a limerick", "target": "There once was..."},
-]
-result = pipe.run_batch(data, show_progress=True)
-print(f"Macro pass rate: {result.macro_pass_rate:.0%}")
-```
-
-For pipeline composition, provider configuration, and the full API, see the [Pipeline Guide](https://github.com/ChicagoHAI/AutoChecklist/blob/main/docs/user-guide/pipeline.md).
+See the [Quick Start guide](https://autochecklist.github.io/getting-started/quickstart/) for custom prompts, batch evaluation, and more.

-### Command-Line Interface
-
-Run evaluations directly from the terminal:
+### CLI

 ```bash
-# Full evaluation (generate + score)
 autochecklist run --pipeline tick --data eval_data.jsonl -o results.jsonl \
   --generator-model openai/gpt-4o-mini --scorer-model openai/gpt-4o-mini
-
-# Generate checklists only
-autochecklist generate --pipeline tick --data inputs.jsonl -o checklists.jsonl \
-  --generator-model openai/gpt-4o-mini
-
-# Score with existing checklist
-autochecklist score --data eval_data.jsonl --checklist checklist.json \
-  -o results.jsonl --scorer-model openai/gpt-4o-mini
-
-# List available pipelines
-autochecklist list
 ```

-API keys can be set via `--api-key`, environment variables (`OPENROUTER_API_KEY`), or a `.env` file. See the [CLI Guide](https://github.com/ChicagoHAI/AutoChecklist/blob/main/docs/user-guide/cli.md) for full details.
-
-### Examples
-
-Detailed examples with runnable code:
-
-- **[custom_components_tutorial.ipynb](https://github.com/ChicagoHAI/AutoChecklist/blob/main/examples/custom_components_tutorial.ipynb)** - Create your own generators, scorers, and refiners
-- **[pipeline_demo.ipynb](https://github.com/ChicagoHAI/AutoChecklist/blob/main/examples/pipeline_demo.ipynb)** - Pipeline API, registry, batch evaluation, export
-- **[instance_level_demo.ipynb](https://github.com/ChicagoHAI/AutoChecklist/blob/main/examples/instance_level_demo.ipynb)** - DirectGenerator, ContrastiveGenerator (per-input checklists)
-- **[corpus_level_demo.ipynb](https://github.com/ChicagoHAI/AutoChecklist/blob/main/examples/corpus_level_demo.ipynb)** - InductiveGenerator, DeductiveGenerator, InteractiveGenerator (per-dataset checklists)
+See the [CLI guide](https://autochecklist.github.io/user-guide/cli/) for all commands.

 ## Links

-<!-- - [Full Documentation](https://autochecklist.github.io) -->
+- [Documentation](https://autochecklist.github.io)
 - [GitHub Repository](https://github.com/ChicagoHAI/AutoChecklist) — contributing, UI, dev setup
 - [Bug Tracker](https://github.com/ChicagoHAI/AutoChecklist/issues)

autochecklist/__init__.py

Lines changed: 6 additions & 1 deletion
@@ -49,7 +49,8 @@
     list_refiners_with_info,
 )

-__version__ = "0.1.0"
+from importlib.metadata import version as _pkg_version
+__version__ = _pkg_version("autochecklist")

 __all__ = [
     # Models
@@ -63,6 +64,10 @@
     "DeductiveInput",
     "FeedbackInput",
     "InteractiveInput",
+    "ChecklistResponse",
+    "WeightedChecklistResponse",
+    "CategorizedChecklistResponse",
+    "GeneratedCategorizedQuestion",
     # Config
     "configure",
     "get_config",
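
The `__version__` change above replaces a hard-coded string with a lookup in the installed distribution's metadata, so `pyproject.toml` becomes the single source for the version. One caveat: `importlib.metadata.version` raises `PackageNotFoundError` when the distribution is not installed (for example, when the package is imported straight from a source tree that was never `pip install`ed). The sketch below shows one way to guard against that. The helper name `resolve_version` and the fallback string are illustrative assumptions, not part of this commit.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str = "0.0.0+unknown") -> str:
    """Return the installed version of `dist_name`, or `fallback` if absent.

    `resolve_version` and the fallback value are hypothetical; the commit
    itself calls `version("autochecklist")` without a guard.
    """
    try:
        # Reads the version recorded in the distribution's metadata.
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return fallback


if __name__ == "__main__":
    print(resolve_version("autochecklist"))
```

Whether a fallback is preferable to failing fast is a design choice; an unguarded call makes a missing installation obvious at import time.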

autochecklist/cli.py

Lines changed: 17 additions & 4 deletions
@@ -185,12 +185,27 @@ def cmd_list(args: argparse.Namespace) -> None:
         print(f"{r['name']:<20} {r.get('description', '')}")


+def _find_repo_root() -> Path | None:
+    """Find the repo root by walking up from the package dir."""
+    _dir = Path(__file__).resolve().parent
+    for _ in range(5):
+        _dir = _dir.parent
+        if (_dir / "ui" / "launch_ui.sh").exists() and (_dir / "pyproject.toml").exists():
+            return _dir
+    return None
+
+
 def cmd_ui(args: argparse.Namespace) -> None:
     """Launch the AutoChecklist UI."""
     import os
     import subprocess

-    repo_root = Path(__file__).resolve().parent.parent
+    repo_root = _find_repo_root()
+    if repo_root is None:
+        print("Error: could not find AutoChecklist source tree. "
+              "The 'ui' command is only available from a source checkout.", file=sys.stderr)
+        sys.exit(1)
+
     cmd = [str(repo_root / "ui" / "launch_ui.sh")]
     if args.dev:
         cmd.append("--dev")
@@ -281,9 +296,7 @@ def main(argv: list[str] | None = None) -> None:
     list_parser.set_defaults(func=cmd_list)

     # --- ui (only available in source checkout) ---
-    pkg_dir = Path(__file__).resolve().parent
-    ui_script = pkg_dir.parent / "ui" / "launch_ui.sh"
-    if ui_script.exists():
+    if _find_repo_root() is not None:
         ui_parser = subparsers.add_parser("ui", help="Launch the AutoChecklist UI")
         ui_parser.add_argument("--dev", action="store_true", help="Run in development mode (hot-reload)")
         ui_parser.set_defaults(func=cmd_ui)
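
The `_find_repo_root` helper above walks up from the package directory, at most five levels, and accepts a directory only when both marker files are present. A standalone sketch of the same bounded upward search follows, with the start directory and markers passed in so it can be exercised outside the package. The name `find_root` and its parameters are assumptions made for illustration and do not appear in the commit.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_root(start: Path, markers: Sequence[str], max_levels: int = 5) -> Optional[Path]:
    """Walk up from `start` and return the first ancestor containing all markers.

    Mirrors the commit's approach: `start` itself is not checked, only its
    ancestors, and the search gives up after `max_levels` steps.
    `find_root` is a hypothetical generalization, not the library's API.
    """
    current = start.resolve()
    for _ in range(max_levels):
        current = current.parent
        # Require every marker so an unrelated directory that happens to
        # contain just one of them is not mistaken for the root.
        if all((current / marker).exists() for marker in markers):
            return current
    return None


if __name__ == "__main__":
    here = Path(__file__).resolve().parent
    print(find_root(here, ["ui/launch_ui.sh", "pyproject.toml"]))
```

Requiring both `ui/launch_ui.sh` and `pyproject.toml` is presumably what lets the installed package (which has no source tree around it) return `None`, so the `ui` subcommand is hidden rather than failing at launch.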

docs/frames.gif

887 KB

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 [![GitHub Stars](https://img.shields.io/github/stars/ChicagoHAI/AutoChecklist?style=flat-square)](https://github.com/ChicagoHAI/AutoChecklist)
 [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg?style=flat-square)](https://www.python.org/downloads/)
 [![License](https://img.shields.io/badge/License-Apache%202.0-green.svg?style=flat-square)](LICENSE)
+[![Site](https://img.shields.io/badge/site-autochecklist.github.io-purple?style=flat-square)](https://autochecklist.github.io/)

 `AutoChecklist` is an open-source library that unifies LLM-based checklist evaluation into composable pipelines, in a `pip`-installable Python package (`autochecklist`) with CLI and UI features.

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "autochecklist"
-version = "0.2.0"
+version = "0.2.1"
 description = "A library of checklist generation and scoring methods for LLM evaluation"
 authors = [{name = "ChicagoHAI"}]
 readme = "README.pypi.md"
@@ -75,6 +75,7 @@ dev = [
     "nbconvert>=7.17.0",
     "pytest>=9.0.2",
     "pytest-asyncio>=1.3.0",
+    "ruff>=0.14.14",
 ]

 [tool.pytest.ini_options]
