Commit 3e5b6a3

doc: fix typos in readme
1 parent d7702d5 commit 3e5b6a3

File tree: 1 file changed (+14 −11 lines)


README.md

Lines changed: 14 additions & 11 deletions
@@ -8,6 +8,7 @@ Evaluating Program Semantics Reasoning with Type Inference in System _F_
 ![evaluation workflow](./imgs/tfb.png)
 
 If you find this work useful, please cite us as:
+
 ```bibtex
 @inproceedings{he2025tfbench,
   author = {He, Yifeng and Yang, Luning and Gonzalo, Christopher and Chen, Hao},
@@ -22,7 +23,7 @@ If you find this work useful, please cite us as:
 
 ### Python
 
-We use Python 3.11.
+We use Python 3.12.
 We recommend using [uv](https://docs.astral.sh/uv/getting-started/installation/) to manage your Python dependencies.
 
 ```sh
@@ -71,7 +72,7 @@ For details, please check out the README of [alpharewrite](https://github.com/Se
 
 ## Download pre-built benchmark
 
-You can also use TF-Bench on HuggingFace datasets.
+You can also use TF-Bench via HuggingFace datasets.
 
 ```python
 from datasets import load_dataset
@@ -96,10 +97,9 @@ cd TF-Bench
 uv sync
 ```
 
-Please have your API key ready in `.env`.
-
 ### Proprietary models
 
+Please have your API key ready in `.env`.
 We use each provider's official SDK to access their models.
 You can check our pre-supported models in `tfbench.lm` module.

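The `.env` step in the hunk above can be sketched as a minimal loader. This is illustrative only: `parse_env` is a hypothetical helper assuming plain `KEY=VALUE` lines, and real projects (including, presumably, TF-Bench) would typically use python-dotenv's `load_dotenv()` instead.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse `.env`-style text (KEY=VALUE per line) into a dict.

    A hand-rolled sketch, not TF-Bench's actual loading code: blank
    lines and `#` comments are skipped, and quoting/export syntax is
    not handled.
    """
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


# Example: read an API key out of a `.env`-shaped string.
keys = parse_env("# provider keys\nOPENAI_API_KEY=sk-example\n")
```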
@@ -111,7 +111,7 @@ print(supported_models)
 To run single model, which runs both `base` and `pure` splits:
 
 ```sh
-uv run main.py -m gpt-5-2025-08-07
+uv run src/main.py -m gpt-5-2025-08-07
 ```
 
 ### Open-weights models with Ollama
@@ -153,7 +153,7 @@ uv run src/main.py Qwen/Qwen3-4B-Instruct-2507 # or other models
 Note that our `main.py` uses a pre-defined model router,
 which routes all un-recognized model names to HuggingFace.
 We use the `</think>` token to parse thinking process,
-if the model do it differently, please see the next section.
+if the model does it differently, please see [Supporting other customized models].
 
 ### Running your own model

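The `</think>`-based parsing mentioned in the hunk above might look like the following. `split_thinking` is a hypothetical helper sketching the described behavior, not TF-Bench's actual parser:

```python
def split_thinking(output: str, end_token: str = "</think>") -> tuple[str, str]:
    """Split raw model output into (reasoning, answer) at the
    end-of-thinking token: everything before the token is treated as
    reasoning, everything after it as the answer. A sketch of the
    behavior the README describes, not TF-Bench's code.
    """
    if end_token in output:
        reasoning, _, answer = output.partition(end_token)
        return reasoning.strip(), answer.strip()
    # No thinking marker: the whole output is the answer.
    return "", output.strip()


reasoning, answer = split_thinking("<think>apply identity</think>a -> a")
```

A model that marks its reasoning differently would need a different `end_token`, which is why unrecognized formats are deferred to a custom `LM` implementation.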
@@ -190,14 +190,14 @@ from tfbench.lm import OpenAIResponse
 from tfbench import run_one_model
 
 model = "gpt-4.1"
-split = "pure"
-client = OpenAIResponses(model_name=model, pure=split == "pure", effort=None)
-eval_result = run_one_model(client, pure=split == "pure", effort=None)
+pure = True
+client = OpenAIResponses(model_name=model, pure=pure, effort=None)
+eval_result = run_one_model(client, pure=pure)
 ```
 
-### Support other customized models
+### Supporting other customized models
 
-You may implement an `LM` instance.
+Implementing an `LM` instance is all you need.
 
 ```python
 from tfbench.lm._types import LM, LMAnswer
@@ -211,4 +211,7 @@ class YourLM(LM):
     def _gen(self, prompt: str) -> LMAnswer:
         """your generation logic here"""
         return LMAnswer(answer=content, reasoning_steps=thinking_content)
+
+client = YourLM("xxx")
+eval_result = run_one_model(client)
 ```
