This repository was archived by the owner on Apr 1, 2026. It is now read-only.

Commit 7de0d20

Update README.md
1 parent 8035c25 commit 7de0d20

File tree

1 file changed (+56, -31 lines)

README.md

Lines changed: 56 additions & 31 deletions
@@ -20,6 +20,8 @@ Tiny Recursion Model (TRM) recursively improves its predicted answer y with a ti
 
 ### Requirements
 
+Installation should take a few minutes. For the smallest experiment on Sudoku-Extreme (pretrain_mlp_t_sudoku), you need 1 GPU with enough memory; with 1 L40S (48 GB RAM), it takes around 18 hours to finish. If you run into issues due to library versions, the requirements with the exact versions used are in [specific_requirements.txt](https://github.com/SamsungSAILMontreal/TinyRecursiveModels/blob/main/specific_requirements.txt).
+
 - Python 3.10 (or similar)
 - Cuda 12.6.0 (or similar)
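To set up the pinned environment, one possible sketch (the venv name `trm-env` is arbitrary, and it is assumed here that `specific_requirements.txt` sits in the repo root, where the link above points):

```shell
# Create an isolated Python environment and install the exact pinned
# versions (file name taken from the link above; run from the repo root).
python3 -m venv trm-env
. trm-env/bin/activate
pip install -r specific_requirements.txt
```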

@@ -59,36 +61,6 @@ python dataset/build_maze_dataset.py # 1000 examples, 8 augments
 
 ## Experiments
 
-### ARC-AGI-1 (assuming 4 H-100 GPUs):
-
-```bash
-run_name="pretrain_att_arc1concept_4"
-torchrun --nproc-per-node 4 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 --nnodes=1 pretrain.py \
-arch=trm \
-data_paths="[data/arc1concept-aug-1000]" \
-arch.L_layers=2 \
-arch.H_cycles=3 arch.L_cycles=4 \
-+run_name=${run_name} ema=True
-
-```
-
-*Runtime:* ~3 days
-
-### ARC-AGI-2 (assuming 4 H-100 GPUs):
-
-```bash
-run_name="pretrain_att_arc2concept_4"
-torchrun --nproc-per-node 4 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 --nnodes=1 pretrain.py \
-arch=trm \
-data_paths="[data/arc2concept-aug-1000]" \
-arch.L_layers=2 \
-arch.H_cycles=3 arch.L_cycles=4 \
-+run_name=${run_name} ema=True
-
-```
-
-*Runtime:* ~3 days
-
 ### Sudoku-Extreme (assuming 1 L40S GPU):
 
 ```bash
@@ -104,6 +76,8 @@ arch.L_layers=2 \
 arch.H_cycles=3 arch.L_cycles=6 \
 +run_name=${run_name} ema=True
 
+Expected: around 87% exact accuracy (±2%)
+
 run_name="pretrain_att_sudoku"
 python pretrain.py \
 arch=trm \
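As a side note on the expected band introduced in this hunk (87% ± 2%): a quick shell comparison can tell whether a finished run landed inside it. The observed value below is a placeholder, not a real result:

```shell
# Hypothetical check: is a run's exact accuracy inside the expected band?
observed=86          # placeholder: integer percent from your own eval logs
lower=85; upper=89   # 87% +/- 2%
if [ "$observed" -ge "$lower" ] && [ "$observed" -le "$upper" ]; then
  echo "within expected range"
else
  echo "outside expected range"
fi
```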
@@ -116,7 +90,9 @@ arch.H_cycles=3 arch.L_cycles=6 \
 +run_name=${run_name} ema=True
 ```
 
-*Runtime:* < 36 hours
+Expected: around 75% exact accuracy (±2%)
+
+*Runtime:* < 20 hours
 
 ### Maze-Hard (assuming 4 L40S GPUs):
 
@@ -135,6 +111,55 @@ arch.H_cycles=3 arch.L_cycles=4 \
 
 *Runtime:* < 24 hours
 
+You can also run Maze-Hard with 1 L40S GPU by reducing the batch size, with no noticeable loss in performance:
+
+```bash
+run_name="pretrain_att_maze30x30_1gpu"
+python pretrain.py \
+arch=trm \
+data_paths="[data/maze-30x30-hard-1k]" \
+evaluators="[]" \
+epochs=50000 eval_interval=5000 \
+lr=1e-4 puzzle_emb_lr=1e-4 weight_decay=1.0 puzzle_emb_weight_decay=1.0 global_batch_size=128 \
+arch.L_layers=2 \
+arch.H_cycles=3 arch.L_cycles=4 \
++run_name=${run_name} ema=True
+```
+
+*Runtime:* < 24 hours
+
+
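The 1-GPU variant above works because the batch is smaller. Assuming `global_batch_size` is split evenly across data-parallel processes (an assumption about `pretrain.py`, not verified from its source), the per-GPU load can be sketched as:

```shell
# Hypothetical illustration: even split of the global batch across ranks
global_batch_size=128          # from the 1-GPU Maze-Hard command above
for n_gpus in 1 4; do
  echo "${n_gpus} GPU(s): per-GPU batch $(( global_batch_size / n_gpus ))"
done
```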
+### ARC-AGI-1 (assuming 4 H-100 GPUs):
+
+```bash
+run_name="pretrain_att_arc1concept_4"
+torchrun --nproc-per-node 4 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 --nnodes=1 pretrain.py \
+arch=trm \
+data_paths="[data/arc1concept-aug-1000]" \
+arch.L_layers=2 \
+arch.H_cycles=3 arch.L_cycles=4 \
++run_name=${run_name} ema=True
+
+```
+
+*Runtime:* ~3 days
+
+### ARC-AGI-2 (assuming 4 H-100 GPUs):
+
+```bash
+run_name="pretrain_att_arc2concept_4"
+torchrun --nproc-per-node 4 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 --nnodes=1 pretrain.py \
+arch=trm \
+data_paths="[data/arc2concept-aug-1000]" \
+arch.L_layers=2 \
+arch.H_cycles=3 arch.L_cycles=4 \
++run_name=${run_name} ema=True
+
+```
+
+*Runtime:* ~3 days
+
+
 ## Reference
 
 If you find our work useful, please consider citing:
