stage_advantage/README.md (4 additions, 0 deletions)
@@ -22,6 +22,8 @@ This module implements a pipeline for training an **Advantage Estimator** and us
**End-to-end order for AWBC:** (1) Stage 0 on data with `progress` → optional for Stage 1. (2) Stage 1 → train estimator. (3) Stage 2 → run eval on your dataset so it gets `data_PI06_100000/` or `data_KAI0_100000/` with advantage columns. (4) Run Stage 0 again with `--advantage-source absolute_advantage` on that dataset (e.g. via `gt_labeling.sh` with `DATA_PATH` = the repo you ran eval on, and source subdirs `data_PI06_100000` / `data_KAI0_100000`). (5) Point AWBC config `repo_id` at the resulting advantage-labeled directory and run Stage 3 training.
+**Pre-annotated data:** The downloaded dataset includes **`data/Task_A/advantage`**, a fully annotated advantage dataset that can be used **directly for AWBC training** (Stage 3) without running Stage 0–2. Set the AWBC config `repo_id` to that path and run training.
+
---
## Stage 0: GT Data Labeling
@@ -287,6 +289,8 @@ So during AWBC training the model is conditioned on prompts that explicitly incl
At **inference** time you must use the **same prompt format** as in training. To run the policy in the high-advantage regime, pass the **positive**-advantage prompt, e.g. `"<task>, Advantage: positive"` (with the same `<task>` wording as in your `tasks.jsonl`). Using a different format or omitting the advantage part can hurt performance, since the model was trained to condition on this exact style of prompt.
+**Where to set the prompt when deploying:** The language prompt is set in the **inference code** (e.g. the `lang_embeddings` variable in the Agilex inference scripts). See the [train_deploy_alignment/inference README](../train_deploy_alignment/inference/README.md) and [Agilex README — Prompt and AWBC](../train_deploy_alignment/inference/agilex/README.md#prompt-and-awbc-important) for how to configure it so it matches your training and, for AWBC, uses the positive-advantage format above.
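As a minimal sketch of the prompt convention described above, the helper below builds the advantage-conditioned prompt string. The name `build_awbc_prompt` and the example task wording are hypothetical and not part of the repository; only the `"<task>, Advantage: positive"` format is taken from the README. How the resulting string is fed into `lang_embeddings` depends on the inference script and is not shown here.

```python
def build_awbc_prompt(task: str, advantage_label: str = "positive") -> str:
    """Return '<task>, Advantage: <label>', the format quoted in the README."""
    task = task.strip()
    if not task:
        raise ValueError("task must be a non-empty string")
    return f"{task}, Advantage: {advantage_label}"


# Hypothetical task wording; use the exact string from your tasks.jsonl.
prompt = build_awbc_prompt("fold the cloth")
print(prompt)  # fold the cloth, Advantage: positive
```

Keeping the format in one helper makes it harder for training and deployment prompts to drift apart.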
+
### How it works (data flow)
1. **Data**: The advantage dataset must contain `task_index` in each parquet and `meta/tasks.jsonl` mapping `task_index` → prompt string. This is produced by running Stage 2 (eval) to get advantage columns, then Stage 0 (`gt_label.py --advantage-source absolute_advantage`) to discretize into `task_index` and write `tasks.jsonl`.
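The `task_index` → prompt mapping from the data step can be inspected with a short script. This is a sketch only: it assumes each line of `meta/tasks.jsonl` is a JSON object with `task_index` and `task` keys (the common LeRobot layout, not confirmed by this README), and `load_task_prompts` is a hypothetical helper, not a function from the repository.

```python
import json
from pathlib import Path


def load_task_prompts(dataset_root: str) -> dict[int, str]:
    """Read meta/tasks.jsonl and return {task_index: prompt string}."""
    tasks_path = Path(dataset_root) / "meta" / "tasks.jsonl"
    prompts: dict[int, str] = {}
    with tasks_path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Assumed keys: "task_index" and "task".
            prompts[int(record["task_index"])] = record["task"]
    return prompts
```

For example, `load_task_prompts("data/Task_A/advantage")` would list the prompt strings the model is conditioned on during AWBC training, which is a quick way to confirm the wording to reuse at inference.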
stage_advantage/awbc/README.md (5 additions, 1 deletion)
@@ -17,7 +17,11 @@ Each uses `base_config=DataConfig(prompt_from_task=True)` so that the dataset’
## Prerequisites
1. **Advantage dataset**
-   The data must have `task_index` in each parquet and `meta/tasks.jsonl` (prompt strings per `task_index`). To build it:
+   The data must have `task_index` in each parquet and `meta/tasks.jsonl` (prompt strings per `task_index`).
+
+   **Pre-annotated data:** The downloaded dataset includes **`data/Task_A/advantage`**, a fully annotated advantage dataset that can be used **directly for AWBC training** (no need to run Stage 0–2 first). Set the AWBC config `repo_id` to that path and run the training commands below.
+
+   To build your own advantage dataset instead:
- Run **Stage 2** (eval) on your dataset → get `data_PI06_100000/` or `data_KAI0_100000/` with advantage columns.
- Run **Stage 0** on that output: `gt_label.py --advantage-source absolute_advantage` (or `gt_labeling.sh` with `DATA_PATH` = the eval repo). The resulting directory (with `data/`, `meta/tasks.jsonl`, `videos/`) is your advantage dataset.
- Place or link it at e.g. `./data/FlattenFold/advantage` and set `repo_id` in config to that path.
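Before pointing `repo_id` at a directory, it can help to confirm that it has the layout listed above (`data/`, `meta/tasks.jsonl`, `videos/`). The check below is a minimal sketch; `missing_dataset_entries` is a hypothetical helper, not part of the repository, and it only tests that the paths exist, not that the parquet files contain `task_index`.

```python
from pathlib import Path

# Entries the README lists for an advantage dataset directory.
EXPECTED_ENTRIES = ["data", "meta/tasks.jsonl", "videos"]


def missing_dataset_entries(dataset_root: str) -> list[str]:
    """Return the expected entries that are absent under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]
```

An empty list from `missing_dataset_entries("./data/FlattenFold/advantage")` means the top-level layout matches; any returned names indicate which stage output is still missing.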
This directory contains three modules used to align training data and deployment/inference:
+
+| Module | Description |
+|--------|-------------|
+| **dagger** | DAgger-style data collection (policy-in-the-loop, intervention, save). See [dagger/README.md](dagger/README.md) for ARX and Agilex. |
+| **inference** | Deployment and inference code, including ARX, Agilex. |
+| **data_augment** | Data augmentation and format conversion (time scaling, space mirroring, HDF5 → LeRobot). See [data_augment/README.md](data_augment/README.md). |