Commit 72d9fd9

Merge pull request #4183 from AI-Hypercomputer:xfgu-rtd
PiperOrigin-RevId: 933334876
2 parents d15c1b0 + c353eff commit 72d9fd9

1 file changed

Lines changed: 18 additions & 0 deletions

docs/tutorials/posttraining/rl_qwen3_30b.md

@@ -143,6 +143,24 @@ kubectl logs -f <POD_NAME>
Alternatively, after running the bash script, you will also get a link to the Google Cloud Console to view your workload logs. Follow the link to view logs and monitor your workload's progress in the Cloud Console.

### Monitor RL Metrics

During RL training, you can monitor key metrics to track model convergence, reward trends, and hardware performance.

To enable Tunix-managed metrics collection, set `enable_tunix_perf_metrics` to `true` in `src/maxtext/configs/post_train/rl.yml`. Note that the [scripts/run_qwen3_30b_rl.sh](../../../scripts/run_qwen3_30b_rl.sh) script for this tutorial workload already sets this flag to `True` by default. When enabled, Tunix automatically collects these metrics and uploads them to TensorBoard.

For a complete list of collected metrics, see the [Tunix Metrics Documentation](https://tunix.readthedocs.io/en/latest/metrics.html). Key metrics to monitor include:

- **Model Quality & Reward Metrics:**
  - `rewards/mean`: The average reward across the batch (crucial for tracking learning progress).
  - `score/mean`: The average raw score from the reward model before applying the KL penalty.
- **Rollout & Generation Metrics:**
  - `rollout_time`: How long each rollout step takes.
  - `completions/mean_length`: The average token length of generated completions.
  - `actor_dequeue_time`: The time spent waiting for data from the rollout workers (relevant when async rollout is enabled).
- **Performance & Efficiency Metrics:**
  - `step_time_sec`: The execution time for a single training step.
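
Raw `rewards/mean` values are usually noisy from step to step, so a smoothed view can make the trend easier to judge. The following is a minimal sketch, not part of MaxText or Tunix: it assumes you have already exported the scalar from TensorBoard as `(step, value)` pairs, and the helper names `ema_smooth` and `reward_trend` as well as the sample values are hypothetical and for illustration only.

```python
from typing import Iterable, List, Tuple


def ema_smooth(values: Iterable[float], alpha: float = 0.3) -> List[float]:
    """Exponential moving average; a higher alpha tracks the raw series more closely."""
    smoothed: List[float] = []
    for value in values:
        if not smoothed:
            # The first point has no history, so it is used as-is.
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def reward_trend(points: List[Tuple[int, float]], alpha: float = 0.3) -> float:
    """Return the change in smoothed reward between the first and last logged step."""
    ordered = sorted(points)  # order by step in case the export is unsorted
    smoothed = ema_smooth([value for _, value in ordered], alpha)
    if len(smoothed) < 2:
        return 0.0  # not enough data to describe a trend
    return smoothed[-1] - smoothed[0]


# Made-up (step, rewards/mean) pairs, NOT real results from this workload.
example_rewards = [(0, 0.10), (10, 0.14), (20, 0.12), (30, 0.21), (40, 0.25)]
print(f"smoothed reward change: {reward_trend(example_rewards):+.3f}")
```

A positive value suggests the reward is trending upward over the logged window. TensorBoard's built-in smoothing slider gives a similar view interactively.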
## Convert Checkpoint to Hugging Face Format

After training, you may want to convert your MaxText checkpoint back to Hugging Face format. Use the following script to perform the conversion:
