
[Performance] Add _skip_maybe_reset flag to bypass auto-reset in step_and_maybe_reset #3560

Closed
vmoens wants to merge 2 commits into gh/vmoens/241/base from gh/vmoens/241/head


@vmoens
Collaborator

@vmoens vmoens commented Mar 23, 2026

[ghstack-poisoned]
@pytorch-bot

pytorch-bot Bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3560

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

⏳ No Failures, 14 Pending

As of commit db0dc99 with merge base a4301ee:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions
Contributor

github-actions Bot commented Mar 23, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}18$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 83.0031μs 82.2934μs 12.1516 KOps/s 12.3255 KOps/s $\color{#d91a1a}-1.41\%$
test_tensor_to_bytestream_speed[torch.save] 0.1478ms 0.1462ms 6.8386 KOps/s 7.0491 KOps/s $\color{#d91a1a}-2.99\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1106s 0.1101s 9.0830 Ops/s 8.8721 Ops/s $\color{#35bf28}+2.38\%$
test_tensor_to_bytestream_speed[numpy] 2.6900μs 2.6855μs 372.3723 KOps/s 394.9506 KOps/s $\textbf{\color{#d91a1a}-5.72\%}$
test_tensor_to_bytestream_speed[safetensors] 37.4513μs 37.2260μs 26.8630 KOps/s 27.4249 KOps/s $\color{#d91a1a}-2.05\%$
test_simple 0.5460s 0.5450s 1.8349 Ops/s 1.7410 Ops/s $\textbf{\color{#35bf28}+5.39\%}$
test_transformed 1.0925s 1.0898s 0.9176 Ops/s 0.8901 Ops/s $\color{#35bf28}+3.10\%$
test_serial 1.7057s 1.6960s 0.5896 Ops/s 0.5783 Ops/s $\color{#35bf28}+1.95\%$
test_parallel 1.0283s 1.0230s 0.9775 Ops/s 0.9698 Ops/s $\color{#35bf28}+0.79\%$
test_step_mdp_speed[True-True-True-True-True] 0.3354ms 41.9960μs 23.8118 KOps/s 23.9635 KOps/s $\color{#d91a1a}-0.63\%$
test_step_mdp_speed[True-True-True-True-False] 49.3210μs 23.2128μs 43.0797 KOps/s 43.7010 KOps/s $\color{#d91a1a}-1.42\%$
test_step_mdp_speed[True-True-True-False-True] 71.0520μs 23.4910μs 42.5696 KOps/s 41.9813 KOps/s $\color{#35bf28}+1.40\%$
test_step_mdp_speed[True-True-True-False-False] 40.0210μs 12.7626μs 78.3541 KOps/s 77.3343 KOps/s $\color{#35bf28}+1.32\%$
test_step_mdp_speed[True-True-False-True-True] 73.7810μs 43.8408μs 22.8098 KOps/s 22.7904 KOps/s $\color{#35bf28}+0.09\%$
test_step_mdp_speed[True-True-False-True-False] 69.2510μs 25.6485μs 38.9886 KOps/s 39.6125 KOps/s $\color{#d91a1a}-1.58\%$
test_step_mdp_speed[True-True-False-False-True] 57.3710μs 26.0510μs 38.3863 KOps/s 38.6520 KOps/s $\color{#d91a1a}-0.69\%$
test_step_mdp_speed[True-True-False-False-False] 45.3710μs 15.5654μs 64.2452 KOps/s 65.8822 KOps/s $\color{#d91a1a}-2.48\%$
test_step_mdp_speed[True-False-True-True-True] 76.8910μs 47.4002μs 21.0969 KOps/s 21.6968 KOps/s $\color{#d91a1a}-2.76\%$
test_step_mdp_speed[True-False-True-True-False] 65.5510μs 28.9358μs 34.5592 KOps/s 35.9540 KOps/s $\color{#d91a1a}-3.88\%$
test_step_mdp_speed[True-False-True-False-True] 62.3110μs 26.5893μs 37.6092 KOps/s 38.6890 KOps/s $\color{#d91a1a}-2.79\%$
test_step_mdp_speed[True-False-True-False-False] 52.6810μs 15.7163μs 63.6283 KOps/s 64.8689 KOps/s $\color{#d91a1a}-1.91\%$
test_step_mdp_speed[True-False-False-True-True] 89.2710μs 49.8309μs 20.0679 KOps/s 20.2866 KOps/s $\color{#d91a1a}-1.08\%$
test_step_mdp_speed[True-False-False-True-False] 64.4920μs 31.1695μs 32.0827 KOps/s 32.4490 KOps/s $\color{#d91a1a}-1.13\%$
test_step_mdp_speed[True-False-False-False-True] 61.7420μs 29.0510μs 34.4222 KOps/s 35.0114 KOps/s $\color{#d91a1a}-1.68\%$
test_step_mdp_speed[True-False-False-False-False] 48.0010μs 17.9982μs 55.5613 KOps/s 56.3126 KOps/s $\color{#d91a1a}-1.33\%$
test_step_mdp_speed[False-True-True-True-True] 74.3920μs 46.8265μs 21.3554 KOps/s 21.0114 KOps/s $\color{#35bf28}+1.64\%$
test_step_mdp_speed[False-True-True-True-False] 68.1220μs 28.2646μs 35.3799 KOps/s 36.1060 KOps/s $\color{#d91a1a}-2.01\%$
test_step_mdp_speed[False-True-True-False-True] 2.3469ms 30.3672μs 32.9302 KOps/s 33.9840 KOps/s $\color{#d91a1a}-3.10\%$
test_step_mdp_speed[False-True-True-False-False] 50.4310μs 17.5561μs 56.9602 KOps/s 59.1063 KOps/s $\color{#d91a1a}-3.63\%$
test_step_mdp_speed[False-True-False-True-True] 0.1256ms 49.4498μs 20.2225 KOps/s 20.5808 KOps/s $\color{#d91a1a}-1.74\%$
test_step_mdp_speed[False-True-False-True-False] 61.5310μs 31.0492μs 32.2070 KOps/s 33.0431 KOps/s $\color{#d91a1a}-2.53\%$
test_step_mdp_speed[False-True-False-False-True] 62.6810μs 32.1827μs 31.0726 KOps/s 31.7941 KOps/s $\color{#d91a1a}-2.27\%$
test_step_mdp_speed[False-True-False-False-False] 57.7710μs 19.7330μs 50.6766 KOps/s 51.9227 KOps/s $\color{#d91a1a}-2.40\%$
test_step_mdp_speed[False-False-True-True-True] 80.3810μs 51.8173μs 19.2986 KOps/s 19.2604 KOps/s $\color{#35bf28}+0.20\%$
test_step_mdp_speed[False-False-True-True-False] 61.0410μs 33.4831μs 29.8658 KOps/s 31.2584 KOps/s $\color{#d91a1a}-4.46\%$
test_step_mdp_speed[False-False-True-False-True] 62.4210μs 32.0299μs 31.2208 KOps/s 31.4430 KOps/s $\color{#d91a1a}-0.71\%$
test_step_mdp_speed[False-False-True-False-False] 48.7710μs 19.4108μs 51.5177 KOps/s 51.5474 KOps/s $\color{#d91a1a}-0.06\%$
test_step_mdp_speed[False-False-False-True-True] 85.8910μs 53.9050μs 18.5511 KOps/s 18.7227 KOps/s $\color{#d91a1a}-0.92\%$
test_step_mdp_speed[False-False-False-True-False] 67.0510μs 36.1931μs 27.6296 KOps/s 28.1999 KOps/s $\color{#d91a1a}-2.02\%$
test_step_mdp_speed[False-False-False-False-True] 61.2020μs 33.8740μs 29.5211 KOps/s 28.8336 KOps/s $\color{#35bf28}+2.38\%$
test_step_mdp_speed[False-False-False-False-False] 53.0310μs 22.0807μs 45.2884 KOps/s 46.0184 KOps/s $\color{#d91a1a}-1.59\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8534s 0.7436s 1.3448 Ops/s 1.3396 Ops/s $\color{#35bf28}+0.39\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7156s 0.6038s 1.6561 Ops/s 1.6328 Ops/s $\color{#35bf28}+1.43\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7285s 1.6437s 0.6084 Ops/s 0.6044 Ops/s $\color{#35bf28}+0.66\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5128s 1.4287s 0.6999 Ops/s 0.7011 Ops/s $\color{#d91a1a}-0.17\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9781s 1.8978s 0.5269 Ops/s 0.5226 Ops/s $\color{#35bf28}+0.82\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7679s 1.6815s 0.5947 Ops/s 0.5915 Ops/s $\color{#35bf28}+0.55\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6833s 4.6231s 0.2163 Ops/s 0.2145 Ops/s $\color{#35bf28}+0.84\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5118s 4.4196s 0.2263 Ops/s 0.2262 Ops/s $\color{#35bf28}+0.04\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9888s 1.9011s 0.5260 Ops/s 0.5364 Ops/s $\color{#d91a1a}-1.93\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6654s 1.5898s 0.6290 Ops/s 0.6326 Ops/s $\color{#d91a1a}-0.57\%$
test_values[generalized_advantage_estimate-True-True] 10.6565ms 10.2464ms 97.5954 Ops/s 99.4797 Ops/s $\color{#d91a1a}-1.89\%$
test_values[vec_generalized_advantage_estimate-True-True] 19.2830ms 17.3880ms 57.5109 Ops/s 57.3714 Ops/s $\color{#35bf28}+0.24\%$
test_values[td0_return_estimate-False-False] 0.2013ms 0.1316ms 7.5970 KOps/s 7.6171 KOps/s $\color{#d91a1a}-0.26\%$
test_values[td1_return_estimate-False-False] 29.1241ms 28.0120ms 35.6989 Ops/s 35.8448 Ops/s $\color{#d91a1a}-0.41\%$
test_values[vec_td1_return_estimate-False-False] 17.7579ms 17.4786ms 57.2128 Ops/s 56.9585 Ops/s $\color{#35bf28}+0.45\%$
test_values[td_lambda_return_estimate-True-False] 43.4091ms 41.5788ms 24.0507 Ops/s 24.5953 Ops/s $\color{#d91a1a}-2.21\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.7168ms 17.5589ms 56.9511 Ops/s 57.2624 Ops/s $\color{#d91a1a}-0.54\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.1119ms 9.0168ms 110.9042 Ops/s 113.1399 Ops/s $\color{#d91a1a}-1.98\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.0260ms 1.5504ms 645.0011 Ops/s 644.5285 Ops/s $\color{#35bf28}+0.07\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.6028ms 0.4319ms 2.3153 KOps/s 2.3954 KOps/s $\color{#d91a1a}-3.34\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.5900ms 34.5702ms 28.9266 Ops/s 29.0534 Ops/s $\color{#d91a1a}-0.44\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 2.1904ms 1.7229ms 580.4208 Ops/s 583.8918 Ops/s $\color{#d91a1a}-0.59\%$
test_dqn_speed[False-None] 1.6161ms 1.4161ms 706.1730 Ops/s 709.5376 Ops/s $\color{#d91a1a}-0.47\%$
test_dqn_speed[False-backward] 2.0699ms 2.0195ms 495.1781 Ops/s 507.8040 Ops/s $\color{#d91a1a}-2.49\%$
test_dqn_speed[True-None] 0.8557ms 0.5910ms 1.6921 KOps/s 1.7081 KOps/s $\color{#d91a1a}-0.93\%$
test_dqn_speed[True-backward] 1.0902ms 1.0367ms 964.6133 Ops/s 944.3419 Ops/s $\color{#35bf28}+2.15\%$
test_dqn_speed[reduce-overhead-None] 0.9197ms 0.5547ms 1.8027 KOps/s 1.7333 KOps/s $\color{#35bf28}+4.00\%$
test_ddpg_speed[False-None] 3.2035ms 2.8492ms 350.9760 Ops/s 351.3395 Ops/s $\color{#d91a1a}-0.10\%$
test_ddpg_speed[False-backward] 4.1774ms 4.0553ms 246.5917 Ops/s 242.3480 Ops/s $\color{#35bf28}+1.75\%$
test_ddpg_speed[True-None] 8.3865ms 1.5468ms 646.5090 Ops/s 677.2813 Ops/s $\color{#d91a1a}-4.54\%$
test_ddpg_speed[True-backward] 2.5580ms 2.4706ms 404.7657 Ops/s 366.7891 Ops/s $\textbf{\color{#35bf28}+10.35\%}$
test_ddpg_speed[reduce-overhead-None] 2.1942ms 1.4510ms 689.1965 Ops/s 684.2492 Ops/s $\color{#35bf28}+0.72\%$
test_sac_speed[False-None] 9.7774ms 8.1842ms 122.1860 Ops/s 121.8226 Ops/s $\color{#35bf28}+0.30\%$
test_sac_speed[False-backward] 11.8295ms 11.3154ms 88.3752 Ops/s 87.3953 Ops/s $\color{#35bf28}+1.12\%$
test_sac_speed[True-None] 2.3100ms 2.1918ms 456.2420 Ops/s 450.2571 Ops/s $\color{#35bf28}+1.33\%$
test_sac_speed[True-backward] 4.3844ms 4.1145ms 243.0400 Ops/s 238.1631 Ops/s $\color{#35bf28}+2.05\%$
test_sac_speed[reduce-overhead-None] 2.3000ms 2.1770ms 459.3426 Ops/s 443.3157 Ops/s $\color{#35bf28}+3.62\%$
test_redq_speed[False-None] 16.1258ms 10.7872ms 92.7028 Ops/s 93.9334 Ops/s $\color{#d91a1a}-1.31\%$
test_redq_speed[False-backward] 23.8786ms 17.8547ms 56.0075 Ops/s 55.2967 Ops/s $\color{#35bf28}+1.29\%$
test_redq_speed[True-None] 4.7296ms 4.5217ms 221.1536 Ops/s 211.7969 Ops/s $\color{#35bf28}+4.42\%$
test_redq_speed[reduce-overhead-None] 4.7502ms 4.5008ms 222.1851 Ops/s 222.0178 Ops/s $\color{#35bf28}+0.08\%$
test_redq_deprec_speed[False-None] 11.7401ms 11.0010ms 90.9007 Ops/s 89.4470 Ops/s $\color{#35bf28}+1.63\%$
test_redq_deprec_speed[False-backward] 16.1630ms 15.7735ms 63.3975 Ops/s 62.0689 Ops/s $\color{#35bf28}+2.14\%$
test_redq_deprec_speed[True-None] 4.0490ms 3.5793ms 279.3873 Ops/s 260.4865 Ops/s $\textbf{\color{#35bf28}+7.26\%}$
test_redq_deprec_speed[True-backward] 7.5626ms 7.1323ms 140.2074 Ops/s 119.1163 Ops/s $\textbf{\color{#35bf28}+17.71\%}$
test_redq_deprec_speed[reduce-overhead-None] 3.8674ms 3.5153ms 284.4728 Ops/s 268.9070 Ops/s $\textbf{\color{#35bf28}+5.79\%}$
test_td3_speed[False-None] 8.4518ms 8.1851ms 122.1734 Ops/s 120.9637 Ops/s $\color{#35bf28}+1.00\%$
test_td3_speed[False-backward] 11.2576ms 10.9838ms 91.0434 Ops/s 89.7200 Ops/s $\color{#35bf28}+1.48\%$
test_td3_speed[True-None] 1.8570ms 1.8192ms 549.7024 Ops/s 540.1317 Ops/s $\color{#35bf28}+1.77\%$
test_td3_speed[True-backward] 3.6853ms 3.5785ms 279.4485 Ops/s 254.4475 Ops/s $\textbf{\color{#35bf28}+9.83\%}$
test_td3_speed[reduce-overhead-None] 1.8789ms 1.7866ms 559.7096 Ops/s 541.2705 Ops/s $\color{#35bf28}+3.41\%$
test_cql_speed[False-None] 29.6532ms 26.7860ms 37.3330 Ops/s 37.9403 Ops/s $\color{#d91a1a}-1.60\%$
test_cql_speed[False-backward] 38.7165ms 35.5538ms 28.1264 Ops/s 28.4453 Ops/s $\color{#d91a1a}-1.12\%$
test_cql_speed[True-None] 12.8383ms 12.1422ms 82.3572 Ops/s 77.9540 Ops/s $\textbf{\color{#35bf28}+5.65\%}$
test_cql_speed[True-backward] 17.5938ms 17.2964ms 57.8156 Ops/s 57.0957 Ops/s $\color{#35bf28}+1.26\%$
test_cql_speed[reduce-overhead-None] 12.7340ms 12.3334ms 81.0810 Ops/s 80.5148 Ops/s $\color{#35bf28}+0.70\%$
test_a2c_speed[False-None] 5.4260ms 5.2592ms 190.1432 Ops/s 182.4159 Ops/s $\color{#35bf28}+4.24\%$
test_a2c_speed[False-backward] 13.9422ms 11.9277ms 83.8384 Ops/s 84.6310 Ops/s $\color{#d91a1a}-0.94\%$
test_a2c_speed[True-None] 4.0401ms 3.7881ms 263.9822 Ops/s 259.4143 Ops/s $\color{#35bf28}+1.76\%$
test_a2c_speed[True-backward] 9.1763ms 8.7509ms 114.2737 Ops/s 102.5565 Ops/s $\textbf{\color{#35bf28}+11.43\%}$
test_a2c_speed[reduce-overhead-None] 4.1627ms 3.7979ms 263.3028 Ops/s 259.6615 Ops/s $\color{#35bf28}+1.40\%$
test_ppo_speed[False-None] 6.1430ms 5.9201ms 168.9155 Ops/s 167.1764 Ops/s $\color{#35bf28}+1.04\%$
test_ppo_speed[False-backward] 12.7589ms 12.3957ms 80.6733 Ops/s 79.6698 Ops/s $\color{#35bf28}+1.26\%$
test_ppo_speed[True-None] 4.3479ms 3.8079ms 262.6129 Ops/s 264.7291 Ops/s $\color{#d91a1a}-0.80\%$
test_ppo_speed[True-backward] 9.1122ms 8.6737ms 115.2908 Ops/s 115.8063 Ops/s $\color{#d91a1a}-0.45\%$
test_ppo_speed[reduce-overhead-None] 4.2466ms 3.7669ms 265.4714 Ops/s 265.4588 Ops/s $+0.00\%$
test_reinforce_speed[False-None] 4.7938ms 4.6093ms 216.9524 Ops/s 219.2279 Ops/s $\color{#d91a1a}-1.04\%$
test_reinforce_speed[False-backward] 7.6102ms 7.4403ms 134.4028 Ops/s 134.9365 Ops/s $\color{#d91a1a}-0.40\%$
test_reinforce_speed[True-None] 3.4744ms 3.0155ms 331.6228 Ops/s 329.9157 Ops/s $\color{#35bf28}+0.52\%$
test_reinforce_speed[True-backward] 8.0224ms 7.8814ms 126.8817 Ops/s 119.3689 Ops/s $\textbf{\color{#35bf28}+6.29\%}$
test_reinforce_speed[reduce-overhead-None] 3.4169ms 2.9714ms 336.5437 Ops/s 331.4936 Ops/s $\color{#35bf28}+1.52\%$
test_iql_speed[False-None] 25.0992ms 20.1844ms 49.5431 Ops/s 49.6090 Ops/s $\color{#d91a1a}-0.13\%$
test_iql_speed[False-backward] 35.1038ms 30.2875ms 33.0170 Ops/s 32.7932 Ops/s $\color{#35bf28}+0.68\%$
test_iql_speed[True-None] 9.3924ms 8.4893ms 117.7947 Ops/s 114.8729 Ops/s $\color{#35bf28}+2.54\%$
test_iql_speed[True-backward] 16.9542ms 16.5516ms 60.4170 Ops/s 58.7231 Ops/s $\color{#35bf28}+2.88\%$
test_iql_speed[reduce-overhead-None] 9.0235ms 8.5177ms 117.4025 Ops/s 116.6854 Ops/s $\color{#35bf28}+0.61\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2851ms 6.1558ms 162.4494 Ops/s 161.5793 Ops/s $\color{#35bf28}+0.54\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.9385ms 0.3444ms 2.9036 KOps/s 2.7077 KOps/s $\textbf{\color{#35bf28}+7.24\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5753ms 0.3184ms 3.1410 KOps/s 2.9225 KOps/s $\textbf{\color{#35bf28}+7.47\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1880ms 5.8996ms 169.5034 Ops/s 168.6989 Ops/s $\color{#35bf28}+0.48\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.1257ms 0.2873ms 3.4802 KOps/s 2.8332 KOps/s $\textbf{\color{#35bf28}+22.84\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4420ms 0.2662ms 3.7571 KOps/s 3.5067 KOps/s $\textbf{\color{#35bf28}+7.14\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6542ms 1.2809ms 780.7152 Ops/s 692.6229 Ops/s $\textbf{\color{#35bf28}+12.72\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5483ms 1.2078ms 827.9822 Ops/s 752.8565 Ops/s $\textbf{\color{#35bf28}+9.98\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.3003ms 6.0956ms 164.0524 Ops/s 164.7594 Ops/s $\color{#d91a1a}-0.43\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0770ms 0.4718ms 2.1198 KOps/s 2.3108 KOps/s $\textbf{\color{#d91a1a}-8.27\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8269ms 0.4604ms 2.1719 KOps/s 2.3703 KOps/s $\textbf{\color{#d91a1a}-8.37\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0249ms 5.8648ms 170.5091 Ops/s 167.3423 Ops/s $\color{#35bf28}+1.89\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.0263ms 0.3620ms 2.7623 KOps/s 3.2880 KOps/s $\textbf{\color{#d91a1a}-15.99\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6140ms 0.3209ms 3.1165 KOps/s 3.0994 KOps/s $\color{#35bf28}+0.55\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0904ms 5.8324ms 171.4565 Ops/s 168.7798 Ops/s $\color{#35bf28}+1.59\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9633ms 0.3565ms 2.8052 KOps/s 2.8378 KOps/s $\color{#d91a1a}-1.15\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5110ms 0.3280ms 3.0483 KOps/s 3.1855 KOps/s $\color{#d91a1a}-4.31\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1284ms 6.0018ms 166.6170 Ops/s 165.6041 Ops/s $\color{#35bf28}+0.61\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.2142ms 0.5006ms 1.9978 KOps/s 1.9929 KOps/s $\color{#35bf28}+0.24\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7752ms 0.4917ms 2.0336 KOps/s 2.0952 KOps/s $\color{#d91a1a}-2.94\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.4917ms 5.0950ms 196.2710 Ops/s 44.9245 Ops/s $\textbf{\color{#35bf28}+336.89\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.5661ms 1.9364ms 516.4285 Ops/s 511.8734 Ops/s $\color{#35bf28}+0.89\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.3390ms 1.2505ms 799.6655 Ops/s 755.3821 Ops/s $\textbf{\color{#35bf28}+5.86\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.6410s 17.9936ms 55.5754 Ops/s 194.6503 Ops/s $\textbf{\color{#d91a1a}-71.45\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 4.9657ms 1.8332ms 545.4859 Ops/s 562.6339 Ops/s $\color{#d91a1a}-3.05\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.1923ms 1.1040ms 905.7853 Ops/s 1.1206 KOps/s $\textbf{\color{#d91a1a}-19.17\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 9.3642ms 5.3559ms 186.7117 Ops/s 187.1775 Ops/s $\color{#d91a1a}-0.25\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.0170ms 1.9389ms 515.7572 Ops/s 464.0667 Ops/s $\textbf{\color{#35bf28}+11.14\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 12.4662ms 1.4917ms 670.3980 Ops/s 880.4847 Ops/s $\textbf{\color{#d91a1a}-23.86\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 42.7203ms 39.9823ms 25.0110 Ops/s 25.2850 Ops/s $\color{#d91a1a}-1.08\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.8388ms 18.8778ms 52.9724 Ops/s 54.3072 Ops/s $\color{#d91a1a}-2.46\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 45.2076ms 40.9583ms 24.4151 Ops/s 24.3534 Ops/s $\color{#35bf28}+0.25\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 21.0835ms 19.4907ms 51.3065 Ops/s 53.5378 Ops/s $\color{#d91a1a}-4.17\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 44.8262ms 42.6058ms 23.4710 Ops/s 23.2749 Ops/s $\color{#35bf28}+0.84\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 0.5768s 31.6836ms 31.5620 Ops/s 48.6429 Ops/s $\textbf{\color{#d91a1a}-35.11\%}$
test_storage_write_lazystack[50-img_shape0-small] 0.8524ms 0.2235ms 4.4734 KOps/s 4.2682 KOps/s $\color{#35bf28}+4.81\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.5795ms 1.3818ms 723.6887 Ops/s 714.3018 Ops/s $\color{#35bf28}+1.31\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.5436ms 2.3926ms 417.9604 Ops/s 420.4667 Ops/s $\color{#d91a1a}-0.60\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1065ms 2.9238ms 342.0222 Ops/s 342.7436 Ops/s $\color{#d91a1a}-0.21\%$
test_storage_write_contiguous[50-img_shape0-small] 0.6079ms 0.1407ms 7.1052 KOps/s 7.3010 KOps/s $\color{#d91a1a}-2.68\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3443ms 0.1922ms 5.2034 KOps/s 5.1638 KOps/s $\color{#35bf28}+0.77\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9544ms 1.7887ms 559.0755 Ops/s 575.4885 Ops/s $\color{#d91a1a}-2.85\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.7487ms 1.3080ms 764.5083 Ops/s 780.9584 Ops/s $\color{#d91a1a}-2.11\%$
test_collector_stack_then_write[50-img_shape0-small] 1.5812ms 1.1377ms 878.9487 Ops/s 874.0932 Ops/s $\color{#35bf28}+0.56\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.7776ms 3.5747ms 279.7437 Ops/s 275.6482 Ops/s $\color{#35bf28}+1.49\%$
test_collector_stack_then_write[100-img_shape2-large_img] 10.3692ms 5.6396ms 177.3165 Ops/s 176.6251 Ops/s $\color{#35bf28}+0.39\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.4975ms 7.3213ms 136.5884 Ops/s 142.1230 Ops/s $\color{#d91a1a}-3.89\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4306ms 0.2802ms 3.5695 KOps/s 3.4806 KOps/s $\color{#35bf28}+2.55\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6336ms 1.4941ms 669.2985 Ops/s 654.3839 Ops/s $\color{#35bf28}+2.28\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.6341ms 2.5097ms 398.4565 Ops/s 397.0481 Ops/s $\color{#35bf28}+0.35\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.4747ms 3.1412ms 318.3459 Ops/s 317.8926 Ops/s $\color{#35bf28}+0.14\%$
test_collector_without_rb[100-img_shape0-atari] 33.7115ms 32.8472ms 30.4440 Ops/s 30.1779 Ops/s $\color{#35bf28}+0.88\%$
test_collector_without_rb[200-img_shape1-large_batch] 65.0133ms 64.7006ms 15.4558 Ops/s 15.3073 Ops/s $\color{#35bf28}+0.97\%$
test_collector_with_rb[100-img_shape0-atari] 38.2435ms 37.7225ms 26.5094 Ops/s 26.2956 Ops/s $\color{#35bf28}+0.81\%$
test_collector_with_rb[200-img_shape1-large_batch] 74.2895ms 73.6565ms 13.5765 Ops/s 13.4424 Ops/s $\color{#35bf28}+1.00\%$
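
The Ops and Change columns in the table above appear to follow from simple arithmetic: Ops looks like the reciprocal of Mean, and Change looks like the relative difference between Ops and Ops on Repo HEAD. The sketch below reproduces a few rows under those assumptions. The function names are hypothetical, and the 5% threshold for counting a row as improved or worsened is inferred from which rows are bold in the table (18 bold gains and 8 bold losses, matching the summary line), not taken from the bot's source.

```python
# Sketch only: helper names are hypothetical and the 5% threshold is an
# inference from the bold rows in the table, not the benchmark bot's code.

def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime given in seconds."""
    return 1.0 / mean_seconds


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR's throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row using the assumed +/-5% significance threshold."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within noise"


# (Ops, Ops on Repo HEAD) copied from three rows of the CPU table.
rows = {
    "test_simple": (1.8349, 1.7410),
    "test_tensor_to_bytestream_speed[numpy]": (372.3723, 394.9506),
    "test_parallel": (0.9775, 0.9698),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, most of the sub-5% rows are probably run-to-run noise rather than an effect of the change, so the bold rows are the ones worth a second look.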

@github-actions
Contributor

github-actions Bot commented Mar 23, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}13$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 82.1249μs 79.6404μs 12.5564 KOps/s 12.5510 KOps/s $\color{#35bf28}+0.04\%$
test_tensor_to_bytestream_speed[torch.save] 0.1420ms 0.1390ms 7.1929 KOps/s 7.2414 KOps/s $\color{#d91a1a}-0.67\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1026s 0.1023s 9.7749 Ops/s 9.8832 Ops/s $\color{#d91a1a}-1.10\%$
test_tensor_to_bytestream_speed[numpy] 2.4447μs 2.4387μs 410.0493 KOps/s 411.4385 KOps/s $\color{#d91a1a}-0.34\%$
test_tensor_to_bytestream_speed[safetensors] 37.1041μs 36.1976μs 27.6261 KOps/s 27.8511 KOps/s $\color{#d91a1a}-0.81\%$
test_simple 0.8960s 0.8071s 1.2389 Ops/s 1.2546 Ops/s $\color{#d91a1a}-1.25\%$
test_transformed 1.3514s 1.3491s 0.7412 Ops/s 0.7246 Ops/s $\color{#35bf28}+2.30\%$
test_serial 2.2579s 2.2547s 0.4435 Ops/s 0.4339 Ops/s $\color{#35bf28}+2.22\%$
test_parallel 1.8949s 1.8018s 0.5550 Ops/s 0.5592 Ops/s $\color{#d91a1a}-0.75\%$
test_step_mdp_speed[True-True-True-True-True] 0.2887ms 40.9994μs 24.3906 KOps/s 25.4311 KOps/s $\color{#d91a1a}-4.09\%$
test_step_mdp_speed[True-True-True-True-False] 52.4900μs 22.3297μs 44.7835 KOps/s 44.6835 KOps/s $\color{#35bf28}+0.22\%$
test_step_mdp_speed[True-True-True-False-True] 56.9310μs 23.1007μs 43.2888 KOps/s 43.7983 KOps/s $\color{#d91a1a}-1.16\%$
test_step_mdp_speed[True-True-True-False-False] 43.5500μs 12.3895μs 80.7134 KOps/s 80.4517 KOps/s $\color{#35bf28}+0.33\%$
test_step_mdp_speed[True-True-False-True-True] 95.2310μs 42.9382μs 23.2893 KOps/s 23.3487 KOps/s $\color{#d91a1a}-0.25\%$
test_step_mdp_speed[True-True-False-True-False] 46.9200μs 24.4670μs 40.8714 KOps/s 40.4733 KOps/s $\color{#35bf28}+0.98\%$
test_step_mdp_speed[True-True-False-False-True] 54.8410μs 24.9740μs 40.0416 KOps/s 39.4676 KOps/s $\color{#35bf28}+1.45\%$
test_step_mdp_speed[True-True-False-False-False] 51.1210μs 15.0362μs 66.5060 KOps/s 66.3630 KOps/s $\color{#35bf28}+0.22\%$
test_step_mdp_speed[True-False-True-True-True] 0.1169ms 45.2808μs 22.0844 KOps/s 22.2409 KOps/s $\color{#d91a1a}-0.70\%$
test_step_mdp_speed[True-False-True-True-False] 50.0800μs 27.6261μs 36.1977 KOps/s 36.3466 KOps/s $\color{#d91a1a}-0.41\%$
test_step_mdp_speed[True-False-True-False-True] 57.0210μs 25.8395μs 38.7004 KOps/s 39.0335 KOps/s $\color{#d91a1a}-0.85\%$
test_step_mdp_speed[True-False-True-False-False] 45.4700μs 14.9445μs 66.9142 KOps/s 66.0216 KOps/s $\color{#35bf28}+1.35\%$
test_step_mdp_speed[True-False-False-True-True] 84.5220μs 47.7170μs 20.9569 KOps/s 20.8074 KOps/s $\color{#35bf28}+0.72\%$
test_step_mdp_speed[True-False-False-True-False] 67.9510μs 29.7404μs 33.6243 KOps/s 33.5771 KOps/s $\color{#35bf28}+0.14\%$
test_step_mdp_speed[True-False-False-False-True] 70.5200μs 27.8301μs 35.9323 KOps/s 35.9143 KOps/s $\color{#35bf28}+0.05\%$
test_step_mdp_speed[True-False-False-False-False] 56.3810μs 17.5774μs 56.8912 KOps/s 56.4334 KOps/s $\color{#35bf28}+0.81\%$
test_step_mdp_speed[False-True-True-True-True] 90.1810μs 46.4392μs 21.5336 KOps/s 21.8989 KOps/s $\color{#d91a1a}-1.67\%$
test_step_mdp_speed[False-True-True-True-False] 66.5910μs 27.1918μs 36.7758 KOps/s 36.5022 KOps/s $\color{#35bf28}+0.75\%$
test_step_mdp_speed[False-True-True-False-True] 2.6985ms 29.6342μs 33.7448 KOps/s 35.3020 KOps/s $\color{#d91a1a}-4.41\%$
test_step_mdp_speed[False-True-True-False-False] 59.5510μs 16.6872μs 59.9263 KOps/s 60.3925 KOps/s $\color{#d91a1a}-0.77\%$
test_step_mdp_speed[False-True-False-True-True] 84.7310μs 47.6480μs 20.9872 KOps/s 21.0610 KOps/s $\color{#d91a1a}-0.35\%$
test_step_mdp_speed[False-True-False-True-False] 58.7110μs 29.8448μs 33.5067 KOps/s 33.5428 KOps/s $\color{#d91a1a}-0.11\%$
test_step_mdp_speed[False-True-False-False-True] 56.9310μs 31.8912μs 31.3567 KOps/s 32.2791 KOps/s $\color{#d91a1a}-2.86\%$
test_step_mdp_speed[False-True-False-False-False] 51.1710μs 18.9820μs 52.6814 KOps/s 52.5777 KOps/s $\color{#35bf28}+0.20\%$
test_step_mdp_speed[False-False-True-True-True] 97.0110μs 50.0281μs 19.9888 KOps/s 19.7864 KOps/s $\color{#35bf28}+1.02\%$
test_step_mdp_speed[False-False-True-True-False] 99.4810μs 32.3648μs 30.8978 KOps/s 30.9879 KOps/s $\color{#d91a1a}-0.29\%$
test_step_mdp_speed[False-False-True-False-True] 62.6800μs 30.9468μs 32.3135 KOps/s 32.3082 KOps/s $\color{#35bf28}+0.02\%$
test_step_mdp_speed[False-False-True-False-False] 52.6000μs 19.1659μs 52.1759 KOps/s 52.9331 KOps/s $\color{#d91a1a}-1.43\%$
test_step_mdp_speed[False-False-False-True-True] 93.0110μs 52.3100μs 19.1168 KOps/s 19.0272 KOps/s $\color{#35bf28}+0.47\%$
test_step_mdp_speed[False-False-False-True-False] 61.6710μs 34.7115μs 28.8089 KOps/s 28.8541 KOps/s $\color{#d91a1a}-0.16\%$
test_step_mdp_speed[False-False-False-False-True] 98.2810μs 33.7607μs 29.6202 KOps/s 30.5035 KOps/s $\color{#d91a1a}-2.90\%$
test_step_mdp_speed[False-False-False-False-False] 57.0310μs 21.4839μs 46.5464 KOps/s 47.2159 KOps/s $\color{#d91a1a}-1.42\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7143s 0.7080s 1.4125 Ops/s 1.3702 Ops/s $\color{#35bf28}+3.09\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.6993s 0.5964s 1.6767 Ops/s 1.6571 Ops/s $\color{#35bf28}+1.18\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7096s 1.6152s 0.6191 Ops/s 0.6197 Ops/s $\color{#d91a1a}-0.09\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4859s 1.4013s 0.7136 Ops/s 0.7114 Ops/s $\color{#35bf28}+0.31\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9495s 1.8603s 0.5375 Ops/s 0.5396 Ops/s $\color{#d91a1a}-0.38\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7396s 1.6528s 0.6050 Ops/s 0.6030 Ops/s $\color{#35bf28}+0.34\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6046s 4.5202s 0.2212 Ops/s 0.2220 Ops/s $\color{#d91a1a}-0.34\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5246s 4.3748s 0.2286 Ops/s 0.2300 Ops/s $\color{#d91a1a}-0.60\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9806s 1.8746s 0.5334 Ops/s 0.5438 Ops/s $\color{#d91a1a}-1.90\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6324s 1.5500s 0.6451 Ops/s 0.6308 Ops/s $\color{#35bf28}+2.28\%$
test_values[generalized_advantage_estimate-True-True] 20.4069ms 19.4693ms 51.3628 Ops/s 52.1930 Ops/s $\color{#d91a1a}-1.59\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1341s 3.5902ms 278.5374 Ops/s 270.5801 Ops/s $\color{#35bf28}+2.94\%$
test_values[td0_return_estimate-False-False] 0.1046ms 81.1038μs 12.3299 KOps/s 12.1055 KOps/s $\color{#35bf28}+1.85\%$
test_values[td1_return_estimate-False-False] 48.5151ms 46.9693ms 21.2905 Ops/s 21.3312 Ops/s $\color{#d91a1a}-0.19\%$
test_values[vec_td1_return_estimate-False-False] 1.4032ms 1.0768ms 928.7004 Ops/s 926.1204 Ops/s $\color{#35bf28}+0.28\%$
test_values[td_lambda_return_estimate-True-False] 79.4752ms 75.9731ms 13.1626 Ops/s 13.1692 Ops/s $\color{#d91a1a}-0.05\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3125ms 1.0697ms 934.7992 Ops/s 936.5424 Ops/s $\color{#d91a1a}-0.19\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 20.5029ms 19.6391ms 50.9188 Ops/s 50.7323 Ops/s $\color{#35bf28}+0.37\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.9982ms 0.7374ms 1.3561 KOps/s 1.3529 KOps/s $\color{#35bf28}+0.24\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7153ms 0.6620ms 1.5105 KOps/s 1.5127 KOps/s $\color{#d91a1a}-0.14\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5644ms 1.4741ms 678.3610 Ops/s 681.3210 Ops/s $\color{#d91a1a}-0.43\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7419ms 0.6775ms 1.4761 KOps/s 1.4905 KOps/s $\color{#d91a1a}-0.96\%$
test_dqn_speed[False-None] 1.6943ms 1.5564ms 642.5054 Ops/s 642.3170 Ops/s $\color{#35bf28}+0.03\%$
test_dqn_speed[False-backward] 2.2867ms 2.1989ms 454.7667 Ops/s 457.4076 Ops/s $\color{#d91a1a}-0.58\%$
test_dqn_speed[True-None] 0.6675ms 0.5909ms 1.6922 KOps/s 1.6232 KOps/s $\color{#35bf28}+4.25\%$
test_dqn_speed[True-backward] 1.1867ms 1.1335ms 882.2596 Ops/s 780.7849 Ops/s $\textbf{\color{#35bf28}+13.00\%}$
test_dqn_speed[reduce-overhead-None] 0.7056ms 0.6228ms 1.6057 KOps/s 1.6092 KOps/s $\color{#d91a1a}-0.22\%$
test_ddpg_speed[False-None] 3.3488ms 2.9671ms 337.0350 Ops/s 340.7469 Ops/s $\color{#d91a1a}-1.09\%$
test_ddpg_speed[False-backward] 4.5974ms 4.2053ms 237.7958 Ops/s 231.9708 Ops/s $\color{#35bf28}+2.51\%$
test_ddpg_speed[True-None] 1.4991ms 1.3632ms 733.5653 Ops/s 728.5087 Ops/s $\color{#35bf28}+0.69\%$
test_ddpg_speed[True-backward] 2.4335ms 2.3829ms 419.6515 Ops/s 388.2468 Ops/s $\textbf{\color{#35bf28}+8.09\%}$
test_ddpg_speed[reduce-overhead-None] 1.5249ms 1.3540ms 738.5645 Ops/s 697.3843 Ops/s $\textbf{\color{#35bf28}+5.90\%}$
test_sac_speed[False-None] 8.9975ms 8.3323ms 120.0149 Ops/s 118.1177 Ops/s $\color{#35bf28}+1.61\%$
test_sac_speed[False-backward] 11.5740ms 11.2474ms 88.9092 Ops/s 86.6503 Ops/s $\color{#35bf28}+2.61\%$
test_sac_speed[True-None] 2.4082ms 1.9115ms 523.1494 Ops/s 521.2789 Ops/s $\color{#35bf28}+0.36\%$
test_sac_speed[True-backward] 3.5948ms 3.5278ms 283.4605 Ops/s 264.8247 Ops/s $\textbf{\color{#35bf28}+7.04\%}$
test_sac_speed[reduce-overhead-None] 17.1291ms 10.0688ms 99.3172 Ops/s 100.4175 Ops/s $\color{#d91a1a}-1.10\%$
test_redq_deprec_speed[False-None] 10.3279ms 9.3594ms 106.8449 Ops/s 107.3034 Ops/s $\color{#d91a1a}-0.43\%$
test_redq_deprec_speed[False-backward] 12.8856ms 12.4404ms 80.3831 Ops/s 79.3917 Ops/s $\color{#35bf28}+1.25\%$
test_redq_deprec_speed[True-None] 3.2999ms 2.6879ms 372.0424 Ops/s 357.0060 Ops/s $\color{#35bf28}+4.21\%$
test_redq_deprec_speed[True-backward] 4.6305ms 4.1954ms 238.3548 Ops/s 225.8239 Ops/s $\textbf{\color{#35bf28}+5.55\%}$
test_redq_deprec_speed[reduce-overhead-None] 14.2275ms 9.4801ms 105.4847 Ops/s 103.7991 Ops/s $\color{#35bf28}+1.62\%$
test_td3_speed[False-None] 8.4676ms 8.2445ms 121.2930 Ops/s 121.7305 Ops/s $\color{#d91a1a}-0.36\%$
test_td3_speed[False-backward] 11.0832ms 10.5786ms 94.5307 Ops/s 92.8426 Ops/s $\color{#35bf28}+1.82\%$
test_td3_speed[True-None] 1.7170ms 1.6844ms 593.6771 Ops/s 587.5772 Ops/s $\color{#35bf28}+1.04\%$
test_td3_speed[True-backward] 3.1435ms 3.0537ms 327.4714 Ops/s 302.5590 Ops/s $\textbf{\color{#35bf28}+8.23\%}$
test_td3_speed[reduce-overhead-None] 48.7199ms 25.1290ms 39.7947 Ops/s 38.5228 Ops/s $\color{#35bf28}+3.30\%$
test_cql_speed[False-None] 17.5966ms 17.2967ms 57.8144 Ops/s 57.5407 Ops/s $\color{#35bf28}+0.48\%$
test_cql_speed[False-backward] 23.0059ms 22.4946ms 44.4552 Ops/s 43.8223 Ops/s $\color{#35bf28}+1.44\%$
test_cql_speed[True-None] 3.5657ms 3.4003ms 294.0959 Ops/s 286.0417 Ops/s $\color{#35bf28}+2.82\%$
test_cql_speed[True-backward] 5.6506ms 5.5410ms 180.4716 Ops/s 178.1185 Ops/s $\color{#35bf28}+1.32\%$
test_cql_speed[reduce-overhead-None] 18.3221ms 11.9471ms 83.7020 Ops/s 82.8251 Ops/s $\color{#35bf28}+1.06\%$
test_a2c_speed[False-None] 3.7093ms 3.2655ms 306.2287 Ops/s 306.8623 Ops/s $\color{#d91a1a}-0.21\%$
test_a2c_speed[False-backward] 6.6226ms 6.1376ms 162.9289 Ops/s 164.5039 Ops/s $\color{#d91a1a}-0.96\%$
test_a2c_speed[True-None] 1.5463ms 1.4632ms 683.4365 Ops/s 682.2850 Ops/s $\color{#35bf28}+0.17\%$
test_a2c_speed[True-backward] 3.1902ms 3.1266ms 319.8362 Ops/s 302.9035 Ops/s $\textbf{\color{#35bf28}+5.59\%}$
test_a2c_speed[reduce-overhead-None] 1.2316ms 1.0776ms 927.9786 Ops/s 916.7603 Ops/s $\color{#35bf28}+1.22\%$
test_ppo_speed[False-None] 3.9746ms 3.8938ms 256.8189 Ops/s 252.8948 Ops/s $\color{#35bf28}+1.55\%$
test_ppo_speed[False-backward] 7.4749ms 6.9794ms 143.2795 Ops/s 138.1828 Ops/s $\color{#35bf28}+3.69\%$
test_ppo_speed[True-None] 1.7005ms 1.5836ms 631.4625 Ops/s 623.7318 Ops/s $\color{#35bf28}+1.24\%$
test_ppo_speed[True-backward] 3.2939ms 3.2473ms 307.9441 Ops/s 285.2499 Ops/s $\textbf{\color{#35bf28}+7.96\%}$
test_ppo_speed[reduce-overhead-None] 1.2260ms 1.1370ms 879.5421 Ops/s 857.4530 Ops/s $\color{#35bf28}+2.58\%$
test_reinforce_speed[False-None] 3.2239ms 2.3771ms 420.6820 Ops/s 429.8048 Ops/s $\color{#d91a1a}-2.12\%$
test_reinforce_speed[False-backward] 3.5572ms 3.3732ms 296.4580 Ops/s 289.4510 Ops/s $\color{#35bf28}+2.42\%$
test_reinforce_speed[True-None] 1.6311ms 1.4261ms 701.2365 Ops/s 703.6131 Ops/s $\color{#d91a1a}-0.34\%$
test_reinforce_speed[True-backward] 3.1637ms 3.1075ms 321.8006 Ops/s 301.8917 Ops/s $\textbf{\color{#35bf28}+6.59\%}$
test_reinforce_speed[reduce-overhead-None] 0.6842s 10.4747ms 95.4678 Ops/s 113.2954 Ops/s $\textbf{\color{#d91a1a}-15.74\%}$
test_iql_speed[False-None] 10.0669ms 9.5489ms 104.7242 Ops/s 103.2378 Ops/s $\color{#35bf28}+1.44\%$
test_iql_speed[False-backward] 13.7118ms 13.2463ms 75.4926 Ops/s 74.3089 Ops/s $\color{#35bf28}+1.59\%$
test_iql_speed[True-None] 2.3840ms 2.2761ms 439.3496 Ops/s 430.1899 Ops/s $\color{#35bf28}+2.13\%$
test_iql_speed[True-backward] 4.8900ms 4.8001ms 208.3273 Ops/s 196.9331 Ops/s $\textbf{\color{#35bf28}+5.79\%}$
test_iql_speed[reduce-overhead-None] 16.7716ms 10.1661ms 98.3660 Ops/s 101.1574 Ops/s $\color{#d91a1a}-2.76\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3478ms 5.8926ms 169.7034 Ops/s 173.0552 Ops/s $\color{#d91a1a}-1.94\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9582ms 0.3478ms 2.8751 KOps/s 2.9041 KOps/s $\color{#d91a1a}-1.00\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6204ms 0.4049ms 2.4697 KOps/s 3.0501 KOps/s $\textbf{\color{#d91a1a}-19.03\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9803ms 5.6930ms 175.6538 Ops/s 178.8836 Ops/s $\color{#d91a1a}-1.81\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.3178ms 0.3709ms 2.6961 KOps/s 3.3019 KOps/s $\textbf{\color{#d91a1a}-18.35\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5798ms 0.3564ms 2.8060 KOps/s 3.4096 KOps/s $\textbf{\color{#d91a1a}-17.70\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7354ms 1.4091ms 709.6632 Ops/s 766.7663 Ops/s $\textbf{\color{#d91a1a}-7.45\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6368ms 1.3468ms 742.5216 Ops/s 829.0745 Ops/s $\textbf{\color{#d91a1a}-10.44\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 10.1218ms 6.0294ms 165.8529 Ops/s 175.5531 Ops/s $\textbf{\color{#d91a1a}-5.53\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1928ms 0.4834ms 2.0686 KOps/s 2.0491 KOps/s $\color{#35bf28}+0.95\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7124ms 0.4441ms 2.2517 KOps/s 2.3720 KOps/s $\textbf{\color{#d91a1a}-5.07\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.8274ms 5.6901ms 175.7445 Ops/s 181.5940 Ops/s $\color{#d91a1a}-3.22\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6999ms 0.3871ms 2.5833 KOps/s 2.7348 KOps/s $\textbf{\color{#d91a1a}-5.54\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6253ms 0.3653ms 2.7375 KOps/s 2.8707 KOps/s $\color{#d91a1a}-4.64\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.8297ms 5.6133ms 178.1486 Ops/s 178.3309 Ops/s $\color{#d91a1a}-0.10\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.8228ms 0.3901ms 2.5636 KOps/s 2.7502 KOps/s $\textbf{\color{#d91a1a}-6.79\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6004ms 0.3722ms 2.6867 KOps/s 2.8750 KOps/s $\textbf{\color{#d91a1a}-6.55\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2418ms 5.7389ms 174.2486 Ops/s 170.7448 Ops/s $\color{#35bf28}+2.05\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9519s 1.9006ms 526.1577 Ops/s 1.9583 KOps/s $\textbf{\color{#d91a1a}-73.13\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7012ms 0.4712ms 2.1222 KOps/s 2.1265 KOps/s $\color{#d91a1a}-0.20\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.5863ms 5.0606ms 197.6035 Ops/s 196.8185 Ops/s $\color{#35bf28}+0.40\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.8454ms 1.7918ms 558.1117 Ops/s 447.2875 Ops/s $\textbf{\color{#35bf28}+24.78\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 3.6008ms 1.0136ms 986.5922 Ops/s 1.0074 KOps/s $\color{#d91a1a}-2.07\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.4806ms 4.9791ms 200.8387 Ops/s 195.5451 Ops/s $\color{#35bf28}+2.71\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 12.9221ms 2.0617ms 485.0315 Ops/s 462.4501 Ops/s $\color{#35bf28}+4.88\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 3.5353ms 1.1966ms 835.7167 Ops/s 1.0263 KOps/s $\textbf{\color{#d91a1a}-18.57\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.6893s 18.9502ms 52.7699 Ops/s 43.2767 Ops/s $\textbf{\color{#35bf28}+21.94\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 7.9971ms 2.1070ms 474.6122 Ops/s 477.4768 Ops/s $\color{#d91a1a}-0.60\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.4360ms 1.1330ms 882.6385 Ops/s 847.7341 Ops/s $\color{#35bf28}+4.12\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 42.2770ms 38.6516ms 25.8721 Ops/s 25.0401 Ops/s $\color{#35bf28}+3.32\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.3710ms 17.8813ms 55.9244 Ops/s 53.7021 Ops/s $\color{#35bf28}+4.14\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 44.0957ms 40.1461ms 24.9090 Ops/s 24.3893 Ops/s $\color{#35bf28}+2.13\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.3111ms 18.3609ms 54.4636 Ops/s 52.5077 Ops/s $\color{#35bf28}+3.72\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 44.4449ms 42.1447ms 23.7278 Ops/s 23.4866 Ops/s $\color{#35bf28}+1.03\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.1909ms 19.8633ms 50.3441 Ops/s 49.3443 Ops/s $\color{#35bf28}+2.03\%$
test_storage_write_lazystack[50-img_shape0-small] 0.9094ms 0.2292ms 4.3634 KOps/s 4.3282 KOps/s $\color{#35bf28}+0.82\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7187ms 1.4769ms 677.0880 Ops/s 662.6937 Ops/s $\color{#35bf28}+2.17\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7637ms 2.3938ms 417.7520 Ops/s 405.6280 Ops/s $\color{#35bf28}+2.99\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.2769ms 3.0750ms 325.2041 Ops/s 319.0962 Ops/s $\color{#35bf28}+1.91\%$
test_storage_write_contiguous[50-img_shape0-small] 0.3737ms 0.1598ms 6.2596 KOps/s 5.9488 KOps/s $\textbf{\color{#35bf28}+5.22\%}$
test_storage_write_contiguous[100-img_shape1-atari] 0.3430ms 0.2330ms 4.2926 KOps/s 3.9655 KOps/s $\textbf{\color{#35bf28}+8.25\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 2.1669ms 1.9613ms 509.8537 Ops/s 513.6646 Ops/s $\color{#d91a1a}-0.74\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.7421ms 1.4251ms 701.6900 Ops/s 653.6124 Ops/s $\textbf{\color{#35bf28}+7.36\%}$
test_collector_stack_then_write[50-img_shape0-small] 1.4566ms 1.1350ms 881.0616 Ops/s 882.4118 Ops/s $\color{#d91a1a}-0.15\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.8706ms 3.6817ms 271.6132 Ops/s 261.5268 Ops/s $\color{#35bf28}+3.86\%$
test_collector_stack_then_write[100-img_shape2-large_img] 6.3146ms 6.0288ms 165.8712 Ops/s 165.3997 Ops/s $\color{#35bf28}+0.29\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 8.8078ms 7.5948ms 131.6689 Ops/s 136.0147 Ops/s $\color{#d91a1a}-3.20\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.5451ms 0.2799ms 3.5725 KOps/s 3.4922 KOps/s $\color{#35bf28}+2.30\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.8352ms 1.5791ms 633.2842 Ops/s 633.1781 Ops/s $\color{#35bf28}+0.02\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.7774ms 2.5278ms 395.6001 Ops/s 385.0150 Ops/s $\color{#35bf28}+2.75\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.4734ms 3.2591ms 306.8362 Ops/s 300.1090 Ops/s $\color{#35bf28}+2.24\%$
test_collector_without_rb[100-img_shape0-atari] 34.2071ms 32.9983ms 30.3046 Ops/s 29.7993 Ops/s $\color{#35bf28}+1.70\%$
test_collector_without_rb[200-img_shape1-large_batch] 65.1660ms 64.7396ms 15.4465 Ops/s 15.3107 Ops/s $\color{#35bf28}+0.89\%$
test_collector_with_rb[100-img_shape0-atari] 37.5790ms 37.0860ms 26.9644 Ops/s 26.1044 Ops/s $\color{#35bf28}+3.29\%$
test_collector_with_rb[200-img_shape1-large_batch] 75.0881ms 73.4548ms 13.6138 Ops/s 12.5112 Ops/s $\textbf{\color{#35bf28}+8.81\%}$
test_collector_without_rb_cuda[100-img_shape0-atari] 56.9760ms 56.3142ms 17.7575 Ops/s 17.5602 Ops/s $\color{#35bf28}+1.12\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1128s 0.1117s 8.9506 Ops/s 8.8288 Ops/s $\color{#35bf28}+1.38\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 58.7702ms 57.7742ms 17.3088 Ops/s 16.9363 Ops/s $\color{#35bf28}+2.20\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1170s 0.1153s 8.6724 Ops/s 8.5919 Ops/s $\color{#35bf28}+0.94\%$

Comment thread torchrl/envs/common.py
Collaborator Author

We don't want that.
The proper way to do auto-reset with torch.compile is to ALWAYS compute the reset and then select between the reset and non-reset values in tensordict_ using torch.where. We should have a toy auto-reset env that we can compile as an example, and we should discuss how to make an extension point out of this, but we should NOT skip maybe_reset entirely. I would rather make maybe_reset a no-op when auto-reset is used with the masking described above, or implement the masking inside maybe_reset itself, which I would find more natural.
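
For illustration, the masking idea could look roughly like the sketch below. This is not TorchRL code: `masked_auto_reset`, `next_obs`, `reset_obs` and `done` are hypothetical names, plain tensors stand in for the tensordict entries, and `done` is assumed to carry a trailing singleton dimension so it broadcasts over the observation. The point is only that the reset values are computed for every env and `torch.where` picks per env, so there is no data-dependent branch for the compiler to break on.

```python
import torch


def masked_auto_reset(
    next_obs: torch.Tensor, reset_obs: torch.Tensor, done: torch.Tensor
) -> torch.Tensor:
    """Pick reset_obs for envs that are done and next_obs for the others.

    Hypothetical helper: both candidates are always computed by the caller,
    and the selection is a pure tensor op (no `if done.any()` branch).
    """
    # done has shape [batch, 1] and broadcasts against [batch, obs_dim].
    return torch.where(done, reset_obs, next_obs)


# Toy data: three envs, the second one has just finished its episode.
next_obs = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
reset_obs = torch.zeros_like(next_obs)  # stand-in for the always-computed reset
done = torch.tensor([[False], [True], [False]])

out = masked_auto_reset(next_obs, reset_obs, done)
print(out)
```

Only the second row is replaced by its reset value; the other rows pass through unchanged. The cost is that the reset is computed even when no env is done, which is the trade-off the comment above accepts.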

[ghstack-poisoned]
@vmoens vmoens closed this Apr 11, 2026
@vmoens vmoens deleted the gh/vmoens/241/head branch April 20, 2026 20:35

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Performance Performance issue or suggestion for improvement
