
[Performance] Add _trust_step_output flag to skip step validation overhead #3562

Closed
vmoens wants to merge 2 commits into gh/vmoens/243/base from gh/vmoens/243/head

Conversation

@vmoens
Collaborator

@vmoens vmoens commented Mar 23, 2026

[ghstack-poisoned]
@pytorch-bot

pytorch-bot Bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3562

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

⏳ No Failures, 16 Pending

As of commit 786eb82 with merge base a4301ee:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions
Contributor

github-actions Bot commented Mar 23, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}11$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 86.0372μs 84.7566μs 11.7985 KOps/s 12.2644 KOps/s $\color{#d91a1a}-3.80\%$
test_tensor_to_bytestream_speed[torch.save] 0.1434ms 0.1429ms 7.0001 KOps/s 7.0658 KOps/s $\color{#d91a1a}-0.93\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1070s 0.1068s 9.3636 Ops/s 9.3995 Ops/s $\color{#d91a1a}-0.38\%$
test_tensor_to_bytestream_speed[numpy] 2.5884μs 2.5858μs 386.7318 KOps/s 386.1219 KOps/s $\color{#35bf28}+0.16\%$
test_tensor_to_bytestream_speed[safetensors] 39.5894μs 39.3182μs 25.4335 KOps/s 26.7399 KOps/s $\color{#d91a1a}-4.89\%$
test_simple 0.5643s 0.5607s 1.7835 Ops/s 1.7287 Ops/s $\color{#35bf28}+3.17\%$
test_transformed 1.0984s 1.0957s 0.9127 Ops/s 0.8914 Ops/s $\color{#35bf28}+2.38\%$
test_serial 1.7280s 1.7033s 0.5871 Ops/s 0.5758 Ops/s $\color{#35bf28}+1.96\%$
test_parallel 1.0315s 1.0220s 0.9785 Ops/s 0.9751 Ops/s $\color{#35bf28}+0.34\%$
test_step_mdp_speed[True-True-True-True-True] 0.2651ms 41.7879μs 23.9304 KOps/s 24.4792 KOps/s $\color{#d91a1a}-2.24\%$
test_step_mdp_speed[True-True-True-True-False] 66.1240μs 22.7265μs 44.0016 KOps/s 43.8433 KOps/s $\color{#35bf28}+0.36\%$
test_step_mdp_speed[True-True-True-False-True] 52.8830μs 23.9898μs 41.6843 KOps/s 42.1662 KOps/s $\color{#d91a1a}-1.14\%$
test_step_mdp_speed[True-True-True-False-False] 53.9130μs 12.7729μs 78.2905 KOps/s 78.3702 KOps/s $\color{#d91a1a}-0.10\%$
test_step_mdp_speed[True-True-False-True-True] 0.1176ms 44.0245μs 22.7146 KOps/s 22.4697 KOps/s $\color{#35bf28}+1.09\%$
test_step_mdp_speed[True-True-False-True-False] 56.0530μs 25.6258μs 39.0232 KOps/s 39.6960 KOps/s $\color{#d91a1a}-1.69\%$
test_step_mdp_speed[True-True-False-False-True] 63.9940μs 25.9455μs 38.5423 KOps/s 37.7393 KOps/s $\color{#35bf28}+2.13\%$
test_step_mdp_speed[True-True-False-False-False] 39.3920μs 15.3889μs 64.9820 KOps/s 64.8722 KOps/s $\color{#35bf28}+0.17\%$
test_step_mdp_speed[True-False-True-True-True] 82.9150μs 46.0494μs 21.7158 KOps/s 21.4381 KOps/s $\color{#35bf28}+1.30\%$
test_step_mdp_speed[True-False-True-True-False] 57.6640μs 28.0892μs 35.6008 KOps/s 35.2399 KOps/s $\color{#35bf28}+1.02\%$
test_step_mdp_speed[True-False-True-False-True] 54.2240μs 25.9423μs 38.5471 KOps/s 38.0448 KOps/s $\color{#35bf28}+1.32\%$
test_step_mdp_speed[True-False-True-False-False] 46.0030μs 15.3678μs 65.0710 KOps/s 65.0183 KOps/s $\color{#35bf28}+0.08\%$
test_step_mdp_speed[True-False-False-True-True] 82.8650μs 49.5875μs 20.1664 KOps/s 20.1948 KOps/s $\color{#d91a1a}-0.14\%$
test_step_mdp_speed[True-False-False-True-False] 69.6840μs 30.7576μs 32.5123 KOps/s 32.7766 KOps/s $\color{#d91a1a}-0.81\%$
test_step_mdp_speed[True-False-False-False-True] 63.5630μs 28.3465μs 35.2777 KOps/s 34.6932 KOps/s $\color{#35bf28}+1.68\%$
test_step_mdp_speed[True-False-False-False-False] 61.3540μs 17.7419μs 56.3638 KOps/s 55.4090 KOps/s $\color{#35bf28}+1.72\%$
test_step_mdp_speed[False-True-True-True-True] 0.1610ms 46.3408μs 21.5792 KOps/s 21.3817 KOps/s $\color{#35bf28}+0.92\%$
test_step_mdp_speed[False-True-True-True-False] 56.8830μs 28.2309μs 35.4221 KOps/s 35.5847 KOps/s $\color{#d91a1a}-0.46\%$
test_step_mdp_speed[False-True-True-False-True] 2.3910ms 29.9954μs 33.3384 KOps/s 33.3324 KOps/s $\color{#35bf28}+0.02\%$
test_step_mdp_speed[False-True-True-False-False] 46.5420μs 17.0391μs 58.6885 KOps/s 59.0616 KOps/s $\color{#d91a1a}-0.63\%$
test_step_mdp_speed[False-True-False-True-True] 79.0150μs 49.6901μs 20.1247 KOps/s 20.1794 KOps/s $\color{#d91a1a}-0.27\%$
test_step_mdp_speed[False-True-False-True-False] 0.1002ms 30.5696μs 32.7122 KOps/s 32.3992 KOps/s $\color{#35bf28}+0.97\%$
test_step_mdp_speed[False-True-False-False-True] 67.1040μs 32.8027μs 30.4853 KOps/s 30.7272 KOps/s $\color{#d91a1a}-0.79\%$
test_step_mdp_speed[False-True-False-False-False] 59.0530μs 19.3093μs 51.7884 KOps/s 51.2968 KOps/s $\color{#35bf28}+0.96\%$
test_step_mdp_speed[False-False-True-True-True] 81.3450μs 51.6072μs 19.3771 KOps/s 19.2920 KOps/s $\color{#35bf28}+0.44\%$
test_step_mdp_speed[False-False-True-True-False] 61.5330μs 33.1977μs 30.1226 KOps/s 29.9803 KOps/s $\color{#35bf28}+0.47\%$
test_step_mdp_speed[False-False-True-False-True] 75.1550μs 32.0529μs 31.1984 KOps/s 30.6788 KOps/s $\color{#35bf28}+1.69\%$
test_step_mdp_speed[False-False-True-False-False] 95.3260μs 19.6252μs 50.9548 KOps/s 50.5330 KOps/s $\color{#35bf28}+0.83\%$
test_step_mdp_speed[False-False-False-True-True] 89.5960μs 54.1786μs 18.4575 KOps/s 18.6614 KOps/s $\color{#d91a1a}-1.09\%$
test_step_mdp_speed[False-False-False-True-False] 79.3850μs 35.8749μs 27.8747 KOps/s 27.6896 KOps/s $\color{#35bf28}+0.67\%$
test_step_mdp_speed[False-False-False-False-True] 62.4630μs 34.3402μs 29.1204 KOps/s 28.9573 KOps/s $\color{#35bf28}+0.56\%$
test_step_mdp_speed[False-False-False-False-False] 53.7230μs 22.1560μs 45.1346 KOps/s 45.3273 KOps/s $\color{#d91a1a}-0.43\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8510s 0.7485s 1.3359 Ops/s 1.3264 Ops/s $\color{#35bf28}+0.72\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7164s 0.6095s 1.6407 Ops/s 1.6346 Ops/s $\color{#35bf28}+0.37\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7362s 1.6507s 0.6058 Ops/s 0.6024 Ops/s $\color{#35bf28}+0.57\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5159s 1.4277s 0.7004 Ops/s 0.6966 Ops/s $\color{#35bf28}+0.55\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9795s 1.8964s 0.5273 Ops/s 0.5236 Ops/s $\color{#35bf28}+0.72\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7701s 1.6823s 0.5944 Ops/s 0.5931 Ops/s $\color{#35bf28}+0.22\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6891s 4.6076s 0.2170 Ops/s 0.2167 Ops/s $\color{#35bf28}+0.16\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.6311s 4.4187s 0.2263 Ops/s 0.2291 Ops/s $\color{#d91a1a}-1.24\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9590s 1.8682s 0.5353 Ops/s 0.5286 Ops/s $\color{#35bf28}+1.27\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7080s 1.6015s 0.6244 Ops/s 0.6237 Ops/s $\color{#35bf28}+0.11\%$
test_values[generalized_advantage_estimate-True-True] 10.3446ms 10.2084ms 97.9586 Ops/s 97.8444 Ops/s $\color{#35bf28}+0.12\%$
test_values[vec_generalized_advantage_estimate-True-True] 18.1056ms 11.5220ms 86.7908 Ops/s 57.0470 Ops/s $\textbf{\color{#35bf28}+52.14\%}$
test_values[td0_return_estimate-False-False] 0.2108ms 0.1286ms 7.7751 KOps/s 4.5465 KOps/s $\textbf{\color{#35bf28}+71.01\%}$
test_values[td1_return_estimate-False-False] 28.2432ms 27.8297ms 35.9328 Ops/s 36.0489 Ops/s $\color{#d91a1a}-0.32\%$
test_values[vec_td1_return_estimate-False-False] 11.5136ms 11.0858ms 90.2057 Ops/s 56.9249 Ops/s $\textbf{\color{#35bf28}+58.46\%}$
test_values[td_lambda_return_estimate-True-False] 41.8561ms 41.1752ms 24.2865 Ops/s 24.2651 Ops/s $\color{#35bf28}+0.09\%$
test_values[vec_td_lambda_return_estimate-True-False] 17.6643ms 11.5681ms 86.4444 Ops/s 56.0327 Ops/s $\textbf{\color{#35bf28}+54.27\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.1809ms 9.0169ms 110.9033 Ops/s 111.2528 Ops/s $\color{#d91a1a}-0.31\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7579ms 1.5454ms 647.0635 Ops/s 631.9814 Ops/s $\color{#35bf28}+2.39\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.6104ms 0.4183ms 2.3904 KOps/s 2.3287 KOps/s $\color{#35bf28}+2.65\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 34.7033ms 29.8044ms 33.5521 Ops/s 28.6996 Ops/s $\textbf{\color{#35bf28}+16.91\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.8274ms 1.7179ms 582.1003 Ops/s 582.1909 Ops/s $\color{#d91a1a}-0.02\%$
test_dqn_speed[False-None] 1.7875ms 1.4254ms 701.5759 Ops/s 703.2658 Ops/s $\color{#d91a1a}-0.24\%$
test_dqn_speed[False-backward] 2.0430ms 1.9684ms 508.0181 Ops/s 506.9607 Ops/s $\color{#35bf28}+0.21\%$
test_dqn_speed[True-None] 0.6100ms 0.5650ms 1.7698 KOps/s 1.6636 KOps/s $\textbf{\color{#35bf28}+6.38\%}$
test_dqn_speed[True-backward] 1.0576ms 1.0326ms 968.4102 Ops/s 842.3378 Ops/s $\textbf{\color{#35bf28}+14.97\%}$
test_dqn_speed[reduce-overhead-None] 0.7148ms 0.5491ms 1.8211 KOps/s 1.6833 KOps/s $\textbf{\color{#35bf28}+8.19\%}$
test_ddpg_speed[False-None] 3.2238ms 2.8422ms 351.8463 Ops/s 348.0861 Ops/s $\color{#35bf28}+1.08\%$
test_ddpg_speed[False-backward] 4.1554ms 4.0544ms 246.6435 Ops/s 241.9321 Ops/s $\color{#35bf28}+1.95\%$
test_ddpg_speed[True-None] 1.5742ms 1.4408ms 694.0717 Ops/s 670.9353 Ops/s $\color{#35bf28}+3.45\%$
test_ddpg_speed[True-backward] 2.4948ms 2.4453ms 408.9540 Ops/s 395.5947 Ops/s $\color{#35bf28}+3.38\%$
test_ddpg_speed[reduce-overhead-None] 1.4870ms 1.4318ms 698.4046 Ops/s 684.9056 Ops/s $\color{#35bf28}+1.97\%$
test_sac_speed[False-None] 8.7616ms 8.1427ms 122.8088 Ops/s 122.5031 Ops/s $\color{#35bf28}+0.25\%$
test_sac_speed[False-backward] 11.9270ms 11.3520ms 88.0903 Ops/s 87.5088 Ops/s $\color{#35bf28}+0.66\%$
test_sac_speed[True-None] 2.3328ms 2.2051ms 453.5035 Ops/s 447.9594 Ops/s $\color{#35bf28}+1.24\%$
test_sac_speed[True-backward] 4.2686ms 4.1605ms 240.3532 Ops/s 211.7225 Ops/s $\textbf{\color{#35bf28}+13.52\%}$
test_sac_speed[reduce-overhead-None] 2.3972ms 2.1925ms 456.0919 Ops/s 445.5255 Ops/s $\color{#35bf28}+2.37\%$
test_redq_speed[False-None] 14.7175ms 10.8216ms 92.4076 Ops/s 93.6461 Ops/s $\color{#d91a1a}-1.32\%$
test_redq_speed[False-backward] 19.3297ms 18.2279ms 54.8609 Ops/s 54.9879 Ops/s $\color{#d91a1a}-0.23\%$
test_redq_speed[True-None] 4.9563ms 4.6853ms 213.4327 Ops/s 219.1912 Ops/s $\color{#d91a1a}-2.63\%$
test_redq_speed[reduce-overhead-None] 4.7529ms 4.5666ms 218.9797 Ops/s 210.6189 Ops/s $\color{#35bf28}+3.97\%$
test_redq_deprec_speed[False-None] 12.0167ms 11.3800ms 87.8737 Ops/s 89.4868 Ops/s $\color{#d91a1a}-1.80\%$
test_redq_deprec_speed[False-backward] 16.5829ms 16.2368ms 61.5887 Ops/s 61.5749 Ops/s $\color{#35bf28}+0.02\%$
test_redq_deprec_speed[True-None] 3.9934ms 3.7230ms 268.6037 Ops/s 269.1337 Ops/s $\color{#d91a1a}-0.20\%$
test_redq_deprec_speed[True-backward] 7.7402ms 7.4899ms 133.5135 Ops/s 130.7984 Ops/s $\color{#35bf28}+2.08\%$
test_redq_deprec_speed[reduce-overhead-None] 3.8436ms 3.6038ms 277.4847 Ops/s 270.5583 Ops/s $\color{#35bf28}+2.56\%$
test_td3_speed[False-None] 8.1902ms 8.1257ms 123.0670 Ops/s 121.8472 Ops/s $\color{#35bf28}+1.00\%$
test_td3_speed[False-backward] 11.3887ms 10.9912ms 90.9822 Ops/s 90.0619 Ops/s $\color{#35bf28}+1.02\%$
test_td3_speed[True-None] 1.8730ms 1.8406ms 543.3044 Ops/s 532.2440 Ops/s $\color{#35bf28}+2.08\%$
test_td3_speed[True-backward] 3.8096ms 3.6523ms 273.7985 Ops/s 251.9391 Ops/s $\textbf{\color{#35bf28}+8.68\%}$
test_td3_speed[reduce-overhead-None] 1.8218ms 1.8018ms 555.0106 Ops/s 548.9339 Ops/s $\color{#35bf28}+1.11\%$
test_cql_speed[False-None] 29.2308ms 26.4748ms 37.7717 Ops/s 37.7744 Ops/s $-0.01\%$
test_cql_speed[False-backward] 36.4147ms 35.6593ms 28.0432 Ops/s 27.8155 Ops/s $\color{#35bf28}+0.82\%$
test_cql_speed[True-None] 15.5482ms 12.7106ms 78.6746 Ops/s 78.5293 Ops/s $\color{#35bf28}+0.19\%$
test_cql_speed[True-backward] 18.8781ms 18.4095ms 54.3198 Ops/s 55.7262 Ops/s $\color{#d91a1a}-2.52\%$
test_cql_speed[reduce-overhead-None] 13.0494ms 12.7441ms 78.4677 Ops/s 78.6513 Ops/s $\color{#d91a1a}-0.23\%$
test_a2c_speed[False-None] 5.9493ms 5.4764ms 182.6023 Ops/s 188.3884 Ops/s $\color{#d91a1a}-3.07\%$
test_a2c_speed[False-backward] 12.6430ms 12.1059ms 82.6040 Ops/s 84.4302 Ops/s $\color{#d91a1a}-2.16\%$
test_a2c_speed[True-None] 4.2190ms 3.8500ms 259.7402 Ops/s 250.5283 Ops/s $\color{#35bf28}+3.68\%$
test_a2c_speed[True-backward] 8.9874ms 8.8081ms 113.5324 Ops/s 108.0386 Ops/s $\textbf{\color{#35bf28}+5.09\%}$
test_a2c_speed[reduce-overhead-None] 4.3119ms 3.8474ms 259.9189 Ops/s 254.9137 Ops/s $\color{#35bf28}+1.96\%$
test_ppo_speed[False-None] 6.3708ms 5.9162ms 169.0279 Ops/s 163.8547 Ops/s $\color{#35bf28}+3.16\%$
test_ppo_speed[False-backward] 12.8386ms 12.5884ms 79.4379 Ops/s 78.1167 Ops/s $\color{#35bf28}+1.69\%$
test_ppo_speed[True-None] 4.1765ms 3.8046ms 262.8415 Ops/s 253.3577 Ops/s $\color{#35bf28}+3.74\%$
test_ppo_speed[True-backward] 9.1258ms 8.8106ms 113.5003 Ops/s 113.8969 Ops/s $\color{#d91a1a}-0.35\%$
test_ppo_speed[reduce-overhead-None] 4.2405ms 3.7870ms 264.0598 Ops/s 260.4276 Ops/s $\color{#35bf28}+1.39\%$
test_reinforce_speed[False-None] 5.1025ms 4.6156ms 216.6577 Ops/s 215.1302 Ops/s $\color{#35bf28}+0.71\%$
test_reinforce_speed[False-backward] 7.9089ms 7.5016ms 133.3049 Ops/s 132.2834 Ops/s $\color{#35bf28}+0.77\%$
test_reinforce_speed[True-None] 3.1976ms 3.0303ms 330.0027 Ops/s 331.6120 Ops/s $\color{#d91a1a}-0.49\%$
test_reinforce_speed[True-backward] 8.2686ms 8.0395ms 124.3855 Ops/s 123.0542 Ops/s $\color{#35bf28}+1.08\%$
test_reinforce_speed[reduce-overhead-None] 3.1188ms 3.0053ms 332.7467 Ops/s 328.2906 Ops/s $\color{#35bf28}+1.36\%$
test_iql_speed[False-None] 20.9898ms 20.2948ms 49.2736 Ops/s 48.4545 Ops/s $\color{#35bf28}+1.69\%$
test_iql_speed[False-backward] 31.6126ms 30.8550ms 32.4097 Ops/s 32.5117 Ops/s $\color{#d91a1a}-0.31\%$
test_iql_speed[True-None] 8.9166ms 8.6187ms 116.0268 Ops/s 116.7672 Ops/s $\color{#d91a1a}-0.63\%$
test_iql_speed[True-backward] 17.5451ms 17.0771ms 58.5579 Ops/s 60.4359 Ops/s $\color{#d91a1a}-3.11\%$
test_iql_speed[reduce-overhead-None] 8.9675ms 8.6015ms 116.2588 Ops/s 116.0363 Ops/s $\color{#35bf28}+0.19\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2705ms 6.0769ms 164.5569 Ops/s 163.3545 Ops/s $\color{#35bf28}+0.74\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.0303ms 0.3303ms 3.0273 KOps/s 2.8967 KOps/s $\color{#35bf28}+4.51\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6133ms 0.3285ms 3.0444 KOps/s 2.8999 KOps/s $\color{#35bf28}+4.98\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1219ms 5.8666ms 170.4574 Ops/s 169.9871 Ops/s $\color{#35bf28}+0.28\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.8562ms 0.3065ms 3.2622 KOps/s 3.5195 KOps/s $\textbf{\color{#d91a1a}-7.31\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5495ms 0.2875ms 3.4781 KOps/s 3.7597 KOps/s $\textbf{\color{#d91a1a}-7.49\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5103ms 1.2828ms 779.5184 Ops/s 784.2925 Ops/s $\color{#d91a1a}-0.61\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4387ms 1.1969ms 835.4785 Ops/s 835.5618 Ops/s $-0.01\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.9850ms 6.1166ms 163.4889 Ops/s 166.9263 Ops/s $\color{#d91a1a}-2.06\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9286ms 0.4770ms 2.0966 KOps/s 2.2781 KOps/s $\textbf{\color{#d91a1a}-7.97\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7815ms 0.4620ms 2.1645 KOps/s 2.3584 KOps/s $\textbf{\color{#d91a1a}-8.22\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2660ms 5.8706ms 170.3398 Ops/s 169.9771 Ops/s $\color{#35bf28}+0.21\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.1381ms 0.3277ms 3.0520 KOps/s 3.1685 KOps/s $\color{#d91a1a}-3.68\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5795ms 0.3215ms 3.1107 KOps/s 3.6866 KOps/s $\textbf{\color{#d91a1a}-15.62\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0762ms 5.8235ms 171.7189 Ops/s 172.5805 Ops/s $\color{#d91a1a}-0.50\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.1933ms 0.3282ms 3.0472 KOps/s 3.4621 KOps/s $\textbf{\color{#d91a1a}-11.98\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5566ms 0.3139ms 3.1855 KOps/s 3.7505 KOps/s $\textbf{\color{#d91a1a}-15.06\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0907ms 5.9896ms 166.9564 Ops/s 167.3182 Ops/s $\color{#d91a1a}-0.22\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.1285ms 0.5157ms 1.9390 KOps/s 1.8688 KOps/s $\color{#35bf28}+3.76\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7646ms 0.4682ms 2.1359 KOps/s 2.0404 KOps/s $\color{#35bf28}+4.68\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.4090ms 5.0085ms 199.6598 Ops/s 48.8958 Ops/s $\textbf{\color{#35bf28}+308.34\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 9.5611ms 2.1099ms 473.9573 Ops/s 538.4763 Ops/s $\textbf{\color{#d91a1a}-11.98\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 9.1032ms 1.2415ms 805.4542 Ops/s 814.5706 Ops/s $\color{#d91a1a}-1.12\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.6578s 18.1497ms 55.0973 Ops/s 196.1821 Ops/s $\textbf{\color{#d91a1a}-71.92\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.8358ms 1.7693ms 565.1942 Ops/s 571.2325 Ops/s $\color{#d91a1a}-1.06\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.0972ms 0.8793ms 1.1373 KOps/s 806.2823 Ops/s $\textbf{\color{#35bf28}+41.05\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.2017ms 5.2783ms 189.4566 Ops/s 189.7897 Ops/s $\color{#d91a1a}-0.18\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 12.3239ms 2.0886ms 478.7892 Ops/s 454.5964 Ops/s $\textbf{\color{#35bf28}+5.32\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.5757ms 1.1038ms 905.9577 Ops/s 965.0256 Ops/s $\textbf{\color{#d91a1a}-6.12\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 44.6489ms 39.6519ms 25.2195 Ops/s 25.0578 Ops/s $\color{#35bf28}+0.65\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.1549ms 18.5155ms 54.0089 Ops/s 53.9906 Ops/s $\color{#35bf28}+0.03\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 44.7888ms 40.6970ms 24.5718 Ops/s 24.2767 Ops/s $\color{#35bf28}+1.22\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.2092ms 18.8031ms 53.1827 Ops/s 53.5079 Ops/s $\color{#d91a1a}-0.61\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 44.0303ms 42.6548ms 23.4440 Ops/s 23.2238 Ops/s $\color{#35bf28}+0.95\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.3666ms 20.2782ms 49.3140 Ops/s 49.3028 Ops/s $\color{#35bf28}+0.02\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8316ms 0.2225ms 4.4945 KOps/s 4.3682 KOps/s $\color{#35bf28}+2.89\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7670ms 1.4566ms 686.5465 Ops/s 713.7084 Ops/s $\color{#d91a1a}-3.81\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7906ms 2.3515ms 425.2691 Ops/s 432.2654 Ops/s $\color{#d91a1a}-1.62\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1673ms 2.9964ms 333.7357 Ops/s 339.7047 Ops/s $\color{#d91a1a}-1.76\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2156ms 0.1348ms 7.4192 KOps/s 7.3801 KOps/s $\color{#35bf28}+0.53\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3299ms 0.1892ms 5.2865 KOps/s 5.2423 KOps/s $\color{#35bf28}+0.84\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9895ms 1.7981ms 556.1368 Ops/s 572.2530 Ops/s $\color{#d91a1a}-2.82\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.6141ms 1.3260ms 754.1444 Ops/s 771.5519 Ops/s $\color{#d91a1a}-2.26\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2551ms 1.1368ms 879.6466 Ops/s 887.4955 Ops/s $\color{#d91a1a}-0.88\%$
test_collector_stack_then_write[100-img_shape1-atari] 7.5836ms 3.6524ms 273.7914 Ops/s 278.3963 Ops/s $\color{#d91a1a}-1.65\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.3359ms 5.8572ms 170.7294 Ops/s 177.3853 Ops/s $\color{#d91a1a}-3.75\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.6612ms 7.4636ms 133.9835 Ops/s 142.0208 Ops/s $\textbf{\color{#d91a1a}-5.66\%}$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4469ms 0.2778ms 3.5997 KOps/s 3.5758 KOps/s $\color{#35bf28}+0.67\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7596ms 1.5672ms 638.0673 Ops/s 652.1202 Ops/s $\color{#d91a1a}-2.15\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.5886ms 2.4529ms 407.6863 Ops/s 408.3134 Ops/s $\color{#d91a1a}-0.15\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.5852ms 3.2041ms 312.1033 Ops/s 319.3849 Ops/s $\color{#d91a1a}-2.28\%$
test_collector_without_rb[100-img_shape0-atari] 33.7077ms 33.1879ms 30.1314 Ops/s 30.3389 Ops/s $\color{#d91a1a}-0.68\%$
test_collector_without_rb[200-img_shape1-large_batch] 65.6192ms 65.1363ms 15.3524 Ops/s 15.5123 Ops/s $\color{#d91a1a}-1.03\%$
test_collector_with_rb[100-img_shape0-atari] 38.1512ms 37.5935ms 26.6004 Ops/s 26.6760 Ops/s $\color{#d91a1a}-0.28\%$
test_collector_with_rb[200-img_shape1-large_batch] 96.4762ms 75.3919ms 13.2640 Ops/s 13.6618 Ops/s $\color{#d91a1a}-2.91\%$
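
The Change column in the table above appears to be the relative difference between the throughput measured on this PR (Ops) and the throughput on repo HEAD. The sketch below reproduces a few rows under that assumption. The function names are hypothetical, and the ±5% cut-off for counting a benchmark as Improved or Worsened is only inferred from which rows are bolded; the benchmark workflow's actual rule is not shown in this thread.

```python
# Hypothetical helpers; not taken from the benchmark workflow itself.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def to_ops(value: float, unit: str) -> float:
    """Normalize a throughput reading to Ops/s so mixed-unit rows compare correctly."""
    return value * UNIT_SCALE[unit]


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change of the PR throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row, assuming a symmetric threshold (5% is an inferred value)."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, unit, Ops on repo HEAD, unit) copied from rows of the CPU table.
rows = {
    "test_simple": (1.7835, "Ops/s", 1.7287, "Ops/s"),
    "test_tensor_to_bytestream_speed[pickle]": (11.7985, "KOps/s", 12.2644, "KOps/s"),
    "test_rb_populate[...-LazyTensorStorage-SamplerWithoutReplacement-400]": (
        1.1373, "KOps/s", 806.2823, "Ops/s",
    ),
}

for name, (pr, pr_unit, head, head_unit) in rows.items():
    change = relative_change(to_ops(pr, pr_unit), to_ops(head, head_unit))
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Normalizing units first matters for rows where the two readings are reported on different scales (KOps/s against Ops/s). Many of the small positive and negative changes in the table fall inside the assumed threshold and would be labelled unchanged by this sketch.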

@github-actions
Contributor

github-actions Bot commented Mar 23, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}10$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 82.2575μs 81.1167μs 12.3279 KOps/s 12.2842 KOps/s $\color{#35bf28}+0.36\%$
test_tensor_to_bytestream_speed[torch.save] 0.1432ms 0.1427ms 7.0087 KOps/s 7.0307 KOps/s $\color{#d91a1a}-0.31\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1108s 0.1105s 9.0513 Ops/s 9.2687 Ops/s $\color{#d91a1a}-2.34\%$
test_tensor_to_bytestream_speed[numpy] 2.6707μs 2.6663μs 375.0484 KOps/s 398.4981 KOps/s $\textbf{\color{#d91a1a}-5.88\%}$
test_tensor_to_bytestream_speed[safetensors] 39.3121μs 38.9139μs 25.6978 KOps/s 26.1206 KOps/s $\color{#d91a1a}-1.62\%$
test_simple 0.9209s 0.8185s 1.2217 Ops/s 1.2177 Ops/s $\color{#35bf28}+0.33\%$
test_transformed 1.3967s 1.3960s 0.7163 Ops/s 0.7051 Ops/s $\color{#35bf28}+1.60\%$
test_serial 2.3381s 2.3364s 0.4280 Ops/s 0.4283 Ops/s $\color{#d91a1a}-0.06\%$
test_parallel 1.9307s 1.8509s 0.5403 Ops/s 0.5530 Ops/s $\color{#d91a1a}-2.31\%$
test_step_mdp_speed[True-True-True-True-True] 0.2646ms 41.9590μs 23.8328 KOps/s 23.1951 KOps/s $\color{#35bf28}+2.75\%$
test_step_mdp_speed[True-True-True-True-False] 54.4510μs 22.7519μs 43.9523 KOps/s 41.9674 KOps/s $\color{#35bf28}+4.73\%$
test_step_mdp_speed[True-True-True-False-True] 54.6410μs 24.2034μs 41.3165 KOps/s 42.6744 KOps/s $\color{#d91a1a}-3.18\%$
test_step_mdp_speed[True-True-True-False-False] 43.7210μs 12.7474μs 78.4472 KOps/s 76.6595 KOps/s $\color{#35bf28}+2.33\%$
test_step_mdp_speed[True-True-False-True-True] 83.0320μs 44.9124μs 22.2656 KOps/s 21.0195 KOps/s $\textbf{\color{#35bf28}+5.93\%}$
test_step_mdp_speed[True-True-False-True-False] 55.8310μs 25.4583μs 39.2800 KOps/s 37.8554 KOps/s $\color{#35bf28}+3.76\%$
test_step_mdp_speed[True-True-False-False-True] 56.7310μs 26.3217μs 37.9914 KOps/s 37.6923 KOps/s $\color{#35bf28}+0.79\%$
test_step_mdp_speed[True-True-False-False-False] 39.8010μs 15.3794μs 65.0220 KOps/s 62.8031 KOps/s $\color{#35bf28}+3.53\%$
test_step_mdp_speed[True-False-True-True-True] 76.7220μs 47.2328μs 21.1717 KOps/s 21.4171 KOps/s $\color{#d91a1a}-1.15\%$
test_step_mdp_speed[True-False-True-True-False] 48.5110μs 28.0610μs 35.6367 KOps/s 34.7599 KOps/s $\color{#35bf28}+2.52\%$
test_step_mdp_speed[True-False-True-False-True] 53.0210μs 26.2830μs 38.0474 KOps/s 37.0441 KOps/s $\color{#35bf28}+2.71\%$
test_step_mdp_speed[True-False-True-False-False] 39.4600μs 15.3841μs 65.0021 KOps/s 65.3473 KOps/s $\color{#d91a1a}-0.53\%$
test_step_mdp_speed[True-False-False-True-True] 83.8310μs 50.7127μs 19.7189 KOps/s 19.8600 KOps/s $\color{#d91a1a}-0.71\%$
test_step_mdp_speed[True-False-False-True-False] 55.5210μs 30.4639μs 32.8258 KOps/s 32.5239 KOps/s $\color{#35bf28}+0.93\%$
test_step_mdp_speed[True-False-False-False-True] 66.8210μs 28.4324μs 35.1712 KOps/s 35.5484 KOps/s $\color{#d91a1a}-1.06\%$
test_step_mdp_speed[True-False-False-False-False] 60.3110μs 17.9208μs 55.8010 KOps/s 56.1530 KOps/s $\color{#d91a1a}-0.63\%$
test_step_mdp_speed[False-True-True-True-True] 0.1140ms 47.1272μs 21.2192 KOps/s 21.5909 KOps/s $\color{#d91a1a}-1.72\%$
test_step_mdp_speed[False-True-True-True-False] 60.4320μs 28.0136μs 35.6969 KOps/s 35.8358 KOps/s $\color{#d91a1a}-0.39\%$
test_step_mdp_speed[False-True-True-False-True] 2.4575ms 30.3314μs 32.9691 KOps/s 32.4715 KOps/s $\color{#35bf28}+1.53\%$
test_step_mdp_speed[False-True-True-False-False] 46.5410μs 16.9631μs 58.9516 KOps/s 56.9377 KOps/s $\color{#35bf28}+3.54\%$
test_step_mdp_speed[False-True-False-True-True] 81.2410μs 49.0283μs 20.3964 KOps/s 20.3694 KOps/s $\color{#35bf28}+0.13\%$
test_step_mdp_speed[False-True-False-True-False] 64.5810μs 30.5777μs 32.7036 KOps/s 32.7229 KOps/s $\color{#d91a1a}-0.06\%$
test_step_mdp_speed[False-True-False-False-True] 55.9710μs 32.1238μs 31.1296 KOps/s 31.1304 KOps/s $-0.00\%$
test_step_mdp_speed[False-True-False-False-False] 47.6400μs 19.4517μs 51.4094 KOps/s 52.0851 KOps/s $\color{#d91a1a}-1.30\%$
test_step_mdp_speed[False-False-True-True-True] 94.8210μs 52.6257μs 19.0021 KOps/s 19.3066 KOps/s $\color{#d91a1a}-1.58\%$
test_step_mdp_speed[False-False-True-True-False] 60.2910μs 33.3402μs 29.9938 KOps/s 29.9758 KOps/s $\color{#35bf28}+0.06\%$
test_step_mdp_speed[False-False-True-False-True] 70.4110μs 32.2770μs 30.9819 KOps/s 30.9251 KOps/s $\color{#35bf28}+0.18\%$
test_step_mdp_speed[False-False-True-False-False] 58.9510μs 19.5445μs 51.1653 KOps/s 51.5910 KOps/s $\color{#d91a1a}-0.83\%$
test_step_mdp_speed[False-False-False-True-True] 0.1078ms 54.3945μs 18.3842 KOps/s 18.7154 KOps/s $\color{#d91a1a}-1.77\%$
test_step_mdp_speed[False-False-False-True-False] 74.5610μs 35.9074μs 27.8494 KOps/s 27.6357 KOps/s $\color{#35bf28}+0.77\%$
test_step_mdp_speed[False-False-False-False-True] 64.1120μs 33.7812μs 29.6022 KOps/s 29.2675 KOps/s $\color{#35bf28}+1.14\%$
test_step_mdp_speed[False-False-False-False-False] 67.4610μs 21.8977μs 45.6669 KOps/s 46.0585 KOps/s $\color{#d91a1a}-0.85\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7225s 0.7192s 1.3904 Ops/s 1.3261 Ops/s $\color{#35bf28}+4.85\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7213s 0.6118s 1.6345 Ops/s 1.6277 Ops/s $\color{#35bf28}+0.41\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7518s 1.6562s 0.6038 Ops/s 0.6001 Ops/s $\color{#35bf28}+0.61\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5190s 1.4353s 0.6967 Ops/s 0.6927 Ops/s $\color{#35bf28}+0.58\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0008s 1.9151s 0.5222 Ops/s 0.5180 Ops/s $\color{#35bf28}+0.80\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7691s 1.6850s 0.5935 Ops/s 0.5854 Ops/s $\color{#35bf28}+1.38\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7299s 4.6447s 0.2153 Ops/s 0.2178 Ops/s $\color{#d91a1a}-1.15\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5474s 4.4636s 0.2240 Ops/s 0.2242 Ops/s $\color{#d91a1a}-0.06\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9633s 1.8846s 0.5306 Ops/s 0.5318 Ops/s $\color{#d91a1a}-0.22\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7488s 1.6085s 0.6217 Ops/s 0.6191 Ops/s $\color{#35bf28}+0.42\%$
test_values[generalized_advantage_estimate-True-True] 20.8126ms 20.3204ms 49.2115 Ops/s 49.4033 Ops/s $\color{#d91a1a}-0.39\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1500s 3.9249ms 254.7865 Ops/s 283.9569 Ops/s $\textbf{\color{#d91a1a}-10.27\%}$
test_values[td0_return_estimate-False-False] 0.1080ms 85.5092μs 11.6946 KOps/s 11.9019 KOps/s $\color{#d91a1a}-1.74\%$
test_values[td1_return_estimate-False-False] 49.7469ms 48.6738ms 20.5449 Ops/s 20.6547 Ops/s $\color{#d91a1a}-0.53\%$
test_values[vec_td1_return_estimate-False-False] 1.3400ms 1.1002ms 908.9579 Ops/s 907.6572 Ops/s $\color{#35bf28}+0.14\%$
test_values[td_lambda_return_estimate-True-False] 82.3793ms 80.4155ms 12.4354 Ops/s 12.7887 Ops/s $\color{#d91a1a}-2.76\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3053ms 1.0955ms 912.7861 Ops/s 919.4670 Ops/s $\color{#d91a1a}-0.73\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 20.9860ms 20.6151ms 48.5080 Ops/s 49.6213 Ops/s $\color{#d91a1a}-2.24\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0574ms 0.7624ms 1.3117 KOps/s 1.3244 KOps/s $\color{#d91a1a}-0.96\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7277ms 0.6845ms 1.4609 KOps/s 1.4757 KOps/s $\color{#d91a1a}-1.00\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5308ms 1.4937ms 669.4948 Ops/s 672.8356 Ops/s $\color{#d91a1a}-0.50\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7544ms 0.6979ms 1.4329 KOps/s 1.4368 KOps/s $\color{#d91a1a}-0.27\%$
test_dqn_speed[False-None] 1.6977ms 1.6044ms 623.2677 Ops/s 621.6447 Ops/s $\color{#35bf28}+0.26\%$
test_dqn_speed[False-backward] 2.3122ms 2.2628ms 441.9258 Ops/s 456.4573 Ops/s $\color{#d91a1a}-3.18\%$
test_dqn_speed[True-None] 1.2572ms 0.6149ms 1.6264 KOps/s 1.6221 KOps/s $\color{#35bf28}+0.27\%$
test_dqn_speed[True-backward] 1.2692ms 1.1824ms 845.7433 Ops/s 838.8430 Ops/s $\color{#35bf28}+0.82\%$
test_dqn_speed[reduce-overhead-None] 0.7626ms 0.6310ms 1.5849 KOps/s 1.5804 KOps/s $\color{#35bf28}+0.29\%$
test_ddpg_speed[False-None] 3.5690ms 3.0455ms 328.3494 Ops/s 327.8944 Ops/s $\color{#35bf28}+0.14\%$
test_ddpg_speed[False-backward] 4.9185ms 4.3470ms 230.0451 Ops/s 228.8403 Ops/s $\color{#35bf28}+0.53\%$
test_ddpg_speed[True-None] 1.5313ms 1.4119ms 708.2410 Ops/s 701.2790 Ops/s $\color{#35bf28}+0.99\%$
test_ddpg_speed[True-backward] 2.6950ms 2.4776ms 403.6234 Ops/s 396.8809 Ops/s $\color{#35bf28}+1.70\%$
test_ddpg_speed[reduce-overhead-None] 1.5848ms 1.4007ms 713.9299 Ops/s 698.7423 Ops/s $\color{#35bf28}+2.17\%$
test_sac_speed[False-None] 8.9945ms 8.5516ms 116.9373 Ops/s 114.9701 Ops/s $\color{#35bf28}+1.71\%$
test_sac_speed[False-backward] 12.1354ms 11.5801ms 86.3547 Ops/s 85.9022 Ops/s $\color{#35bf28}+0.53\%$
test_sac_speed[True-None] 2.4973ms 1.9771ms 505.7853 Ops/s 508.3101 Ops/s $\color{#d91a1a}-0.50\%$
test_sac_speed[True-backward] 3.7742ms 3.6458ms 274.2865 Ops/s 261.1211 Ops/s $\textbf{\color{#35bf28}+5.04\%}$
test_sac_speed[reduce-overhead-None] 16.4953ms 10.1071ms 98.9400 Ops/s 99.3396 Ops/s $\color{#d91a1a}-0.40\%$
test_redq_deprec_speed[False-None] 10.4060ms 9.5835ms 104.3464 Ops/s 102.9787 Ops/s $\color{#35bf28}+1.33\%$
test_redq_deprec_speed[False-backward] 13.1829ms 12.6736ms 78.9042 Ops/s 76.8434 Ops/s $\color{#35bf28}+2.68\%$
test_redq_deprec_speed[True-None] 2.9217ms 2.7619ms 362.0726 Ops/s 356.3633 Ops/s $\color{#35bf28}+1.60\%$
test_redq_deprec_speed[True-backward] 4.5174ms 4.3540ms 229.6764 Ops/s 219.3762 Ops/s $\color{#35bf28}+4.70\%$
test_redq_deprec_speed[reduce-overhead-None] 14.6292ms 9.7311ms 102.7630 Ops/s 102.7118 Ops/s $\color{#35bf28}+0.05\%$
test_td3_speed[False-None] 8.4930ms 8.3634ms 119.5682 Ops/s 118.5700 Ops/s $\color{#35bf28}+0.84\%$
test_td3_speed[False-backward] 11.1974ms 10.7638ms 92.9038 Ops/s 90.4225 Ops/s $\color{#35bf28}+2.74\%$
test_td3_speed[True-None] 1.7916ms 1.7440ms 573.3781 Ops/s 572.7710 Ops/s $\color{#35bf28}+0.11\%$
test_td3_speed[True-backward] 3.3498ms 3.1732ms 315.1381 Ops/s 295.5423 Ops/s $\textbf{\color{#35bf28}+6.63\%}$
test_td3_speed[reduce-overhead-None] 50.4621ms 25.9819ms 38.4883 Ops/s 38.3618 Ops/s $\color{#35bf28}+0.33\%$
test_cql_speed[False-None] 18.1022ms 17.8117ms 56.1430 Ops/s 55.8564 Ops/s $\color{#35bf28}+0.51\%$
test_cql_speed[False-backward] 23.4940ms 23.1109ms 43.2695 Ops/s 43.3277 Ops/s $\color{#d91a1a}-0.13\%$
test_cql_speed[True-None] 4.0430ms 3.5388ms 282.5811 Ops/s 285.3158 Ops/s $\color{#d91a1a}-0.96\%$
test_cql_speed[True-backward] 6.2790ms 5.7501ms 173.9088 Ops/s 166.8515 Ops/s $\color{#35bf28}+4.23\%$
test_cql_speed[reduce-overhead-None] 18.3473ms 12.1204ms 82.5057 Ops/s 83.0968 Ops/s $\color{#d91a1a}-0.71\%$
test_a2c_speed[False-None] 3.6100ms 3.3629ms 297.3633 Ops/s 296.7724 Ops/s $\color{#35bf28}+0.20\%$
test_a2c_speed[False-backward] 6.7370ms 6.2752ms 159.3572 Ops/s 152.3635 Ops/s $\color{#35bf28}+4.59\%$
test_a2c_speed[True-None] 1.7067ms 1.5075ms 663.3458 Ops/s 682.5532 Ops/s $\color{#d91a1a}-2.81\%$
test_a2c_speed[True-backward] 3.6425ms 3.2329ms 309.3216 Ops/s 297.7655 Ops/s $\color{#35bf28}+3.88\%$
test_a2c_speed[reduce-overhead-None] 1.3474ms 1.1012ms 908.1052 Ops/s 878.8285 Ops/s $\color{#35bf28}+3.33\%$
test_ppo_speed[False-None] 4.2188ms 4.0014ms 249.9118 Ops/s 246.7424 Ops/s $\color{#35bf28}+1.28\%$
test_ppo_speed[False-backward] 7.6981ms 7.1438ms 139.9809 Ops/s 132.5973 Ops/s $\textbf{\color{#35bf28}+5.57\%}$
test_ppo_speed[True-None] 1.7283ms 1.6137ms 619.6832 Ops/s 610.3851 Ops/s $\color{#35bf28}+1.52\%$
test_ppo_speed[True-backward] 3.9015ms 3.3925ms 294.7659 Ops/s 281.1383 Ops/s $\color{#35bf28}+4.85\%$
test_ppo_speed[reduce-overhead-None] 1.7439ms 1.1631ms 859.7872 Ops/s 833.0794 Ops/s $\color{#35bf28}+3.21\%$
test_reinforce_speed[False-None] 2.5821ms 2.4264ms 412.1297 Ops/s 415.3764 Ops/s $\color{#d91a1a}-0.78\%$
test_reinforce_speed[False-backward] 3.8498ms 3.4452ms 290.2597 Ops/s 278.8834 Ops/s $\color{#35bf28}+4.08\%$
test_reinforce_speed[True-None] 1.5684ms 1.4778ms 676.6671 Ops/s 683.2789 Ops/s $\color{#d91a1a}-0.97\%$
test_reinforce_speed[True-backward] 3.2796ms 3.1968ms 312.8145 Ops/s 294.4772 Ops/s $\textbf{\color{#35bf28}+6.23\%}$
test_reinforce_speed[reduce-overhead-None] 15.4758ms 8.8847ms 112.5528 Ops/s 114.5915 Ops/s $\color{#d91a1a}-1.78\%$
test_iql_speed[False-None] 10.2483ms 9.7879ms 102.1669 Ops/s 102.0706 Ops/s $\color{#35bf28}+0.09\%$
test_iql_speed[False-backward] 0.5631s 24.5130ms 40.7946 Ops/s 72.4051 Ops/s $\textbf{\color{#d91a1a}-43.66\%}$
test_iql_speed[True-None] 2.4743ms 2.3655ms 422.7524 Ops/s 419.0706 Ops/s $\color{#35bf28}+0.88\%$
test_iql_speed[True-backward] 5.0488ms 5.0047ms 199.8130 Ops/s 197.7515 Ops/s $\color{#35bf28}+1.04\%$
test_iql_speed[reduce-overhead-None] 16.9042ms 10.1909ms 98.1267 Ops/s 98.1986 Ops/s $\color{#d91a1a}-0.07\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.6994ms 5.9789ms 167.2559 Ops/s 166.3546 Ops/s $\color{#35bf28}+0.54\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6377ms 0.3801ms 2.6310 KOps/s 2.9395 KOps/s $\textbf{\color{#d91a1a}-10.49\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8740ms 0.3649ms 2.7402 KOps/s 3.1130 KOps/s $\textbf{\color{#d91a1a}-11.98\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0273ms 5.8048ms 172.2718 Ops/s 170.2381 Ops/s $\color{#35bf28}+1.19\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7655ms 0.2903ms 3.4447 KOps/s 3.1889 KOps/s $\textbf{\color{#35bf28}+8.02\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5779ms 0.2713ms 3.6856 KOps/s 3.1720 KOps/s $\textbf{\color{#35bf28}+16.19\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5050ms 1.3043ms 766.6699 Ops/s 755.2352 Ops/s $\color{#35bf28}+1.51\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6603ms 1.3631ms 733.6302 Ops/s 812.1610 Ops/s $\textbf{\color{#d91a1a}-9.67\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 10.4813ms 6.1493ms 162.6214 Ops/s 165.4090 Ops/s $\color{#d91a1a}-1.69\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0419ms 0.4486ms 2.2294 KOps/s 1.8761 KOps/s $\textbf{\color{#35bf28}+18.83\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7482ms 0.5470ms 1.8280 KOps/s 1.9730 KOps/s $\textbf{\color{#d91a1a}-7.35\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.8885ms 5.7842ms 172.8849 Ops/s 169.0694 Ops/s $\color{#35bf28}+2.26\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.8452ms 0.3752ms 2.6651 KOps/s 2.7373 KOps/s $\color{#d91a1a}-2.64\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6126ms 0.3571ms 2.8004 KOps/s 2.8612 KOps/s $\color{#d91a1a}-2.13\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0892ms 5.7731ms 173.2163 Ops/s 171.6735 Ops/s $\color{#35bf28}+0.90\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7328ms 0.2921ms 3.4237 KOps/s 3.2017 KOps/s $\textbf{\color{#35bf28}+6.93\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6432ms 0.2747ms 3.6409 KOps/s 3.5647 KOps/s $\color{#35bf28}+2.14\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4383ms 6.0380ms 165.6169 Ops/s 165.2917 Ops/s $\color{#35bf28}+0.20\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.9914ms 0.4961ms 2.0156 KOps/s 1.8103 KOps/s $\textbf{\color{#35bf28}+11.34\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7016ms 0.4889ms 2.0452 KOps/s 1.9438 KOps/s $\textbf{\color{#35bf28}+5.22\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.7298s 19.5884ms 51.0506 Ops/s 34.0587 Ops/s $\textbf{\color{#35bf28}+49.89\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.9945ms 1.9301ms 518.1195 Ops/s 530.3645 Ops/s $\color{#d91a1a}-2.31\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.7239ms 1.1929ms 838.3150 Ops/s 997.5489 Ops/s $\textbf{\color{#d91a1a}-15.96\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 8.3128ms 5.0974ms 196.1783 Ops/s 192.2063 Ops/s $\color{#35bf28}+2.07\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.1599ms 2.0493ms 487.9648 Ops/s 478.4194 Ops/s $\color{#35bf28}+2.00\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 3.6547ms 1.0549ms 947.9822 Ops/s 987.4034 Ops/s $\color{#d91a1a}-3.99\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 9.3363ms 5.2910ms 188.9986 Ops/s 187.3823 Ops/s $\color{#35bf28}+0.86\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 10.4930ms 2.1944ms 455.7137 Ops/s 497.8474 Ops/s $\textbf{\color{#d91a1a}-8.46\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.7737ms 1.1977ms 834.9189 Ops/s 833.0594 Ops/s $\color{#35bf28}+0.22\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 45.1872ms 39.8857ms 25.0717 Ops/s 25.0248 Ops/s $\color{#35bf28}+0.19\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 0.6986s 32.1320ms 31.1216 Ops/s 53.6556 Ops/s $\textbf{\color{#d91a1a}-42.00\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 44.1064ms 41.3476ms 24.1852 Ops/s 23.7914 Ops/s $\color{#35bf28}+1.66\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 21.0462ms 19.3889ms 51.5760 Ops/s 52.0733 Ops/s $\color{#d91a1a}-0.96\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 46.4954ms 44.3287ms 22.5587 Ops/s 22.8979 Ops/s $\color{#d91a1a}-1.48\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 22.3072ms 20.6907ms 48.3310 Ops/s 47.9671 Ops/s $\color{#35bf28}+0.76\%$
test_storage_write_lazystack[50-img_shape0-small] 0.9138ms 0.2214ms 4.5170 KOps/s 4.2399 KOps/s $\textbf{\color{#35bf28}+6.54\%}$
test_storage_write_lazystack[100-img_shape1-atari] 1.6353ms 1.4639ms 683.0922 Ops/s 662.4789 Ops/s $\color{#35bf28}+3.11\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7499ms 2.3730ms 421.4123 Ops/s 392.1971 Ops/s $\textbf{\color{#35bf28}+7.45\%}$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.3076ms 3.0673ms 326.0176 Ops/s 321.4083 Ops/s $\color{#35bf28}+1.43\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2628ms 0.1719ms 5.8169 KOps/s 5.8950 KOps/s $\color{#d91a1a}-1.33\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3657ms 0.2268ms 4.4090 KOps/s 3.7344 KOps/s $\textbf{\color{#35bf28}+18.06\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 2.1577ms 1.9078ms 524.1677 Ops/s 531.5343 Ops/s $\color{#d91a1a}-1.39\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.6014ms 1.4094ms 709.5325 Ops/s 703.8720 Ops/s $\color{#35bf28}+0.80\%$
test_collector_stack_then_write[50-img_shape0-small] 1.3416ms 1.1615ms 860.9629 Ops/s 847.9789 Ops/s $\color{#35bf28}+1.53\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.8299ms 3.6677ms 272.6505 Ops/s 263.8922 Ops/s $\color{#35bf28}+3.32\%$
test_collector_stack_then_write[100-img_shape2-large_img] 6.1421ms 5.9304ms 168.6218 Ops/s 167.1376 Ops/s $\color{#35bf28}+0.89\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.8256ms 7.2896ms 137.1820 Ops/s 135.9169 Ops/s $\color{#35bf28}+0.93\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4592ms 0.2782ms 3.5949 KOps/s 3.4339 KOps/s $\color{#35bf28}+4.69\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7347ms 1.5848ms 630.9845 Ops/s 619.1480 Ops/s $\color{#35bf28}+1.91\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.9889ms 2.5345ms 394.5584 Ops/s 375.1797 Ops/s $\textbf{\color{#35bf28}+5.17\%}$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.6505ms 3.2734ms 305.4944 Ops/s 300.8811 Ops/s $\color{#35bf28}+1.53\%$
test_collector_without_rb[100-img_shape0-atari] 34.6266ms 33.5696ms 29.7889 Ops/s 29.5283 Ops/s $\color{#35bf28}+0.88\%$
test_collector_without_rb[200-img_shape1-large_batch] 67.7491ms 66.5166ms 15.0338 Ops/s 15.0158 Ops/s $\color{#35bf28}+0.12\%$
test_collector_with_rb[100-img_shape0-atari] 39.5303ms 38.4213ms 26.0272 Ops/s 25.7734 Ops/s $\color{#35bf28}+0.98\%$
test_collector_with_rb[200-img_shape1-large_batch] 75.6613ms 74.8202ms 13.3654 Ops/s 12.9654 Ops/s $\color{#35bf28}+3.08\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 58.5358ms 57.5084ms 17.3888 Ops/s 17.3413 Ops/s $\color{#35bf28}+0.27\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1155s 0.1123s 8.9071 Ops/s 8.7252 Ops/s $\color{#35bf28}+2.08\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 59.2432ms 58.1435ms 17.1988 Ops/s 17.0088 Ops/s $\color{#35bf28}+1.12\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1178s 0.1162s 8.6031 Ops/s 8.3942 Ops/s $\color{#35bf28}+2.49\%$
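
For reading the rows above: the Ops column appears to be the reciprocal of Mean, and Change appears to be the relative difference between Ops and Ops on Repo HEAD. The benchmark bot's own formula is not shown on this page, so the following is a minimal sketch inferred from the table values; all function names are hypothetical.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean iteration time (assumed relation)."""
    return 1.0 / mean_seconds


def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def max_to_mean_ratio(max_seconds: float, mean_seconds: float) -> float:
    """How many times slower the slowest iteration was than the average one."""
    return max_seconds / mean_seconds


# Values copied from the test_iql_speed[False-backward] row.
iql_change = percent_change(40.7946, 72.4051)
iql_ratio = max_to_mean_ratio(0.5631, 24.5130e-3)
print(f"change   = {iql_change:+.2f}%")
print(f"max/mean = {iql_ratio:.1f}x")
```

A Max far above Mean, as in `test_iql_speed[False-backward]`, may indicate that a few slow iterations dominate the result, so such a row is probably worth re-running before it is treated as a regression.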

Comment thread torchrl/envs/common.py
Comment on lines +382 to +385
_trust_step_output (bool): if ``True``, :meth:`step` will skip the :meth:`_step_proc_data`
validation (reward shape checks, done-key completion, type checks) after :meth:`_step`.
Set this when the environment guarantees that its :meth:`_step` output always has correct
shapes, all done keys present, and proper dtypes. Defaults to ``False``.
Collaborator Author

This is a bit of a footgun. I'm kind of OK with it, but looking at other PRs down the stack, we MUST make sure that all the side effects are super documented, like the lack of partial-step support (which requires control flow) and such. This should also be marked with a massive "Experimental" flag for anyone who wants to play with it.
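
To make the concern concrete, here is a minimal sketch of how such a flag could gate validation. `ToyEnv` is a hypothetical plain-Python stand-in, not TorchRL's `EnvBase`, and its dict-based `_step_proc_data` only mimics the checks named in the docstring (type checks and done-key completion); the real implementation is not shown on this page.

```python
class ToyEnv:
    """Hypothetical stand-in for an environment base class (not TorchRL's EnvBase)."""

    DONE_KEYS = ("done", "terminated", "truncated")

    def __init__(self, trust_step_output: bool = False):
        self._trust_step_output = trust_step_output

    def _step(self, action):
        # Subclass hook: this toy version deliberately omits two done keys.
        return {"reward": float(action), "done": False}

    def _step_proc_data(self, out):
        # Validation pass: check the reward type and complete missing done keys.
        if not isinstance(out.get("reward"), float):
            raise TypeError("reward must be a float")
        for key in self.DONE_KEYS:
            out.setdefault(key, False)
        return out

    def step(self, action):
        out = self._step(action)
        if not self._trust_step_output:
            out = self._step_proc_data(out)
        return out


checked = ToyEnv().step(1)
trusted = ToyEnv(trust_step_output=True).step(1)
print(sorted(checked))
print(sorted(trusted))
```

In this sketch the trusted path returns whatever `_step` produced, so the missing `terminated` and `truncated` keys go unnoticed until some downstream consumer reads them. That is the kind of side effect that would need to be documented alongside the flag.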

[ghstack-poisoned]
@vmoens vmoens closed this Apr 11, 2026
@vmoens vmoens deleted the gh/vmoens/243/head branch April 20, 2026 20:35
Labels

CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Performance: Performance issue or suggestion for improvement.
Transforms
