I’m trying to follow/reproduce the Agentic RL part, but some training details are not fully clear from the current docs.
Would you consider open-sourcing the training code/details for the Agentic RL part to make it easier for others to reproduce and build on your results?
Thanks again for the great open-source release!