# Let's play with an implementation of GRPO with self-play on long episodic tasks
parthh01/rl_stuff