[NeurIPS 2025] DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Jinyoung Park¹, Jeehye Na¹, Jinyoung Kim², Hyunwoo J. Kim¹

¹KAIST  ²Korea University

This repository is the official PyTorch implementation of DeepVideo-R1.

📝Data

We use the SEED-Bench-R1 data to optimize our model. SEED-Bench-R1 consists of a large-scale training set and a hierarchical three-level validation set for in-distribution, cross-environment, and cross-environment-task evaluations. The datasets can be downloaded from HuggingFace.

Specifically, SEED-Bench-R1 builds on the earlier EgoPlan-Bench and EgoPlan-Bench2 benchmarks, reusing the training and validation data from EgoPlan-Bench and the test data from EgoPlan-Bench2. The EgoPlan-Bench validation data are used for Level-1 (in-distribution) and Level-2 (OOD, cross-environment) evaluation, while the EgoPlan-Bench2 test data cover more general domains and are used for Level-3 (OOD, cross-environment-task) evaluation.
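As a rough sketch of how to fetch the data (the HuggingFace repository id below is an assumption, not taken from this repository; substitute the actual dataset id from the SEED-Bench-R1 release):

```python
# Minimal sketch: download the SEED-Bench-R1 data from the HuggingFace Hub.
# NOTE: the repo_id is a placeholder / assumption -- replace it with the
# dataset id published by the SEED-Bench-R1 authors.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="TencentARC/SEED-Bench-R1",  # hypothetical id, verify before use
    repo_type="dataset",
    local_dir="data/SEED-Bench-R1",
)
print(f"SEED-Bench-R1 downloaded to {local_dir}")
```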

🔥Training Models

The training code is based on SEED-Bench-R1 and Video-R1. The training commands below are configured for a single node with 4 GPUs. On different hardware or topologies, you may need to adjust the per-device batch size and the number of gradient accumulation steps.
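To make the batch-size note concrete, the sketch below (illustrative only; the per-device batch size and accumulation steps are assumptions, not values from reggrpo.sh) shows how the effective batch size relates to the GPU count and the number of gradient accumulation steps:

```python
# Illustrative sketch, not part of the repository's scripts:
# effective_batch = per_device_batch * num_gpus * grad_accum_steps

def grad_accum_steps(effective_batch: int, per_device_batch: int, num_gpus: int) -> int:
    """Gradient-accumulation steps needed to reach `effective_batch`."""
    assert effective_batch % (per_device_batch * num_gpus) == 0, (
        "effective_batch must be divisible by per_device_batch * num_gpus"
    )
    return effective_batch // (per_device_batch * num_gpus)

# Example (assumed numbers): a run tuned for 4 GPUs with per-device batch 1
# and 8 accumulation steps has an effective batch of 32; on 2 GPUs the
# accumulation must double to preserve it.
print(grad_accum_steps(effective_batch=32, per_device_batch=1, num_gpus=4))  # -> 8
print(grad_accum_steps(effective_batch=32, per_device_batch=1, num_gpus=2))  # -> 16
```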

Post-training Qwen2.5-VL-3B-Instruct

  • To run Reg-GRPO on Qwen2.5-VL-3B with reggrpo.sh:

```bash
bash scripts/reggrpo.sh
```

Citations

@inproceedings{park2025deepvideo,
  title={DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO},
  author={Park, Jinyoung and Na, Jeehye and Kim, Jinyoung and Kim, Hyunwoo J},
  booktitle={NeurIPS},
  year={2025}
}
