[NeurIPS 2025] DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Jinyoung Park¹, Jeehye Na¹, Jinyoung Kim², Hyunwoo J. Kim¹

¹KAIST  ²Korea University

This repository is the official PyTorch implementation of DeepVideo-R1.

📝Data

We use the SEED-Bench-R1 data to optimize our model. SEED-Bench-R1 consists of a large-scale training set and a hierarchical three-level validation set for in-distribution, cross-environment, and cross-environment-task evaluations. The datasets can be downloaded from HuggingFace.

Specifically, SEED-Bench-R1 builds on the earlier EgoPlan-Bench and EgoPlan-Bench2 benchmarks, reusing the training and validation data from EgoPlan-Bench and the test data from EgoPlan-Bench2. The EgoPlan-Bench validation data are used for Level-1 (in-distribution) and Level-2 (OOD, cross-environment) evaluation, while the EgoPlan-Bench2 test data cover more general domains and are used for Level-3 (OOD, cross-environment-task) evaluation.
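As a rough sketch of how to fetch the data (the HuggingFace repository id below is an assumption, not taken from this repository; substitute the actual dataset id from the SEED-Bench-R1 release):

```python
# Minimal sketch: download the SEED-Bench-R1 data from the HuggingFace Hub.
# NOTE: the repo_id is a placeholder / assumption -- replace it with the
# dataset id published by the SEED-Bench-R1 authors.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="TencentARC/SEED-Bench-R1",  # hypothetical id, verify before use
    repo_type="dataset",
    local_dir="data/SEED-Bench-R1",
)
print(f"SEED-Bench-R1 downloaded to {local_dir}")
```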

🔥Training Models

The training code is based on SEED-Bench-R1 and Video-R1. The training commands below are configured for a single node with 4 GPUs. On different hardware or topologies, you may need to adjust the per-device batch size and the number of gradient accumulation steps.
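To make the batch-size note concrete, the sketch below (illustrative only; the per-device batch size and accumulation steps are assumptions, not values from reggrpo.sh) shows how the effective batch size relates to the GPU count and the number of gradient accumulation steps:

```python
# Illustrative sketch, not part of the repository's scripts:
# effective_batch = per_device_batch * num_gpus * grad_accum_steps

def grad_accum_steps(effective_batch: int, per_device_batch: int, num_gpus: int) -> int:
    """Gradient-accumulation steps needed to reach `effective_batch`."""
    assert effective_batch % (per_device_batch * num_gpus) == 0, (
        "effective_batch must be divisible by per_device_batch * num_gpus"
    )
    return effective_batch // (per_device_batch * num_gpus)

# Example (assumed numbers): a run tuned for 4 GPUs with per-device batch 1
# and 8 accumulation steps has an effective batch of 32; on 2 GPUs the
# accumulation must double to preserve it.
print(grad_accum_steps(effective_batch=32, per_device_batch=1, num_gpus=4))  # -> 8
print(grad_accum_steps(effective_batch=32, per_device_batch=1, num_gpus=2))  # -> 16
```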

Post-training Qwen2.5-VL-3B-Instruct

  • To run Reg-GRPO on Qwen2.5-VL-3B with reggrpo.sh:

```bash
bash scripts/reggrpo.sh
```

Citations

@inproceedings{park2025deepvideo,
  title={DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO},
  author={Park, Jinyoung and Na, Jeehye and Kim, Jinyoung and Kim, Hyunwoo J},
  booktitle={NeurIPS},
  year={2025}
}
