Jinyoung Park¹, Jeehye Na¹, Jinyoung Kim², Hyunwoo J. Kim¹

¹KAIST ²Korea University
We train our model on SEED-Bench-R1. SEED-Bench-R1 provides a large-scale training set and a hierarchical three-level validation set for in-distribution, cross-environment, and cross-environment-task evaluation. The datasets can be downloaded from HuggingFace.
Specifically, SEED-Bench-R1 builds on prior benchmarks, reusing the training and validation data from EgoPlan-Bench as well as the test data from EgoPlan-Bench2. The EgoPlan-Bench validation data are used for Level-1 (in-distribution) and Level-2 (OOD, cross-environment) evaluation, while the EgoPlan-Bench2 test data cover more general domains and are used for Level-3 (OOD, cross-environment-task) evaluation.
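As a minimal sketch, the data can be fetched with the `huggingface-cli` tool from `huggingface_hub`. The repository ID and local directory below are placeholders, not the actual dataset paths; substitute the repository ID from the dataset's HuggingFace page:

```bash
# Sketch: download the SEED-Bench-R1 data with huggingface-cli.
# <ORG/SEED-Bench-R1> is a placeholder; replace it with the actual
# dataset repository ID from the HuggingFace page.
pip install -U "huggingface_hub[cli]"
huggingface-cli download <ORG/SEED-Bench-R1> \
    --repo-type dataset \
    --local-dir ./data/SEED-Bench-R1
```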
The training code is based on SEED-Bench-R1 and Video-R1. The training commands below are configured for a single node with 4 GPUs. For different hardware and topologies, you may need to tune the per-device batch size and the number of gradient accumulation steps; see the sketch after the command below.
- To run Reg-GRPO on Qwen2.5-VL-3B with `reggrpo.sh`:

```bash
bash scripts/reggrpo.sh
```
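For example, on a node with fewer GPUs you can keep the effective batch size constant by trading per-device batch size against gradient accumulation. A minimal sketch follows, assuming the training script accepts standard Hugging Face `TrainingArguments`-style flags; the entry-point name and the flag names are assumptions, not guaranteed by `reggrpo.sh`:

```bash
# Sketch: keep the effective batch size constant when changing GPU count.
#   effective_batch = num_gpus * per_device_batch_size * grad_accum_steps
# e.g. 4 GPUs x 1 x 8 = 32  ->  2 GPUs x 1 x 16 = 32
# <TRAIN_SCRIPT> and the flags below are placeholders following the
# standard Hugging Face TrainingArguments convention; check
# scripts/reggrpo.sh for the script's actual interface.
torchrun --nproc_per_node 2 <TRAIN_SCRIPT>.py \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16
```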
```bibtex
@inproceedings{park2025deepvideo,
  title={DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO},
  author={Park, Jinyoung and Na, Jeehye and Kim, Jinyoung and Kim, Hyunwoo J.},
  booktitle={NeurIPS},
  year={2025}
}
```