Fix mRoPE position ID crash on Qwen2-VL prompt truncation#482
Open
Mr-Neutr0n wants to merge 1 commit into microsoft:main from
When training Qwen2.5-VL with agent-lightning + verl, prompt truncation changes the token count but image_grid_thw is computed from the original (untruncated) image_urls. This causes get_rope_index to fail with a shape mismatch because it finds fewer image tokens in the truncated input_ids than entries in image_grid_thw.

After prompt truncation, count remaining image regions in the truncated token sequence and slice image_urls to match before computing image_grid_thw, ensuring consistency between the token content and the mRoPE spatial metadata.

Fixes microsoft#441
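To make the mismatch concrete, here is a minimal sketch in plain Python. The token IDs (`VISION_START`, `IMAGE_PAD`), the toy prompt, and the file names are hypothetical values chosen for illustration; they are not the real Qwen2-VL IDs or code from this PR. Counting only the start marker is also a simplification of how image regions are detected.

```python
# Hypothetical token IDs, chosen only for this illustration.
VISION_START = 1001
IMAGE_PAD = 1002

# Toy prompt: text, image 1, text, image 2 (two pad tokens per image here).
prompt_ids = [5, 6, VISION_START, IMAGE_PAD, IMAGE_PAD,
              7, 8, VISION_START, IMAGE_PAD, IMAGE_PAD]
image_urls = ["img_a.png", "img_b.png"]  # one entry per image in the full prompt

# Truncation keeps only the first max_prompt_length tokens.
max_prompt_length = 6
truncated_ids = prompt_ids[:max_prompt_length]

# Metadata derived from image_urls still describes two images...
images_in_metadata = len(image_urls)
# ...but only one image start marker survives in the truncated tokens.
images_in_tokens = truncated_ids.count(VISION_START)

print(images_in_metadata, images_in_tokens)
```

Any per-image metadata built from `image_urls` at this point would describe more images than the truncated tokens contain, which is the inconsistency the description above attributes the crash to.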
bdd1c8d to ca0be5a
Author
Friendly bump! Let me know if there's anything I should update or improve to help move this forward.
Summary
Fixes #441
When training Qwen2.5-VL with agent-lightning + verl, the model crashes in `get_rope_index` with a shape mismatch: the length of `llm_positions` differs from the number of true entries in the attention mask.

Root cause: In `get_train_data_batch`, prompt truncation (`prompt_ids[:max_prompt_length]`) changes the token count, potentially removing image placeholder tokens. However, `image_grid_thw` is computed from the original (untruncated) `image_urls` list. When `get_rope_index` processes the truncated sequence, it finds fewer `<|vision_start|><|image_pad|>` regions than `image_grid_thw` entries, causing the position ID length to diverge from the attention mask count.

Fix: After prompt truncation, count the remaining image regions in the truncated token sequence using the same `vision_start_token_id` + `image_token_id` pattern that `get_rope_index` uses, and slice `image_urls` to match before computing `image_grid_thw`.

- Add a `_count_images_in_tokens()` helper method to detect image regions in token sequences
- Keep `image_urls` in sync with truncated prompts

Test plan
- Training with prompts that exceed `max_prompt_length` and contain images no longer crashes in `get_rope_index`
- Training with prompts within `max_prompt_length` is unaffected (no truncation, all images retained)
- Models that do not use mRoPE are unaffected (`_use_mrope` is `False`)
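The fix described in this summary can be sketched roughly as follows. This is not the PR's actual diff: the function names `count_images_in_tokens` and `truncate_and_sync_images`, their signatures, and the toy token IDs in the usage lines are hypothetical. The sketch also assumes that entries in `image_urls` appear in the same order as the image regions in the prompt, so that truncating from the right drops trailing images first.

```python
from typing import List, Sequence, Tuple


def count_images_in_tokens(
    token_ids: Sequence[int],
    vision_start_token_id: int,
    image_token_id: int,
) -> int:
    """Count regions where a vision-start token is immediately followed by an image token."""
    count = 0
    for i in range(len(token_ids) - 1):
        if token_ids[i] == vision_start_token_id and token_ids[i + 1] == image_token_id:
            count += 1
    return count


def truncate_and_sync_images(
    prompt_ids: Sequence[int],
    image_urls: Sequence[str],
    max_prompt_length: int,
    vision_start_token_id: int,
    image_token_id: int,
) -> Tuple[List[int], List[str]]:
    """Truncate the prompt and keep only the image URLs whose regions survive."""
    truncated = list(prompt_ids[:max_prompt_length])
    if len(truncated) == len(prompt_ids):
        # No truncation happened, so every image is retained.
        return truncated, list(image_urls)
    remaining = count_images_in_tokens(truncated, vision_start_token_id, image_token_id)
    # Assumption: URLs are ordered like the image regions in the prompt.
    return truncated, list(image_urls[:remaining])


# Usage with made-up token IDs (1001 = vision start, 1002 = image pad).
ids = [5, 6, 1001, 1002, 1002, 7, 8, 1001, 1002, 1002]
urls = ["img_a.png", "img_b.png"]
short_ids, short_urls = truncate_and_sync_images(ids, urls, 6, 1001, 1002)
full_ids, full_urls = truncate_and_sync_images(ids, urls, 20, 1001, 1002)
```

In this sketch the slice happens before any per-image metadata is built, so a later step that derives `image_grid_thw` from the returned URL list would see only the images still present in the tokens.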