
Remove spurious warning for tokenize_and_concatenate #1177

Merged
jlarson4 merged 6 commits into TransformerLensOrg:dev from evcyen:fix/1134-tokenize-and-concatenate-warning
Feb 20, 2026
Conversation


@evcyen evcyen commented Feb 19, 2026

Description

Summary: Suppress the spurious "Token indices sequence length is longer than the specified maximum sequence length for this model" warning when using tokenize_and_concatenate with long text and a tokenizer that has a model_max_length.

Context: tokenize_and_concatenate splits text into chunks, tokenizes them in parallel (with padding), then strips the padding and reshapes the result into fixed-length sequences of size max_length. Some intermediate chunks can tokenize to sequences longer than the tokenizer's model_max_length. The Hugging Face tokenizer warns whenever it produces a sequence longer than that limit, but TransformerLens never feeds those long sequences to the model, so the warning is misleading in this use case.

Change: In transformer_lens/utils.py, we temporarily set tokenizer.deprecation_warnings["sequence-length-is-longer-than-the-specified-maximum"] = True before the tokenization step (Hugging Face tokenizers treat this entry as "warning already emitted", so it suppresses the warning) and restore the original value in a finally block so the tokenizer is not left in a modified state. We only touch this when the tokenizer has a deprecation_warnings dict (guarded with hasattr / isinstance).
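The approach can be sketched like this (a sketch, not the exact utils.py code; the function name is illustrative). Note that in Hugging Face tokenizers the deprecation_warnings entry records whether the warning has already been emitted, so setting it to True is what suppresses further warnings:

```python
WARNING_KEY = "sequence-length-is-longer-than-the-specified-maximum"

def tokenize_without_length_warning(tokenizer, texts, **kwargs):
    """Tokenize `texts` with the over-length warning suppressed, then
    restore the tokenizer's original warning state."""
    # Only touch tokenizers that expose the Hugging Face-style
    # `deprecation_warnings` dict (the hasattr / isinstance guard).
    has_flag = hasattr(tokenizer, "deprecation_warnings") and isinstance(
        getattr(tokenizer, "deprecation_warnings", None), dict
    )
    original = tokenizer.deprecation_warnings.get(WARNING_KEY) if has_flag else None
    if has_flag:
        # True means "already warned", so the tokenizer stays silent.
        tokenizer.deprecation_warnings[WARNING_KEY] = True
    try:
        return tokenizer(texts, **kwargs)
    finally:
        # Restore the prior state so the tokenizer is not left modified,
        # even if tokenization raised.
        if has_flag:
            if original is None:
                tokenizer.deprecation_warnings.pop(WARNING_KEY, None)
            else:
                tokenizer.deprecation_warnings[WARNING_KEY] = original
```

The try/finally ensures callers who rely on the warning elsewhere still see it after tokenize_and_concatenate returns.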

Fixes #1134

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@jlarson4 jlarson4 changed the base branch from main to dev February 19, 2026 02:57
@jlarson4 jlarson4 merged commit a494811 into TransformerLensOrg:dev Feb 20, 2026
13 checks passed


Development

Successfully merging this pull request may close these issues.

[Bug Report] tokenize_and_concatenate issues spurious warnings
