Remove spurious warning for tokenize_and_concatenate by evcyen · Pull Request #1177 · TransformerLensOrg/TransformerLens

evcyen · 2026-02-19T01:53:52Z

Description

Summary: Suppress the spurious "Token indices sequence length is longer than the specified maximum sequence length for this model" warning when using tokenize_and_concatenate with long text and a tokenizer that has a model_max_length.

Context: tokenize_and_concatenate splits text into chunks, tokenizes them in parallel (with padding), then strips padding and reshapes the result into fixed-length sequences of size max_length. Some intermediate chunks can tokenize to more tokens than the tokenizer's model_max_length. The Hugging Face tokenizer warns whenever it produces a sequence longer than that limit, but TransformerLens never feeds those long intermediate sequences to the model, so the warning is misleading in this use case.
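To make the context concrete, here is a toy sketch of the chunk, tokenize, reshape flow. It is not the TransformerLens implementation: the function name chunk_and_reshape is hypothetical, whitespace-separated words stand in for real token ids, and padding is omitted. It only illustrates how an intermediate chunk can exceed model_max_length while every final row stays at max_length.

```python
def chunk_and_reshape(text, num_chunks, max_length, model_max_length):
    """Toy illustration only: whitespace 'tokens' stand in for real token ids."""
    # Split the raw text into roughly equal character chunks.
    chunk_len = (len(text) - 1) // num_chunks + 1
    chunks = [text[i * chunk_len:(i + 1) * chunk_len] for i in range(num_chunks)]

    # "Tokenize" each chunk independently (a real tokenizer would run here).
    per_chunk = [chunk.split() for chunk in chunks]

    # Chunks this long are what would trigger the tokenizer's length warning.
    too_long = [len(toks) for toks in per_chunk if len(toks) > model_max_length]

    # Concatenate all tokens, then cut into fixed-length rows, dropping the remainder.
    flat = [tok for toks in per_chunk for tok in toks]
    num_rows = len(flat) // max_length
    rows = [flat[r * max_length:(r + 1) * max_length] for r in range(num_rows)]
    return rows, too_long


rows, too_long = chunk_and_reshape("a " * 100, num_chunks=2, max_length=8, model_max_length=16)
```

In this toy run both intermediate chunks are longer than model_max_length, yet no row handed downstream is longer than max_length.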

Change: In transformer_lens/utils.py, we temporarily set tokenizer.deprecation_warnings["sequence-length-is-longer-than-the-specified-maximum"] = False before the tokenization step and restore the original value in a finally block, so the tokenizer is not left in a modified state. We only touch this flag when the tokenizer has a deprecation_warnings dict (guarded with hasattr / isinstance).
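The set-and-restore pattern described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the helper temporary_deprecation_flag and the StubTokenizer class are hypothetical, the flag value is passed in as a parameter rather than hard-coded, and removing the key when it was originally absent is an assumption of this sketch. Which value actually silences the warning depends on how the installed transformers version reads the flag.

```python
from contextlib import contextmanager

KEY = "sequence-length-is-longer-than-the-specified-maximum"
_MISSING = object()  # sentinel: the key was not present before we touched it


@contextmanager
def temporary_deprecation_flag(tokenizer, value, key=KEY):
    """Hypothetical helper: set one deprecation_warnings entry, then restore it."""
    flags = getattr(tokenizer, "deprecation_warnings", None)
    if not isinstance(flags, dict):
        # Tokenizer has no usable flag dict: leave it completely untouched.
        yield
        return

    original = flags.get(key, _MISSING)
    flags[key] = value
    try:
        yield
    finally:
        # Restore the exact prior state, even if tokenization raised.
        if original is _MISSING:
            flags.pop(key, None)
        else:
            flags[key] = original


class StubTokenizer:
    """Stand-in for a Hugging Face tokenizer, used only for this illustration."""

    def __init__(self):
        self.deprecation_warnings = {}


tok = StubTokenizer()
with temporary_deprecation_flag(tok, False):
    pass  # the tokenization step would run here
```

A context manager keeps the restore logic in one place; an inline try / finally around the tokenization call, as the description mentions, achieves the same effect.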

Fixes #1134

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

evcyen added 3 commits February 18, 2026 20:44

set sequence-length-is-longer-than-the-specified-maximum to false

d849e38

add test

61fccf5

fix test

4610358

jlarson4 changed the base branch from main to dev February 19, 2026 02:57

evcyen and others added 3 commits February 19, 2026 16:53

Merge branch 'dev' into fix/1134-tokenize-and-concatenate-warning

6f1a72a

fix tests

34d1d33

fix formatting

86a9467

jlarson4 merged commit a494811 into TransformerLensOrg:dev Feb 20, 2026
13 checks passed

jlarson4 merged 6 commits into TransformerLensOrg:dev from evcyen:fix/1134-tokenize-and-concatenate-warning

2 participants