I am currently setting up a safety steering experiment for African languages (Yoruba) using HookedTransformer with GPT-2-medium.
I've observed that characters carrying Yoruba diacritics (e.g., 'ọ' in 'Atọwọda') trigger extreme fragmentation compared to unaccented ASCII text.
Example:
Input: "Oye Atọwọda"
Output: ['O', 'ye', 'ĠAt', 'á', '»', 'į', 'w', 'á', '»', 'į', 'da']
The single word "Atọwọda" is split into 9 tokens, six of which are byte-level fallbacks (each 'ọ' decomposes into its three UTF-8 bytes).
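A quick self-contained check (no model needed) confirms where the fragmentation comes from: the dotted vowels are multi-byte in UTF-8, and GPT-2's byte-level BPE has no merge rules covering those byte sequences, so each byte surfaces as its own fallback token:

```python
# 'ọ' (U+1ECD) occupies three UTF-8 bytes; GPT-2's byte-level BPE
# falls back to one token per byte when it has no merge rule.
word = "Atọwọda"
for ch in word:
    print(ch, hex(ord(ch)), f"{len(ch.encode('utf-8'))} byte(s)")

# Two 'ọ' characters at 3 bytes each account for 6 of the 9 tokens;
# the remaining ASCII runs merge into 'ĠAt', 'w', and 'da'.
```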
Question:
For activation patching, this fragmentation makes it difficult to isolate a single "semantic" token position. Is there a recommended heuristic in TransformerLens for pooling activations across these fragmented byte-tokens (e.g., taking the mean over the byte-span)? Or is the standard practice to simply ignore the byte-level noise and patch positions individually?
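For context on what I mean by mean-pooling: TransformerLens has no built-in helper for this as far as I know, so this is only a sketch of the approach I'm considering. The span indices here are hypothetical and would come from inspecting `model.to_str_tokens` on the prompt:

```python
import torch

def pool_byte_span(resid: torch.Tensor, start: int, end: int) -> torch.Tensor:
    """Mean-pool residual-stream activations over the contiguous token
    span [start, end), treating the fragmented word as one unit.
    `resid` has shape (batch, seq, d_model); returns (batch, d_model)."""
    return resid[:, start:end, :].mean(dim=1)

# Hypothetical usage with a TransformerLens cache (not run here):
#   logits, cache = model.run_with_cache("Oye Atọwọda")
#   # positions 2..11 cover 'ĠAt' + the six byte fallbacks + 'w' + 'da'
#   word_repr = pool_byte_span(cache["resid_post", layer], 2, 11)
```

The open question for me is whether patching this pooled vector back in (broadcast across the span, or written to one position) is principled, or whether per-position patching over the span is the cleaner experiment.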