
Strategy for High-Fragmentation Tokenization (Yoruba) in HookedTransformer? #1165

@saaga23

Description


I am currently setting up a safety steering experiment for African languages (Yoruba) using HookedTransformer with GPT-2-medium.

I've observed that characters carrying Yoruba diacritics (e.g., 'ọ' in 'Atọwọda') trigger extreme fragmentation compared to unaccented Latin text.

Example:
Input: "Oye Atọwọda"
Output: ['O', 'ye', 'ĠAt', 'á', '»', 'į', 'w', 'á', '»', 'į', 'da']

The single word Atọwọda is split into 9 tokens, mostly byte-level fallbacks.
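For anyone puzzled by the 'á', '»', 'į' glyphs: they are not garbage characters but GPT-2's byte-to-printable-character mapping. A minimal self-contained sketch of that mapping (adapted from the logic in the original GPT-2 `encoder.py`; not a TransformerLens API) shows why one 'ọ' becomes three visible "tokens":

```python
def bytes_to_unicode():
    """GPT-2-style byte-to-printable-glyph map: bytes that already have a
    printable glyph map to themselves; the rest (control bytes, space, etc.)
    are shifted up past U+0100 so every byte is rendered visibly."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

BYTE_MAP = bytes_to_unicode()

# 'ọ' (U+1ECD) is three UTF-8 bytes (0xE1 0xBB 0x8D); each byte gets its
# own glyph, which is exactly the 'á', '»', 'į' run in the token dump.
print([BYTE_MAP[b] for b in "ọ".encode("utf-8")])  # ['á', '»', 'į']

# The 'Ġ' prefix on 'ĠAt' is the same mapping applied to the space byte.
print(BYTE_MAP[ord(" ")])  # 'Ġ'
```

So the 9-way split is deterministic byte fallback, not noise: every fragment still carries part of the word's bytes.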

Question:
For activation patching, this makes it difficult to isolate the "semantic" token. Is there a recommended heuristic in TransformerLens for pooling activations across these fragmented byte-tokens (e.g. taking the mean of the byte-span)? Or is the standard practice to simply ignore the byte-level noise?
