z-image: use fused swiglu kernel #1302

Draft
kwsp wants to merge 1 commit into leejet:master from kwsp:fused-swiglu

kwsp commented Mar 1, 2026

Use a fused kernel for the swiglu operation in the FFN instead of separate silu and mul operations to improve performance.
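To make the description concrete, here is a minimal reference sketch of what fusing changes, assuming the FFN gate is computed as silu(gate) * up. The function names `ffn_gate_unfused` and `ffn_gate_fused` are hypothetical and only illustrate the math; this is not the code from the PR or from ggml. Both paths should produce the same values. The fused path simply avoids materializing the intermediate silu result and makes one pass over the data instead of two.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def ffn_gate_unfused(gate, up):
    """Two separate element-wise ops with an intermediate buffer (hypothetical name)."""
    activated = [silu(g) for g in gate]            # op 1: silu, writes a temporary
    return [a * u for a, u in zip(activated, up)]  # op 2: mul, reads the temporary


def ffn_gate_fused(gate, up):
    """One pass that applies silu and the multiply together (hypothetical name)."""
    return [silu(g) * u for g, u in zip(gate, up)]


# Small made-up inputs, chosen only to exercise negative, zero, and positive values.
gate = [-2.0, -0.5, 0.0, 0.5, 2.0]
up = [1.0, 2.0, 3.0, 4.0, 5.0]

unfused = ffn_gate_unfused(gate, up)
fused = ffn_gate_fused(gate, up)

# The two paths are expected to agree element by element.
assert all(math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-12) for a, b in zip(unfused, fused))
```

On a GPU, the potential saving comes from one fewer kernel launch and one fewer round trip through memory per FFN block, which is why any gain is likely to be small relative to the total run time.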

Timing

Test command

./build/cuda/bin/Release/sd-cli \
  --diffusion-model ~/code/ComfyUI/models/diffusion_models/z_image_turbo-Q5_K_S.gguf \
  --vae ~/code/ComfyUI/models/vae/ae.safetensors \
  --llm ~/code/ComfyUI/models/text_encoders/Qwen3-4B.i1-Q5_K_S.gguf \
  -p "A cinematic photograph of a solitary hooded figure walking through a rain-slicked city at night, neon reflections on wet asphalt, moody atmospheric" \
  --cfg-scale 1.0 --diffusion-fa -v \
  -H 1024 -W 512 \
  --steps 8 \
  --output output%03d.png

Timing on RTX 2080 Ti

| Run | Original (s) | Fused SwiGLU (s) |
|---|---|---|
| 1 | 8.55 | 8.52 |
| 2 | 8.60 | 8.58 |
| 3 | 8.74 | 8.62 |
| 4 | 8.67 | 8.65 |
| 5 | 8.70 | 8.66 |
| Mean | 8.652 | 8.606 |

The difference is not statistically significant (p-value = 0.07), but the fused version is faster in every run, with a mean about 0.5% lower (8.606 s vs 8.652 s).
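The comment does not say which test produced the p-value. As a sketch, assuming a two-sided paired t-test with runs paired by run number (an assumption, and only meaningful if the runs really were taken in matching order), the five timings from the table give a result close to the reported 0.07. The helper names `t_pdf` and `two_sided_p` are made up for this example. The p-value is obtained by numerically integrating the Student's t density with Simpson's rule, so only the standard library is needed.

```python
import math
import statistics

# Timings copied from the table in this comment (seconds).
original = [8.55, 8.60, 8.74, 8.67, 8.70]
fused = [8.52, 8.58, 8.62, 8.65, 8.66]


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # probability mass between 0 and |t|
    return 1 - 2 * area           # mass in both tails


# Paired test: work on per-run differences (assumes pairing by run number).
diffs = [o - f for o, f in zip(original, fused)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
t_stat = mean_diff / (statistics.stdev(diffs) / math.sqrt(n))
p_value = two_sided_p(t_stat, n - 1)

print(f"mean difference = {mean_diff:.3f} s, t = {t_stat:.2f}, p = {p_value:.3f}")
```

With only five runs per configuration, the test has little power to detect a difference of this size, so more runs (ideally interleaved between the two builds) would be needed to settle whether the speedup is real.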

stduhpf (Contributor) commented Mar 1, 2026

Do you know if this fused op is supported on most backends?

daniandtheweb (Contributor) commented

This page, https://github.com/ggml-org/llama.cpp/blob/master/docs/ops.md, reports swiglu as supported by most of the backends.

Green-Sky (Contributor) commented

Do we know that these ops are not already fused automatically?

kwsp (Author) commented Mar 2, 2026

@Green-Sky How can I check whether the ops are fused automatically?
