Replies: 1 comment
Note that 2179989504 bytes is 2079 MB. Maybe this error was triggered because the VRAM filled up, not because of a single allocation. Inference also needs a working buffer: if you have a ~14 GB model and ~15 GB of available VRAM, the ~1 GB left may not be enough for it. I suggest trying
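The byte-to-MiB conversion and the headroom estimate above can be checked with a quick sketch (the sizes are the approximate figures from this thread, not measured values):

```python
# Convert the failed allocation size from the error message to MiB.
failed_alloc_bytes = 2179989504
mib = failed_alloc_bytes / (1024 * 1024)
print(mib)  # 2079.0

# Rough headroom check: a ~14 GB model loaded into ~15 GB of usable
# VRAM leaves only about 1 GB for the inference working buffer.
usable_vram_gb = 15
model_gb = 14
print(usable_vram_gb - model_gb)  # 1
```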
That message doesn't really reflect
Not on sd.cpp mainline, but there is a PR that implements it: #1184
I'm trying to run Qwen Image on a 9070 XT (16 GB VRAM total, 15 actually available).

diffusion model: qwen-image-Q5_K_M.gguf (14 GB)
llm: Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf (6 GB)

This setup requires 20 GB total, and it seems that the Vulkan impl (release b314d80) is trying to allocate the whole 20 GB in GPU VRAM, so it fails with ErrorOutOfDeviceMemory. I thought that --offload-to-cpu would somehow help with this (I have 64 GB RAM, around 40 available). The suspicious part is "RAM 0.00 MB": it's like there is no hint that RAM can be used.

Another question I have: would it be possible to use a second GPU for the LLM part? I have a dual setup with an older 1070 with 8 GB VRAM, and Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf can easily fit there with somewhat decent speed (the llama.cpp Vulkan impl outputs around 35 t/s). Would it be hard to implement?
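For what it's worth, the VRAM budget described in the question can be sketched like this (the sizes are the approximate figures quoted above, not exact allocator numbers):

```python
# Approximate VRAM budget from the question (all sizes in GB).
diffusion_model_gb = 14  # qwen-image-Q5_K_M.gguf
text_encoder_gb = 6      # Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf
usable_vram_gb = 15      # 16 GB 9070 XT, ~15 GB actually available

total_gb = diffusion_model_gb + text_encoder_gb
print(total_gb)                   # 20
print(total_gb > usable_vram_gb)  # True: both models cannot be resident at once
```

This is why the thread turns to --offload-to-cpu and to splitting the LLM onto a second GPU: either option would move the 6 GB text encoder out of the 15 GB budget.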