Description
When using PyTorch 2.10.0 on an Aarch64 system with more than 128 cores, running the Hugging Face THUDM/CogVideoX-2b model causes PyTorch's built-in OpenBLAS (apparently v0.3.30) to print "Bad memory unallocation!" warnings:
OpenBLAS warning: precompiled NUM_THREADS exceeded, adding auxiliary array for thread metadata.
To avoid this warning, please rebuild your copy of OpenBLAS with a larger NUM_THREADS setting
or set the environment variable OPENBLAS_NUM_THREADS to 128 or lower
[...]
BLAS : Bad memory unallocation! : 768 0xf52c94000000
BLAS : Bad memory unallocation! : 768 0xf52c78000000
BLAS : Bad memory unallocation! : 768 0xf52c80000000
BLAS : Bad memory unallocation! : 768 0xf52c8e000000
BLAS : Bad memory unallocation! : 768 0xf52c8a000000
BLAS : Bad memory unallocation! : 768 0xf52c88000000
[...]
The problem has also been reproduced with OpenBLAS v0.2.20-7494-g986ba2949. All warnings go away if the threads are restricted with OMP_NUM_THREADS=128 or OpenBLAS is recompiled with NUM_THREADS=256.
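When rebuilding OpenBLAS is not an option, the thread cap can also be applied from inside the script itself, before any OpenBLAS-backed library is imported. This is a minimal sketch of that workaround; the limit of 128 is an assumption taken from the warning text above, not queried from OpenBLAS:

```python
import os

# Cap OpenBLAS/OpenMP threads *before* importing torch (or anything else
# that links OpenBLAS). The value 128 is the precompiled NUM_THREADS
# limit reported in the warning message (an assumption, not queried).
os.environ.setdefault("OPENBLAS_NUM_THREADS", "128")
os.environ.setdefault("OMP_NUM_THREADS", "128")

# import torch  # safe to import only after the environment is set
```

Using `setdefault` keeps any value the user already exported, so an explicit `OMP_NUM_THREADS=140` on the command line still wins.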
The following script reproduces the problem (note that it downloads over 14GiB of model data before running):
#!/usr/bin/env python3
#
# apt update
# apt install python3-pip python3-venv
# python3 -m venv venv
# . venv/bin/activate
# pip install accelerate diffusers protobuf sentencepiece tiktoken torch transformers
# # WARNING: This script needs to download over 14GiB of model data!
# OMP_NUM_THREADS=140 python3 openblas-bad-memory-unallocation.py
#
# If you want to use a custom OpenBLAS:
# git clone https://github.com/OpenMathLib/OpenBLAS.git
# cd OpenBLAS
# git describe
# v0.2.20-7494-g986ba2949
# make -j 8 NUM_THREADS=128 USE_OPENMP=1 NO_SHARED=0 DYNAMIC_ARCH=1 TARGET=ARMV8 CFLAGS=-O3 BUILD_BFLOAT16=1
# cd ../
# OMP_NUM_THREADS=140 LD_PRELOAD=OpenBLAS/libopenblas.so python3 openblas-bad-memory-unallocation.py
import torch
from diffusers import CogVideoXPipeline


def main():
    # From https://huggingface.co/zai-org/CogVideoX-2b
    model_name = 'THUDM/CogVideoX-2b'

    # Create main processing pipeline
    print("Creating pipeline...")
    pipe = CogVideoXPipeline.from_pretrained(
        model_name,
        torch_dtype=torch.float32,
    ).to('cpu')

    # Configure pipeline
    print("Configuring pipeline...")
    generator = torch.Generator().manual_seed(42)

    # Warmup
    print("Warmup...")
    # The default model (CogVideoX-2b) specifically says it only works with 720x480
    pipe(prompt="warmup", generator=generator, num_inference_steps=1, num_frames=1)

    print("Finished!")


if __name__ == '__main__':
    main()
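As a quick sanity check before running the reproducer, the following sketch reports whether a machine is likely to hit the issue. The 128 limit is an assumption taken from the warning message; OpenBLAS exposes no portable way to query its compiled NUM_THREADS from Python:

```python
import os

# Assumption: OpenBLAS was precompiled with NUM_THREADS=128, as the
# warning message states; this value is hard-coded here, not queried.
PRECOMPILED_NUM_THREADS = 128

cores = os.cpu_count() or 1
capped = "OPENBLAS_NUM_THREADS" in os.environ or "OMP_NUM_THREADS" in os.environ
if cores > PRECOMPILED_NUM_THREADS and not capped:
    print(f"{cores} cores exceed NUM_THREADS={PRECOMPILED_NUM_THREADS}; "
          "expect 'Bad memory unallocation!' warnings")
else:
    print(f"{cores} cores; within the limit or explicitly capped")
```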