Error message "Bad memory unallocation" when cores exceed NUM_THREADS #5639

@sitsofe

Description


When running the Hugging Face THUDM/CogVideoX-2b model with PyTorch 2.10.0 on an AArch64 system with more than 128 cores, PyTorch's built-in OpenBLAS (apparently v0.3.30) prints "Bad memory unallocation!" warnings:

OpenBLAS warning: precompiled NUM_THREADS exceeded, adding auxiliary array for thread metadata.
To avoid this warning, please rebuild your copy of OpenBLAS with a larger NUM_THREADS setting
or set the environment variable OPENBLAS_NUM_THREADS to 128 or lower
[...]
BLAS : Bad memory unallocation! :  768  0xf52c94000000
BLAS : Bad memory unallocation! :  768  0xf52c78000000
BLAS : Bad memory unallocation! :  768  0xf52c80000000
BLAS : Bad memory unallocation! :  768  0xf52c8e000000
BLAS : Bad memory unallocation! :  768  0xf52c8a000000
BLAS : Bad memory unallocation! :  768  0xf52c88000000
[...]

The problem has also been reproduced with OpenBLAS v0.2.20-7494-g986ba2949. All warnings go away if the thread count is restricted with OMP_NUM_THREADS=128, or if OpenBLAS is recompiled with NUM_THREADS=256.
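As a quick way to check whether a given machine is likely to hit the precompiled NUM_THREADS limit without downloading the model, the following stdlib-only sketch (a hypothetical helper, not part of the original report) compares the effective thread count against the 128-thread build limit mentioned in the warning text:

```python
import os

# The OpenBLAS bundled with PyTorch wheels appears to be built with
# NUM_THREADS=128 (an assumption based on the warning text above).
PRECOMPILED_NUM_THREADS = 128

def threads_exceed_limit(limit=PRECOMPILED_NUM_THREADS):
    """Return True if the effective thread count exceeds `limit`.

    OMP_NUM_THREADS (if set) caps the OpenMP thread pool; otherwise
    OpenBLAS typically falls back to the number of visible CPUs.
    """
    env = os.environ.get("OMP_NUM_THREADS")
    threads = int(env) if env else (os.cpu_count() or 1)
    return threads > limit

if __name__ == "__main__":
    if threads_exceed_limit():
        print("Likely affected: thread count exceeds precompiled NUM_THREADS")
    else:
        print("Thread count within precompiled NUM_THREADS")
```

On the affected machine above, `OMP_NUM_THREADS=140` (or an unset variable with >128 cores) would make this report "likely affected".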

The following script reproduces the problem (note that it downloads over 14GiB of model data before it runs):

#!/usr/bin/env python3
#
# apt update
# apt install python3-pip python3-venv
# python3 -m venv venv
# . venv/bin/activate
# pip install accelerate diffusers protobuf sentencepiece tiktoken torch transformers
# # WARNING: This script needs to download over 14GiB of model data!
# OMP_NUM_THREADS=140 python3 openblas-bad-memory-unallocation.py
#
# If you want to use a custom OpenBLAS:
# git clone https://github.com/OpenMathLib/OpenBLAS.git
# cd OpenBLAS
# git describe
# v0.2.20-7494-g986ba2949
# make -j 8 NUM_THREADS=128 USE_OPENMP=1 NO_SHARED=0 DYNAMIC_ARCH=1 TARGET=ARMV8 CFLAGS=-O3 BUILD_BFLOAT16=1
# cd ../
# OMP_NUM_THREADS=140 LD_PRELOAD=OpenBLAS/libopenblas.so python3 openblas-bad-memory-unallocation.py

import torch
from diffusers import CogVideoXPipeline

def main():
    # From https://huggingface.co/zai-org/CogVideoX-2b
    model_name = 'THUDM/CogVideoX-2b'
    
    # Create main processing pipeline
    print("Creating pipeline...")
    pipe = CogVideoXPipeline.from_pretrained(
        model_name,
        torch_dtype=torch.float32,
    ).to('cpu')
    
    # Configure pipeline
    print("Configuring pipeline...")
    generator = torch.Generator().manual_seed(42)
    
    # Warmup
    print("Warmup...")
    # The default model (CogVideoX-2b) specifically says it only works with 720x480
    pipe(prompt="warmup", generator=generator, num_inference_steps=1, num_frames=1)

    print("Finished!")

if __name__ == '__main__':
    main()
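For anyone who wants to poke at the threading path without the 14GiB download, here is a much smaller sketch that just hammers the BLAS threaded GEMM routines with large float32 matmuls. It is an assumption that this exercises the same code path: NumPy may link a different BLAS than PyTorch's bundled OpenBLAS, so it may or may not trigger the same warnings, but it can be useful for bisecting with an LD_PRELOADed custom build:

```python
import numpy as np

def stress_blas(n=2048, iters=4):
    """Run a chain of large float32 matmuls to exercise threaded GEMM.

    Returns a checksum of the final product so the work can't be
    optimized away. Matrix size and iteration count are arbitrary.
    """
    rng = np.random.default_rng(42)
    a = rng.standard_normal((n, n), dtype=np.float32)
    b = rng.standard_normal((n, n), dtype=np.float32)
    c = a @ b
    for _ in range(iters - 1):
        c = c @ b
    return float(c.sum())

if __name__ == "__main__":
    # Run with e.g. OMP_NUM_THREADS=140 and watch stderr for
    # "Bad memory unallocation" warnings.
    print("checksum:", stress_blas())
```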
