Open
Implement runtime detection of ARM SVE and SVE2 CPU capabilities, similar to the existing BMI2 runtime detection for x86-64.

Changes:
- Add ARM CPU feature detection in lib/common/cpu.h using platform-specific APIs (getauxval on Linux/Android, disabled on macOS/Windows)
- Add DYNAMIC_SVE and DYNAMIC_SVE2 macros in portability_macros.h
- Add SVE2_TARGET_ATTRIBUTE for selective function compilation
- Add sve2 field to compression context (ZSTD_CCtx)
- Update histogram functions to support dynamic SVE2 dispatch
- Explicitly disable SVE/SVE2 on Apple platforms (not supported)

Platform support:
- Linux/Android aarch64: Full runtime detection via getauxval()
- Apple platforms: Disabled (Apple Silicon doesn't support SVE/SVE2)
- Windows on ARM: Placeholder (API not yet available)

Benefits:
- Enables SVE2 optimizations on capable hardware without requiring build-time flags
- Zero overhead on non-SVE2 systems
- Expected 2-3x speedup in histogram counting on SVE2-capable CPUs (AWS Graviton4, Ampere AmpereOne)

Note: Currently only SVE2 optimizations exist. CPUs with SVE but not SVE2 (e.g., Fujitsu A64FX) could benefit from future SVE-only implementations.
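For readers unfamiliar with the Linux/Android path, here is a minimal sketch of getauxval()-based detection. It is not the code from this PR: the `example_arm_features` struct and `example_detect_arm_features()` function are hypothetical names, and the fallback bit values are taken from the Linux arm64 hwcap definitions in case older headers lack them. On any other platform the sketch simply reports both features as absent.

```c
#if defined(__aarch64__) && (defined(__linux__) || defined(__ANDROID__))
#  include <sys/auxv.h>              /* getauxval(), AT_HWCAP */
#  ifndef AT_HWCAP2
#    define AT_HWCAP2 26             /* auxiliary vector key for the second hwcap word */
#  endif
#  ifndef HWCAP_SVE
#    define HWCAP_SVE   (1UL << 22)  /* fallback for older headers */
#  endif
#  ifndef HWCAP2_SVE2
#    define HWCAP2_SVE2 (1UL << 1)   /* fallback for older headers */
#  endif
#  define EXAMPLE_HAVE_AUXV 1
#else
#  define EXAMPLE_HAVE_AUXV 0
#endif

/* Hypothetical result type, for illustration only. */
typedef struct {
    int sve;   /* 1 if the kernel reports SVE  */
    int sve2;  /* 1 if the kernel reports SVE2 */
} example_arm_features;

/* Queries the kernel-provided hardware capability words.
 * Returns {0, 0} on platforms where this sketch has no detection path. */
static example_arm_features example_detect_arm_features(void)
{
    example_arm_features f = { 0, 0 };
#if EXAMPLE_HAVE_AUXV
    unsigned long const hwcap  = getauxval(AT_HWCAP);
    unsigned long const hwcap2 = getauxval(AT_HWCAP2);
    f.sve  = (hwcap  & HWCAP_SVE)   != 0;
    f.sve2 = (hwcap2 & HWCAP2_SVE2) != 0;
#endif
    return f;
}
```

A caller would typically run this once (for example when a context is created) and cache the result rather than querying on every call.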
Andarwinux
reviewed
Jan 25, 2026
lib/common/cpu.h
Outdated
#elif defined(_WIN32)
/* Windows on ARM - use IsProcessorFeaturePresent() */
/* Note: As of 2024, Windows on ARM doesn't expose SVE/SVE2 through this API */
Author
Thanks @Andarwinux for pointing out the mingw-w64 header! I've updated the code in commit 1bf764f to use IsProcessorFeaturePresent() with PF_ARM_SVE_INSTRUCTIONS_AVAILABLE (46) and PF_ARM_SVE2_INSTRUCTIONS_AVAILABLE (47) for proper Windows ARM SVE/SVE2 detection.
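As a rough illustration of the Windows path described in this comment (not the contents of commit 1bf764f), the check could look like the sketch below. The function names `example_win_has_sve()` and `example_win_has_sve2()` are hypothetical; the fallback values 46 and 47 are the ones quoted above and are only defined when the SDK or mingw-w64 headers do not already provide them. Outside Windows on ARM64 the functions return 0.

```c
#if defined(_WIN32)
#  include <windows.h>   /* IsProcessorFeaturePresent() */
#endif

/* Fallbacks for older headers; values as quoted in the comment above. */
#ifndef PF_ARM_SVE_INSTRUCTIONS_AVAILABLE
#  define PF_ARM_SVE_INSTRUCTIONS_AVAILABLE 46
#endif
#ifndef PF_ARM_SVE2_INSTRUCTIONS_AVAILABLE
#  define PF_ARM_SVE2_INSTRUCTIONS_AVAILABLE 47
#endif

/* Hypothetical helper: returns 1 if Windows reports SVE, else 0. */
static int example_win_has_sve(void)
{
#if defined(_WIN32) && (defined(_M_ARM64) || defined(__aarch64__))
    return IsProcessorFeaturePresent(PF_ARM_SVE_INSTRUCTIONS_AVAILABLE) != 0;
#else
    return 0;  /* no Windows-on-ARM detection path on this platform */
#endif
}

/* Hypothetical helper: returns 1 if Windows reports SVE2, else 0. */
static int example_win_has_sve2(void)
{
#if defined(_WIN32) && (defined(_M_ARM64) || defined(__aarch64__))
    return IsProcessorFeaturePresent(PF_ARM_SVE2_INSTRUCTIONS_AVAILABLE) != 0;
#else
    return 0;
#endif
}
```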
Author
@Cyan4973 you might be interested, as you previously worked on related SVE2 PRs.
Motivation
Enable deployment of a single zstd binary across heterogeneous ARM fleets with varying CPU capabilities. This is particularly important for cloud deployments where applications run across multiple instance types:
Currently, to leverage SVE2 optimizations, you must compile with -march=neoverse-v2 or similar flags, which produces binaries that won't run on older processors. This forces users to either:
This PR implements runtime CPU feature detection, similar to the existing BMI2 support on x86-64, allowing a single binary compiled for Neoverse N1 baseline (-mcpu=neoverse-n1) to automatically use SVE2 optimizations when available.
Changes
This PR adds runtime ARM SVE2 detection infrastructure:
Core Infrastructure
- lib/common/cpu.h: Platform-specific detection via getauxval() on Linux/Android, IsProcessorFeaturePresent() on Windows
- lib/common/portability_macros.h: DYNAMIC_SVE2 macro to enable runtime dispatch
- lib/common/compiler.h: SVE2_TARGET_ATTRIBUTE for selective function compilation
- lib/compress/zstd_compress.c: Detect SVE2 once per compression context
Platform Support
- Linux/Android: getauxval()
- Windows: IsProcessorFeaturePresent()
Recommended Flags
Benchmarking on Graviton 4:
Overhead
Zero overhead on non-SVE2 systems:
Related
This follows the same pattern as the existing x86-64 BMI2 runtime detection, extending it to ARM architectures.
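To make the pattern concrete, here is a minimal sketch of flag-based dispatch for a byte histogram. All names (`example_ctx`, `example_hist`, and so on) are hypothetical and not taken from zstd. The "SVE2" variant is deliberately a stub that reuses the scalar loop, since a real kernel would need SVE2 intrinsics and a per-function target attribute. The point is only the shape: detect once, store a flag in the context, branch on it at the call site.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical context carrying the cached detection result. */
typedef struct {
    int sve2;  /* set once at context creation from runtime detection */
} example_ctx;

/* Portable baseline: counts occurrences of each byte value. */
static void example_hist_scalar(unsigned count[256],
                                const unsigned char* src, size_t n)
{
    size_t i;
    memset(count, 0, 256 * sizeof(unsigned));
    for (i = 0; i < n; i++) count[src[i]]++;
}

/* Stand-in for an SVE2 kernel. In a real build this function would be
 * compiled with an SVE2 target attribute and use SVE2 instructions;
 * here it forwards to the scalar loop so the sketch runs anywhere. */
static void example_hist_sve2_stub(unsigned count[256],
                                   const unsigned char* src, size_t n)
{
    example_hist_scalar(count, src, n);
}

/* Dispatcher: one predictable branch on the cached flag. */
static void example_hist(const example_ctx* ctx, unsigned count[256],
                         const unsigned char* src, size_t n)
{
    if (ctx->sve2) {
        example_hist_sve2_stub(count, src, n);
    } else {
        example_hist_scalar(count, src, n);
    }
}
```

Both paths must produce identical counts, so the fast variant can be validated against the scalar one on the same input.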
See also:
#4440
#4429
#4418
#4414
#4413
#4411