Add half::f16 and half::bf16 support #1551
swfsql wants to merge 5 commits into rust-ndarray:master

Conversation
```rust
(m127, 127, 127, 127) // ~128x slower than f32
(mix16x4, 32, 4, 32)
(mix32x2, 32, 2, 32)
// (mix10000, 128, 10000, 128) // too slow
```
You mention that f16 is slower in the issue and several times here. Is it faster on some operations? If not, why do you (and others) want to use it? Only to save space?
Yes, on my arch (x86_64) it is indeed quite slow. The half docs mention that aarch64 has hardware support for all half::f16 operations, so performance may be viable on that arch, but I haven't tested it. The crate also has specialized methods for storing/loading half::{f16, bf16} data to/from other types (u16, f32, f64), which could improve performance as well, but I didn't leverage those operations when adding the types to ndarray (I don't really know if/how they could be leveraged).
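For reference, the core scalar conversions look like this (a minimal sketch of mine using only APIs I know half provides; the slice-level load/store helpers are separate and not shown):

```rust
use half::{bf16, f16};

fn main() {
    // Round-trip through f32: the usual way in and out of f16.
    let x = f16::from_f32(1.5);
    assert_eq!(x.to_f32(), 1.5); // 1.5 is exactly representable in f16

    // Raw bit access, useful when moving f16 buffers around as u16.
    let bits: u16 = x.to_bits();
    assert_eq!(f16::from_bits(bits), x);

    // bf16 keeps f32's exponent range but truncates the mantissa.
    let y = bf16::from_f32(3.0);
    assert_eq!(y.to_f32(), 3.0);
}
```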
Even though (at least for me) it is a bit disappointing that it is slow, I find it useful for debugging fp16 models (for machine learning): some architectures behave poorly on fp16, and it is easier to debug them when everything runs on the CPU.
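To illustrate that workflow, here is a rough sketch of the kind of CPU-side usage this enables. It assumes ndarray is built with the half feature from this PR (and half's num-traits feature), and is my illustration rather than code from the PR:

```rust
use half::f16;
use ndarray::Array2;

fn main() {
    // Build small f16 matrices from f32 values (f16 has no literal syntax).
    let a = Array2::from_shape_fn((2, 2), |(i, j)| f16::from_f32((i + j) as f32));
    let b = Array2::<f16>::ones((2, 2));

    // The product runs entirely on the CPU in fp16, which is what makes
    // stepping through a misbehaving fp16 model practical.
    let c = a.dot(&b);

    // Convert back to f32 to compare against an f32 reference run.
    let c_f32 = c.mapv(f16::to_f32);
    println!("{c_f32}");
}
```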
For wasm targets, some builds may not enable CPU features, so the performance of f32 and half::f16 should be closer; in that case the memory savings could be meaningful, but I believe that is a niche target.
I don't know whether proper SIMD support is possible for fp16; it appears that some work towards this is active (for a primitive f16 type).
With that said, I still advocate for half::{f16, bf16} support in ndarray. It makes development and debugging smoother even if fp16 lacks proper SIMD support, given that the crux of the training happens on the GPUs. In the future, SIMD improvements could come either from the underlying half crate or from the addition of (or replacement by) a primitive f16 type.
I also understand that ndarray holds performance in high regard, and might therefore delay or decline inclusion of the fp16 types.
I just tested, and I'm seeing similarly poor performance on my aarch64 machine (an Apple M2 chip). Running cargo +nightly bench -F half mat_mul, m127 is ~346x slower for f16 than f32, and ~110x slower for bf16 than f32. My guess after some debugging is that the gap comes from the f32 and f64 implementations dispatching to matrixmultiply, since I can clearly see that I am hitting the f16 assembly code on my machine.
Given how small this change is relative to the overall library, I'm still in favor of including this code, but we may want to file an issue with matrixmultiply to see if they are willing to support f16.
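To make the dispatch point concrete, here is an illustrative sketch (simplified names, not the actual ndarray source) of why f16 and bf16 miss the fast path:

```rust
use std::any::TypeId;

// Simplified illustration of the element-type dispatch ndarray performs
// before matrix multiplication; the names and structure here are mine.
fn uses_matrixmultiply<A: 'static>() -> bool {
    // f32 and f64 get routed to matrixmultiply's sgemm/dgemm kernels.
    TypeId::of::<A>() == TypeId::of::<f32>() || TypeId::of::<A>() == TypeId::of::<f64>()
}

fn main() {
    assert!(uses_matrixmultiply::<f32>());
    // Every other element type, including half::f16 and half::bf16,
    // falls back to a generic loop, hence the large slowdown.
    assert!(!uses_matrixmultiply::<half::f16>());
}
```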
Thanks for testing and debugging this.
On aarch64, could you please try the bench again, adding the nightly feature to the half dependency while still running on nightly? I believe changing the half features to features = ["num-traits", "nightly"] should do it. I'm curious whether this feature brings an improvement on that arch.
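For clarity, the Cargo.toml change I mean would look roughly like this (the version number and the optional flag here are my assumptions, not copied from the repo):

```toml
[dependencies]
half = { version = "2", optional = true, features = ["num-traits", "nightly"] }
```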
akern40 left a comment
Actually, before we merge, would you mind adding a note about the performance implications to the README?
I've just added a note. Please let me know if there are other improvements for this PR; I'm happy to make changes if I can.