Conversation

ChipKerchner (Contributor) commented Feb 10, 2026

Added the ability to accumulate in FP16 for GEMM; the sum is widened only once, at the end of the loops.

Testing LLVM FP16 LMUL1 VLEN256 GEMM 1 0 0  512  512  512   1  2.0  1.0  1

Total time =         24948910

Testing LLVM FP16_N LMUL1 VLEN256 GEMM 1 0 0  512  512  512   1  2.0  1.0  1

Total time =         18968190

Accumulation differences are about 4X those of the previous (widening) version, but performance is up to 2.7X faster. Note: BananaPi shows only a 1.85X speedup.
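The trade-off described above can be sketched in pure Python, emulating FP16 rounding with `struct`'s half-precision format. This is an illustration of the accumulation strategy, not the PR's actual RISC-V vector code; all function names here are hypothetical.

```python
import struct

def to_fp16(x: float) -> float:
    # Round a value to the nearest representable FP16 number
    # (struct's 'e' format is IEEE 754 half precision).
    return struct.unpack('<e', struct.pack('<e', x))[0]

def dot_widening(a, b):
    # Previous version: each FP16 product is widened and the
    # running sum is kept in full precision (one widen per step).
    acc = 0.0
    for x, y in zip(a, b):
        acc += to_fp16(x) * to_fp16(y)
    return acc

def dot_fp16_accum(a, b):
    # This PR's approach: keep the running sum in FP16 and widen
    # only once, after the loop. Fewer conversions make it faster,
    # but every partial sum is rounded to FP16, so the accumulated
    # error is larger.
    acc = to_fp16(0.0)
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x) * to_fp16(y))
    return float(acc)  # single widening at the end

a = [0.1] * 512
b = [0.1] * 512
print(dot_widening(a, b), dot_fp16_accum(a, b))
```

Running both on the 512-element example shows the FP16-accumulated result drifting from the widened one, matching the "about 4X" accuracy difference observed in the PR.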


ChipKerchner commented Feb 10, 2026

Unfortunately, BF16 only has widening MADD instructions, so the same change cannot be made for BF16.
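Why a widening MADD blocks this optimization: the instruction takes narrow (BF16) multiplicands but its accumulator is always the wide (FP32) type, so the running sum cannot stay in BF16. A minimal sketch, emulating BF16 by truncating FP32 mantissa bits (hypothetical names; real hardware may round rather than truncate):

```python
import struct

def to_bf16(x: float) -> float:
    # BF16 is FP32 with the low 16 mantissa bits dropped
    # (simple truncation here, for illustration).
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

def widening_madd(acc_f32, x, y):
    # A widening MADD: BF16 operands, but the destination/accumulator
    # is FP32 -- the narrow-accumulate trick used for FP16 has no
    # BF16 equivalent because no narrow-result MADD exists.
    return acc_f32 + to_bf16(x) * to_bf16(y)
```

So for BF16 the accumulator is forced wide on every step, which is exactly the per-iteration widening the FP16 path was able to avoid.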

@ChipKerchner

These changes are currently for VLEN = 256 only.

@ChipKerchner

It now works for VLEN = 128.

@ChipKerchner ChipKerchner marked this pull request as draft February 10, 2026 22:04
@ChipKerchner

Main loop now uses LMUL = 2

@ChipKerchner ChipKerchner marked this pull request as ready for review February 11, 2026 00:38

ChipKerchner commented Feb 11, 2026

Even faster!!!

Testing LLVM FP16_N LMUL1 VLEN256 GEMM 1 0 0  512  512  512   1  2.0  1.0  1

Total time =         13400067

@ChipKerchner

Convert inputs from BF16 to FP32 and use FP32 vector madds. 18% faster.
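The reason the convert-then-FP32-madd route is cheap: BF16 to FP32 conversion is exact and amounts to a 16-bit left shift of the bit pattern, after which ordinary FP32 multiply-adds apply. A small sketch of the idea (pure Python; the bit patterns and loop are illustrative, not from the PR):

```python
import struct

def bf16_bits_to_fp32(bits16: int) -> float:
    # BF16 -> FP32 is lossless: a BF16 value's bits are the top 16
    # bits of the corresponding FP32 value, so converting is just a
    # 16-bit left shift (cheap for vector hardware).
    return struct.unpack('<f', struct.pack('<I', bits16 << 16))[0]

# 0x3F80 is BF16 for 1.0; 0x4000 is BF16 for 2.0.
acc = 0.0
for xb, yb in [(0x3F80, 0x4000), (0x4000, 0x4000)]:
    # After conversion, plain FP32 multiply-adds replace the
    # widening BF16 MADD on the hot path.
    acc += bf16_bits_to_fp32(xb) * bf16_bits_to_fp32(yb)
print(acc)  # 1.0*2.0 + 2.0*2.0 = 6.0
```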

