Rewrote flash attention to use BF16, transpose k and v, rework the task distribution, increase parallelism on decode, and use double the registers for the core of flash attention. #835

Open
copybara-service[bot] wants to merge 1 commit into dev from test_868146247