
[ET-VK] Support different input layouts in q8ta_binary operator #17563

Merged

meta-codesync[bot] merged 1 commit into gh/SS-JIA/437/base from gh/SS-JIA/437/head on Feb 20, 2026

Conversation

@SS-JIA (Contributor) commented Feb 19, 2026

Stack from ghstack (oldest at bottom):

Previously, the q8ta_binary operator required both inputs to use the same memory layout. This was enforced by using a single `in_layout` specialization constant for both input buffers. However, some models may have inputs with different layouts (e.g., 4W4C and 4C1W) that share the same packed dimension and block size, which should be compatible for binary operations.

This change introduces a separate `other_layout` specialization constant for the second input, allowing the shader to correctly load from `input_b` using its actual layout while `input_a` continues to use `in_layout`. The C++ side now passes both layout hashes as separate specialization constants to the shader.

Differential Revision: [D93768638](https://our.internmc.facebook.com/intern/diff/D93768638/)
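To make the mechanism concrete, here is a minimal, self-contained C++ sketch of the idea. This is not the ExecuTorch implementation: the `Layout` struct, its hash encoding, and `make_spec_constants` are illustrative assumptions, and the real operator derives layout hashes from the graph's tensor metadata. The sketch only models the two properties the description says must match (packed dimension and block size) and shows each input contributing its own specialization constant.

```cpp
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <vector>

// Hypothetical packed-layout descriptor. The real operator encodes the full
// layout as a hash; here we keep just the two properties that must agree for
// a binary op: the packed dimension and the block size.
struct Layout {
  uint32_t packed_dim;  // which tensor dim is packed (e.g. W or C)
  uint32_t block_size;  // elements per packed block (e.g. 4)
  uint32_t hash() const { return (packed_dim << 16) | block_size; }
};

// Build the specialization-constant list for a q8ta_binary-style dispatch.
// Previously a single in_layout constant covered both inputs; now input_a and
// input_b each contribute their own layout hash.
std::vector<uint32_t> make_spec_constants(const Layout& in_layout,
                                          const Layout& other_layout) {
  if (in_layout.packed_dim != other_layout.packed_dim ||
      in_layout.block_size != other_layout.block_size) {
    throw std::invalid_argument(
        "binary op inputs must share packed dim and block size");
  }
  return {in_layout.hash(), other_layout.hash()};
}

int main() {
  // Two inputs whose layouts differ overall but share the same packed
  // dimension and block size (values here are illustrative only).
  Layout a{/*packed_dim=*/2, /*block_size=*/4};
  Layout b{/*packed_dim=*/2, /*block_size=*/4};
  for (uint32_t c : make_spec_constants(a, b)) {
    std::cout << "spec constant: 0x" << std::hex << c << "\n";
  }
  return 0;
}
```

The compatibility check mirrors the constraint stated above: different layouts are accepted only when their packed dimension and block size agree, and each input's layout reaches the shader as its own specialization constant.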

@pytorch-bot (bot) commented Feb 19, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17563

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures

As of commit 04e6b54 with merge base 7b843e4.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-codesync[bot] merged commit 337d606 into gh/SS-JIA/437/base on Feb 20, 2026
186 of 197 checks passed
meta-codesync[bot] deleted the gh/SS-JIA/437/head branch on February 20, 2026 at 01:13
SS-JIA pushed a commit that referenced this pull request on Feb 20, 2026
Previously, the q8ta_binary operator required both inputs to use the same memory layout. This was enforced by using a single `in_layout` specialization constant for both input buffers. However, some models may have inputs with different layouts (e.g., 4W4C and 4C1W) that share the same packed dimension and block size, which should be compatible for binary operations. This change introduces a separate `other_layout` specialization constant for the second input, allowing the shader to correctly load from input_b using its actual layout while input_a continues to use `in_layout`. The C++ side now passes both layout hashes as separate specialization constants to the shader.

Differential Revision: [D93768638](https://our.internmc.facebook.com/intern/diff/D93768638/)

ghstack-source-id: 342806076
Pull Request resolved: #17563
Labels

CLA Signed, fb-exported, meta-exported
