[ET-VK][qconv] Pad weight_sums buffer to multiple-of-4 alignment by SS-JIA · Pull Request #17505 · pytorch/executorch

SS-JIA · 2026-02-17T20:19:37Z

Stack from ghstack (oldest at bottom):

The q8ta convolution shaders read weight_sums via ivec4 loads (4 int32 values at once), requiring the buffer to have at least align_up_4(OC) elements. The weight tensor, weight_scales, and bias are all padded via align_width_and_update_state_dict, but weight_sums was created as a 1D tensor of shape (OC,) without any padding.

For OC values that are not a multiple of 4 (e.g. OC=1 in the final pointwise conv of MetaNet GreenScreen), this results in out-of-bounds GPU buffer reads. On host testing with ASAN, this manifests as a heap-buffer-overflow.
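To make the overrun concrete, here is a minimal sketch of the indexing arithmetic. It assumes each ivec4 load covers four consecutive int32 entries starting at a multiple of 4. The helper names are hypothetical and are not taken from the ExecuTorch sources.

```python
def elements_touched_by_ivec4_loads(oc: int) -> int:
    """Number of int32 elements read when oc channels are fetched 4 at a time."""
    num_loads = (oc + 3) // 4  # ceil(oc / 4) without floating point
    return num_loads * 4


def overrun(oc: int) -> int:
    """Elements read past the end of an unpadded buffer of length oc."""
    return elements_touched_by_ivec4_loads(oc) - oc


if __name__ == "__main__":
    for oc in (1, 6, 8):
        print(f"OC={oc}: touched={elements_touched_by_ivec4_loads(oc)}, "
              f"past end={overrun(oc)}")
```

Under this assumption, OC=1 reads three int32 values past the end of a buffer of shape (OC,), while any OC that is already a multiple of 4 reads none.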

Fix by padding sum_per_output_channel to align_up_4(OC) before creating the constant placeholder. Also fix the C++ test utility compute_weight_sums() which was incorrectly shrinking a pre-allocated aligned buffer.
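The padding step can be sketched as follows. This is an illustration only: it uses plain Python lists rather than tensors, the zero fill value is an assumption (the description only says the buffer is padded), and `pad_weight_sums` and `fill_weight_sums_in_place` are hypothetical names, not functions from the PR.

```python
def align_up_4(n: int) -> int:
    """Round n up to the next multiple of 4."""
    return (n + 3) // 4 * 4


def pad_weight_sums(sums: list[int]) -> list[int]:
    """Return a copy of sums extended to a multiple-of-4 length.

    Assumption: padding entries are zero.
    """
    padded_len = align_up_4(len(sums))
    return sums + [0] * (padded_len - len(sums))


def fill_weight_sums_in_place(buffer: list[int], sums: list[int]) -> None:
    """Write sums into a pre-allocated aligned buffer without resizing it.

    Mirrors the idea behind the test-utility fix: the buffer keeps its
    aligned length, and only the first len(sums) slots are overwritten.
    """
    if len(buffer) < len(sums):
        raise ValueError("buffer is smaller than the number of sums")
    buffer[: len(sums)] = sums
```

For example, `pad_weight_sums([7])` yields a four-element list, so a single ivec4 load stays within bounds even when OC=1.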

Differential Revision: D93511633

pytorch-bot · 2026-02-17T20:19:41Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17505

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures

As of commit e7f6b78 with merge base 7b843e4:

NEW FAILURES - The following jobs have failed:

pull / test-openvino-linux / linux-job (gh)
RuntimeError: Command docker exec -t ce2ed5b415f41f5449a0971f5dd328ced7895d90e0bb390fed652e020731cc7d /exec failed with exit code 1
pull / unittest / macos / macos-job (gh)
export/tests/test_target_recipes.py::TestTargetRecipes::test_mv3_model
pull / unittest-arm-backend-with-no-deps (test_pytest_ops_tosa) / linux-job (gh)
RuntimeError: Command docker exec -t 79f0ef519cdfe89f813da347cb7b9081f4b35adb9bf5eedc4bb4fef6ee84872d /exec failed with exit code 1
pull / unittest-nxp-neutron / linux-job (gh)
RuntimeError: Command docker exec -t f0f1e7f519868f8b558b56a62e83316b0c3fb21d01b61033c0744e18de7e7ee5 /exec failed with exit code 1
Test CUDA Builds / test-models-cuda (conv1d) / linux-job (gh)
RuntimeError: Command docker exec -t a7d18173d3a24825e84538de00ab9e1ec6b1986d88bf785402e07074fdcfc2ae /exec failed with exit code 1
Test Metal Backend / export-model-metal-artifact (mistralai, Voxtral-Mini-3B-2507, quantized-int4-metal) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-02-17T20:21:02Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Pull Request resolved: #17505

ghstack-source-id: 342806075
@exported-using-ghexport

Differential Revision: [D93511633](https://our.internmc.facebook.com/intern/diff/D93511633/)

meta-cla bot added the CLA Signed label Feb 17, 2026

meta-codesync bot added fb-exported meta-exported labels Feb 17, 2026

ssjia and others added 2 commits February 18, 2026 13:02

manuelcandales approved these changes Feb 19, 2026

meta-codesync bot merged commit 8643b19 into gh/SS-JIA/433/base Feb 20, 2026
184 of 192 checks passed

meta-codesync bot deleted the gh/SS-JIA/433/head branch February 20, 2026 01:13

meta-codesync bot temporarily deployed to cherry-pick-bot February 20, 2026 01:13 Inactive

pytorchbot mentioned this pull request Feb 20, 2026

[ET-VK][qconv] Pad weight_sums buffer to multiple-of-4 alignment #17574

Merged

meta-codesync[bot] merged 3 commits into gh/SS-JIA/433/base from gh/SS-JIA/433/head
