Arm backend: Add FP16 tests of models (mv3, ic3) #17586

martinlsm wants to merge 1 commit into pytorch:main
Conversation
Add testing of the following models executed in FP16:
- MobileNetV3
- InceptionV3

This patch verifies that the Arm backend is able to lower full models in FP16 to valid TOSA, and execute them with acceptable numerical accuracy.

Signed-off-by: Martin Lindström <Martin.Lindstroem@arm.com>
Change-Id: Ice3c6913598d540f7c7a52e403260943a7c8c597
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17586
Note: Links to docs will display an error until the docs builds have been completed.
❌ 4 New Failures as of commit 1376dd0 with merge base bd6a75d.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label ciflow/trunk

@pytorchbot label "partner: arm"

@pytorchbot label "release notes: none"
Pull request overview
Adds FP16 end-to-end model tests for the Arm backend to validate FP16 lowering to TOSA and ensure outputs remain within acceptable numeric error.
Changes:
- Add an FP16 variant of the MobileNetV3 (small) TOSA FP pipeline test.
- Add an FP16 variant of the InceptionV3 TOSA FP pipeline test.
- Configure looser absolute tolerances for FP16 output comparisons.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| backends/arm/test/models/test_mobilenet_v3_arm.py | Adds a new slow TOSA FP16 model test using an FP16 MobileNetV3 module + FP16 inputs. |
| backends/arm/test/models/test_inception_v3_arm.py | Adds a new slow TOSA FP16 model test using an FP16 InceptionV3 module + FP16 inputs. |
```python
mv3_fp16 = models.mobilenet_v3_small(weights=models.MobileNet_V3_Small_Weights).to(
    torch.float16
)
mv3_fp16 = mv3_fp16.eval()
```
mv3_fp16 is instantiated and converted at import time, which forces a second model construction + weight load even when the FP16 test isn’t selected. Consider creating the FP16 model inside test_mv3_tosa_FP_fp16() (or via a cached pytest fixture) to reduce test import time and memory usage.
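One possible shape for that, as a minimal sketch (the fixture name and module scope are illustrative choices, not from the PR):

```python
import pytest
import torch
from torchvision import models


@pytest.fixture(scope="module")
def mv3_fp16():
    # Build the FP16 model lazily, once per test module, so the weights
    # are only loaded when an FP16 test is actually selected.
    model = models.mobilenet_v3_small(weights=models.MobileNet_V3_Small_Weights).to(
        torch.float16
    )
    return model.eval()
```

The test would then take `mv3_fp16` as an argument instead of reading a module-level global; the same pattern applies to the InceptionV3 test below.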
```python
aten_op=[],
exir_op=[],
use_to_edge_transform_and_lower=True,
atol=2e-2,
```
This FP16 test relaxes atol but leaves rtol at the default (1e-3). For reduced-precision model tests elsewhere (e.g. bf16), both tolerances are typically relaxed; consider specifying an appropriate rtol here as well to avoid overly strict relative comparisons and potential flakiness.
```suggestion
atol=2e-2,
rtol=1e-2,
```
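For intuition on what the extra `rtol` buys, here is a minimal sketch assuming the harness compares with `torch.allclose`-style semantics, i.e. per element `|out - ref| <= atol + rtol * |ref|` (the failure log further down quotes atol/rtol in that form); the sample values mirror the logit magnitudes seen in that log:

```python
import torch

# With |ref| around 2-3 (the logit scale in the failing run), the default
# rtol=1e-3 contributes only ~0.003 of slack, so atol carries nearly the
# whole error budget; rtol=1e-2 adds ~0.03 of headroom on top of atol.
out = torch.tensor([2.703125])
ref = torch.tensor([2.675781])  # abs diff ~ 0.0273

print(torch.allclose(out, ref, atol=2e-2, rtol=1e-3))  # False
print(torch.allclose(out, ref, atol=2e-2, rtol=1e-2))  # True
```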
```python
ic3_fp16 = models.inception_v3(weights=models.Inception_V3_Weights).to(torch.float16)
ic3_fp16 = ic3_fp16.eval()
```
ic3_fp16 is instantiated and converted at import time, which forces a second model construction + weight load even when the FP16 test isn’t selected. Consider creating the FP16 model inside test_ic3_tosa_FP_fp16() (or via a cached pytest fixture) to reduce test import time and memory usage.
```python
aten_op=[],
exir_op=[],
use_to_edge_transform_and_lower=True,
atol=1e-2,
```
This FP16 test relaxes atol but leaves rtol at the default (1e-3). For reduced-precision model tests elsewhere (e.g. bf16), both tolerances are typically relaxed; consider specifying an appropriate rtol here as well to avoid overly strict relative comparisons and potential flakiness.
```suggestion
atol=1e-2,
rtol=1e-2,
```
OK to merge once atol/rtol is bumped so the tests pass.
E.g., this run gets:
```
FAILED backends/arm/test/models/test_inception_v3_arm.py::test_ic3_tosa_FP_fp16 - AssertionError: Output 0 does not match reference output.
Given atol: 0.01, rtol: 0.001.
Output tensor shape: torch.Size([1, 1000]), dtype: torch.float16
Difference: max: 0.03515625, abs: 0.03515625, mean abs error: 0.005782970428466797.
-- Model vs. Reference --
Numel: 1000, 1000
Median: -0.06884765625, -0.06573486328125
Mean: -0.02605916690826416, -0.026028093814849853
Max: 2.703125, 2.67578125
Min: -2.623046875, -2.62109375
= 1 failed, 91 passed, 3 skipped, 7 xfailed, 952 warnings in 963.15s (0:16:03) =
```
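Plugging the logged worst-case difference into the same `allclose`-style condition (again an assumption about the harness) shows why bumping `rtol` alone may not be enough: the log does not say which element hit the max diff, so in the worst case `atol` has to absorb it by itself.

```python
# Worst-case figure from the failing fp16 InceptionV3 run above.
max_abs_diff = 0.03515625

# Per-element pass condition (torch.allclose-style, assumed):
#   |out - ref| <= atol + rtol * |ref|
# If the max diff occurred at an element where |ref| is near zero, the
# rtol term vanishes and atol alone must cover the difference.
for atol in (1e-2, 2e-2, 4e-2):
    verdict = "covers the worst case" if atol >= max_abs_diff else "can still fail"
    print(f"atol={atol:.0e}: {verdict}")
```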
cc @digantdesai @SS-JIA @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell