 - For state gathering, `extension_point="PreFilter"`
 - For allocation, `extension_point="Filter"`
@@ -1297,36 +1298,31 @@ No.

 ### Troubleshooting

-<!--
-This section must be completed when targeting beta to a release.
-
-For GA, this section is required: approvers should be able to confirm the
-previous answers based on experience in the field.
-
-The Troubleshooting section currently serves the `Playbook` role. We may consider
-splitting it into a dedicated `Playbook` document (potentially with some monitoring
-details). For now, we leave it here.
--->
+The troubleshooting section in https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters#troubleshooting
+still applies. The only additional failure mode comes from version skew
+in the cluster, and the troubleshooting steps provided through the link above
+should be sufficient to determine the cause.

 ###### How does this feature react if the API server and/or etcd is unavailable?

+See https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters#how-does-this-feature-react-if-the-api-server-andor-etcd-is-unavailable.
+
 ###### What are other known failure modes?

-<!--
-For each of them, fill in the following information by copying the below template:
-  - [Failure mode brief description]
-    - Detection: How can it be detected via metrics? Stated another way:
-      how can an operator troubleshoot without logging into a master or worker node?
-    - Mitigations: What can be done to stop the bleeding, especially for already
-      running user workloads?
-    - Diagnostics: What are the useful log messages and their required logging
-      levels that could help debug the issue?
-      Not required until feature graduated to beta.
-    - Testing: Are there any tests for failure mode? If not, describe why.
--->
+See https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters#what-are-other-known-failure-modes.
+
+- kube-scheduler cannot allocate ResourceClaims.
+
+The shared device may not have sufficient capacity to satisfy the request. The log message `Device capacity not enough` and the `capacities` field in the log message `Allocating one device` can provide further clues for investigation (requires `-v=7` on kube-scheduler).
+
+If the feature is disabled but a ResourceClaim still requests capacity, the scheduler log will report:
+`has capacity requests, but the DRAConsumableCapacity feature is disabled`. However, when the allocator is used in stable mode, no log messages related to the DRAConsumableCapacity feature are emitted.
+

 ###### What steps should be taken if SLOs are not being met to determine the problem?

+N/A
+
 ## Implementation History

 <!--
@@ -1352,6 +1348,10 @@ Alpha 1.35:
 - [Fix 134519 - add ShareID to kubelet plugin API PR 134520](https://github.com/kubernetes/kubernetes/pull/134520) has been pushed on 2025-10-10
 - [Increase test coverage PR 134615](https://github.com/kubernetes/kubernetes/pull/134615) has been pushed on 2025-10-15

+Beta 1.36:
+
+- [Promote DRAConsumableCapacity to Beta PR 136611](https://github.com/kubernetes/kubernetes/pull/136611) has been pushed on 2026-01-29
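
The scheduler log messages quoted in the failure-mode entry of this diff can be picked out of a saved kube-scheduler log with a few lines of Python. This is only a rough sketch: the function name `classify_scheduler_log` and the grouping labels are hypothetical, and it assumes that each message appears verbatim as a substring of a log line, which may not hold for every logging format.

```python
from typing import Dict, Iterable, List

# Message fragments quoted in the KEP text; the labels on the left are
# illustrative names chosen for this sketch, not part of Kubernetes.
PATTERNS: Dict[str, str] = {
    "insufficient_capacity": "Device capacity not enough",
    "allocation_attempt": "Allocating one device",
    "feature_disabled": (
        "has capacity requests, but the DRAConsumableCapacity feature is disabled"
    ),
}


def classify_scheduler_log(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group log lines by which of the known message fragments they contain."""
    hits: Dict[str, List[str]] = {label: [] for label in PATTERNS}
    for line in lines:
        for label, fragment in PATTERNS.items():
            # Plain substring match; assumes the message text is not reworded.
            if fragment in line:
                hits[label].append(line.rstrip("\n"))
    return hits
```

Passing the lines of a log captured with `-v=7` to `classify_scheduler_log` would then show whether allocation attempts ran out of capacity or whether a claim requested capacity while the feature was disabled. An empty result for every label is consistent with the stable-mode case, where no such messages are expected.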