KEP-5547: Integrate Workload APIs with Job controller #5871
Signed-off-by: helayoty <heelayot@microsoft.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: helayoty.
| components: | ||
| - kube-apiserver | ||
| - kube-controller-manager | ||
| - kube-scheduler |
I don't think there are any changes required for kube-scheduler
| @@ -0,0 +1,3 @@ | |||
| kep-number: 5547 | |||
| alpha: | |||
| approver: "@soltysh" | |||
@soltysh is listed as approver for sig-apps also.
We should probably look for someone else to approve PRR.
Please post on #prod-readiness to see who can take it.
| ### Goals | ||
| - Job controller automatically creates `Workload` and `PodGroup` objects for Jobs that require gang scheduling. | ||
| - Job with `parallelism > 1` will use `GangSchedulingPolicy` with `minCount = parallelism` |
This would break if JobSet also adds gang support.
How can someone opt out of this even if parallelism > 1?
/sig apps |
| We will add the following integration tests to the Job controller `https://github.com/kubernetes/kubernetes/blob/v1.35.0/test/integration/job/job_test.go`: | ||
| - Gang and Basic Scheduling Lifecycle Test (create, update, delete Job, verify Workload and PodGroup creation, verify pods have workloadRef, verify Job deletion cascades to Workload and PodGroup deletion) | ||
| - Failure Recovery Test (create Job with Workload API unavailable, verify Job controller retries, verify Workload is eventually created) | ||
| - Feature gate disable/enable (Jobs work without Workload/PodGroup creation (Jobs with ownerReferences managed by higher-level controllers do not create Workload/PodGroup)) |
I see a few areas we need to cover in alpha:
- How does this feature work with suspended Jobs?
- If a Job has ownerReferences set, can we verify that no Workload is created?
- Elastic Jobs are forbidden. We should test/verify this.
| - The automatic policy selection is based on `Job` Type | ||
| - Jobs with `parallelism > 1` use gang scheduling policy where `minCount` equals the Job's parallelism. | ||
| - Jobs without indexed completion mode or `completions = 1`, use basic scheduling policy (pod-by-pod scheduling - `minCount`). | ||
| - Elastic Jobs (changing parallelism at runtime) are not supported when gang scheduling is active. |
This effectively breaks Elastic Indexed Jobs, which is a GA feature, when this feature turns on.
I would prefer to forbid modifying parallelism in this case but do it in a way that doesn't break existing users of this feature.
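To make the rules in this thread easier to discuss, here is a minimal Go sketch of the policy selection as quoted in the diff. It is an illustration under assumptions, not the KEP's implementation: `JobInfo`, `Decision`, `PolicyKind`, `selectPolicy`, and `parallelismChangeAllowed` are hypothetical names, and none of them come from the Kubernetes API. The sketch follows only the `parallelism > 1` rule, the feature-gate check, and the ownerReferences exclusion mentioned in the test plan; it leaves out the completion-mode condition because the quoted text does not settle how it interacts with parallelism.

```go
package main

import "fmt"

// PolicyKind is a hypothetical label for the selected scheduling policy.
type PolicyKind string

const (
	PolicyNone  PolicyKind = "None"  // no Workload/PodGroup is created
	PolicyBasic PolicyKind = "Basic" // pod-by-pod scheduling
	PolicyGang  PolicyKind = "Gang"  // all pods are scheduled together
)

// JobInfo holds only the Job properties that the quoted rules mention.
// It is a hypothetical stand-in, not the real Job spec type.
type JobInfo struct {
	Parallelism        int32
	HasOwnerReferences bool
}

// Decision is the hypothetical result of policy selection.
type Decision struct {
	Kind     PolicyKind
	MinCount int32 // only meaningful for PolicyGang
}

// selectPolicy applies the rules quoted from the KEP diff:
//   - gate disabled or Job owned by a higher-level controller: nothing is created
//   - parallelism > 1: gang policy with minCount equal to parallelism
//   - otherwise: basic policy
func selectPolicy(job JobInfo, gateEnabled bool) Decision {
	if !gateEnabled || job.HasOwnerReferences {
		return Decision{Kind: PolicyNone}
	}
	if job.Parallelism > 1 {
		return Decision{Kind: PolicyGang, MinCount: job.Parallelism}
	}
	return Decision{Kind: PolicyBasic}
}

// parallelismChangeAllowed models the restriction discussed in this thread:
// a parallelism change is rejected only while gang scheduling is active, so
// Jobs that do not use the gang policy keep their elastic behavior.
func parallelismChangeAllowed(current Decision, oldParallelism, newParallelism int32) bool {
	return current.Kind != PolicyGang || oldParallelism == newParallelism
}

func main() {
	gang := selectPolicy(JobInfo{Parallelism: 4}, true)
	fmt.Printf("parallelism=4, gate on: %+v\n", gang)
	fmt.Printf("resize 4 -> 8 allowed: %v\n", parallelismChangeAllowed(gang, 4, 8))

	owned := selectPolicy(JobInfo{Parallelism: 4, HasOwnerReferences: true}, true)
	fmt.Printf("owned Job: %+v\n", owned)
	fmt.Printf("resize 4 -> 8 allowed: %v\n", parallelismChangeAllowed(owned, 4, 8))
}
```

In this sketch, a Job with the gate disabled or with ownerReferences set never reaches the gang branch, so its parallelism can still be changed. That is one way to read "forbid modifying parallelism without breaking existing users", but the actual validation behavior is for the KEP to define.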
One-line PR description: Integrate `Workload` and `PodGroup` APIs with the Job controllers to support gang-scheduling.
Issue link: WAS: Integrate Workload APIs with Job controller #5547
Other comments: See other KEPs
/sig scheduling