Don't fetch nodegroup info from cloud provider if node is in scaledown #9051
Conversation
If a node is being scaled down, it may get deleted at any time. Currently, during large scaledowns, cluster autoscaler is unable to associate nodes that are being deleted with a nodegroup, causing expensive cache updates. This change checks whether a node is in scaledown and, if so, gets the nodegroup from the scaledown request instead of from the cloud provider. It also increases MaxKubernetesEmptyNodeDeletionTime and MaxCloudProviderNodeDeletionTime to account for tail latencies in large bulk deletes.
Hi @tetianakh. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: tetianakh. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
The description mentions bumping MaxKubernetesEmptyNodeDeletionTime and MaxCloudProviderNodeDeletionTime, but I don't see that change in the diff. Also, can you update the release note? I think this is a performance optimization that may be worth calling out in the changelog.
++ to @x13n's question.
Thanks for noticing. The changes in cluster-autoscaler/core/scaledown/actuation/delete_in_batch.go are missing from the commit. Let me take another try at fixing this. After running more thorough tests, I am less confident that this change fully fixes the problem.
What type of PR is this?
/kind bug
What this PR does / why we need it:
If a node is being scaled down, it may get deleted at any time. Currently, during large scaledowns, cluster autoscaler is unable to associate nodes that are being deleted with a nodegroup, causing expensive cache updates.
This change checks whether a node is in scaledown and, if so, gets the nodegroup from the scaledown request instead of from the cloud provider.
It also increases the MaxKubernetesEmptyNodeDeletionTime and MaxCloudProviderNodeDeletionTime to account for tail latencies in large bulk deletes.
Special notes for your reviewer:
Does this PR introduce a user-facing change?