-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Introduce synctest-based framework for deterministic testing #9099
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Welcome @GaetanoMar96! |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: GaetanoMar96 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @GaetanoMar96. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
35ac017 to
eb540b2
Compare
| var nodeInfoComparator nodegroupset.NodeInfoComparator | ||
| if len(autoscalingOptions.BalancingLabels) > 0 { | ||
| nodeInfoComparator = nodegroupset.CreateLabelNodeInfoComparator(autoscalingOptions.BalancingLabels) | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we pull it outside, to main?
I just noticed that the core package calls the NewCloudProvider() -_-
My proposal of target state:
- move that branch outside, to main
- move initialization from core to main as well:
opts.CloudProvider = cloudBuilder.NewCloudProvider(opts, informerFactory) - in the core repo (and in the builder) just error out if CloudProvider is nil
- split the cloudprovider/, the interface should be clearly separated from provider-specific stuff (including cloudprovider/builder)
- Add some form of dependency enforcement, e.g. unit test or play with dependency visibility: at minimum, the core repo cannot have dependency on any of the cloud-specific providers.
I think 4 and 5 would require some larger discussion, but in this CL I think 1-3 are achievable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cloud provider depends on informer factory and core autoscaler options, which are initialized only later. If we move all this code in main then in test config we should handle all of them injecting them at build initialization. Are we sure we want to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right... that might be a bit unsafe, since we don't know 100% which options does the NewCloudProvider really depend on.
Happy case: we can just move it out and see if any of the tests fail. But I'm not too confident that we have sufficient test coverage to detect all issues.
Another solution would be to leave it as is. IMO in this case we should at least move this initialization out of the core package, e.g. to the builder package. And ideally add a comment in the code that a target state is initialization in main.go, but it wasn't done because we don't have ability to test it thoroughly.
I'd like to get some feedback from other CA maintainers @mtrqq @towca
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And regarding the nodeInfoComparator initialization, I think it's better to leave it in main.go for now, since the code is different depending on cloud provider (also it's in the spirit of the TODO below to eventually migrate it somewhere in the CloudProvider package).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrated the nodeInfoComparator. I will leave the conversation unresolved for now waiting from CA maintainers feedback
cluster-autoscaler/builder/bigunittest/staticautoscaler_test.go
Outdated
Show resolved
Hide resolved
cluster-autoscaler/builder/bigunittest/staticautoscaler_test.go
Outdated
Show resolved
Hide resolved
eb540b2 to
dac1406
Compare
| var nodeInfoComparator nodegroupset.NodeInfoComparator | ||
| if len(autoscalingOptions.BalancingLabels) > 0 { | ||
| nodeInfoComparator = nodegroupset.CreateLabelNodeInfoComparator(autoscalingOptions.BalancingLabels) | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And regarding the nodeInfoComparator initialization, I think it's better to leave it in main.go for now, since the code is different depending on cloud provider (also it's in the spirit of the TODO below to eventually migrate it somewhere in the CloudProvider package).
1fd965e to
5180f1d
Compare
cluster-autoscaler/cloudprovider/builder/cloud_provider_builder.go
Outdated
Show resolved
Hide resolved
| testConfig := integration.NewConfig() | ||
|
|
||
| integration.RunTest(t, testConfig, func(ctx *integration.TestContext) { | ||
| ctx.BuildAutoscaler() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to keep it explicit and not executed within the RunTest? I would still keep this function public if there will be any test which needs to perform manual configuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point and i will wait for Karol review as well. For context ideally we should make the autoscaler object explicit but then i was not able to find a proper way to make it work.
5180f1d to
58e27de
Compare
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces a new testing framework for Cluster Autoscaler based on synctest.
The new framework utilizes a "bubble" environment and virtual time to:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
The PR is structured into three logical commits to provide a clear path from architecture to implementation:
Refactor for Dependency Injection (Builder Pattern): I introduced a Builder Pattern in the main initialization logic. This allows us to inject fakes (KubeClient, CloudProvider, PodObservers) during setup. This decoupling is essential for moving away from heavyweight integration tests toward lightweight, controlled simulations.
Lifecycle & Leak Management: To support synctest's requirement for a clean "bubble" exit, I refactored internal components to properly accept and propagate context.Context. This ensures that background goroutines observe shutdown signals immediately, preventing the "durable sleep" leaks common in components that previously relied on static timers.
Framework & Lifecycle Tests: This commit introduces the TestContext and TestBuilder. It includes a full-lifecycle regression test (Scale-up -> Stabilization -> Scale-down) that demonstrates the framework's ability to orchestrate complex state transitions deterministically using the encapsulated state of our fake K8s and Cloud Provider implementations.
Usage: See staticautoscaler_test.go for an example of the new RunTest and ctx.Step pattern.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: