-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP-5707: Add initial KEP for deprecate service.spec.externalIPs #5873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: adrianmoisey The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
| - kube-proxy stops supporting externalIPs by default | ||
| - Clusters can still disable the feature gate if needed for transition | ||
|
|
||
| FIXME: How much time between enabling the gate and removing it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this 4 releases between enabling and removing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"1 year" = 3 releases
| - Attempts to create new Services with externalIPs are rejected | ||
| - **Mitigation**: Set `--feature-gates=DisableServiceExternalIPs=false` to temporarily restore previous behavior during migration | ||
|
|
||
| FIXME: Is this something we want to allow users to do, or should the gate be locked to enable? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how removals work, should this gate be locked to enable, or should a user be allowed to override it? My assumption is that I should actually change this text to reflect that the gate is locked, but I'm not sure.
danwinship
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just reviewing the proposal/design part so far
|
|
||
| ## Summary | ||
|
|
||
| This KEP proposes to deprecate and remove the `Service.spec.externalIPs` field from the Kubernetes API, along with its implementation in kube-proxy. The externalIPs feature has long been considered a security risk and architectural problem, as it allows non-privileged users to claim arbitrary IP addresses without proper authorization or validation, potentially enabling man-in-the-middle attacks and other security exploits.[^1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please line-wrap so it's easier to comment on specific things
|
|
||
| ## Summary | ||
|
|
||
| This KEP proposes to deprecate and remove the `Service.spec.externalIPs` field from the Kubernetes API, along with its implementation in kube-proxy. The externalIPs feature has long been considered a security risk and architectural problem, as it allows non-privileged users to claim arbitrary IP addresses without proper authorization or validation, potentially enabling man-in-the-middle attacks and other security exploits.[^1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field is just being deprecated; we can't remove a GA field from a GA type. The kube-proxy implementation will be deprecated and then removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, I think I misunderstood the mission here.
So for clarity this is what's needed:
- Emit a warning when someone writes externalIPs to the API
- Eventually deprecate and remove kube-proxy's handling of ExternalIPs (and does this include ipvs?)
- Reject writes to Services with ExternalIPs?
(the above is in no specific order)
#3 is what I'm fuzzy on, I guess it's because deleting a field from an API and rejecting writes to it feels like the same thing. Thinking about it though, I assume we leave it there in case someone has a Service with externalIPs set, upgrades to a version if Kubernetes where externalIPs are disabled, and tries to fetch the object, the API server still needs to know how to serve Services with externalIPs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not 100% sure what we can get away with... I had proposed 5 possibilities in kubernetes/kubernetes#97076 (comment), with #5 being "Change Service validation to require the field to be unset (feature-gated in over several releases). Remove the implementation from kube-proxy and remove the docs.", and @thockin replied saying "I think we should (and can!) seriously consider nullifying the functionality of the field (your #6)", so I guess in addition to referring to it by the wrong number, he only explicitly stated support for the kube-proxy half of it.
So maybe on the apiserver side we just want to warn, forever. (With the warning text changing when support actually goes away.)
|
|
||
| This KEP proposes to deprecate and remove the `Service.spec.externalIPs` field from the Kubernetes API, along with its implementation in kube-proxy. The externalIPs feature has long been considered a security risk and architectural problem, as it allows non-privileged users to claim arbitrary IP addresses without proper authorization or validation, potentially enabling man-in-the-middle attacks and other security exploits.[^1] | ||
|
|
||
| [^1]: CVE-2020-8554 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need an actual link here
|
|
||
| 1. Mark the API field as deprecated, emit warnings, and introduce a feature gate (default: false) that blocks creation of new Services with externalIPs and disables kube-proxy support | ||
| 2. Enable the feature gate by default after the required 12+ month deprecation period (See [rule 7] of the Kubernetes deprecation policy) | ||
| 3. Remove the field from the API and all associated code in kube-proxy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| 3. Remove the field from the API and all associated code in kube-proxy | |
| 3. Remove all associated code in kube-proxy |
| deprecated: "v1.36" | ||
|
|
||
| feature-gates: | ||
| - name: DisableServiceExternalIPs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feature gate should be named as though it's enabling the feature: AllowServiceExternalIPs. The default will be false, to not allow them, but people can override it to true to re-enable them (during the deprecation process).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that makes sense. So the k/k release that introduces the gate is also the release that stops externalIPs from being served?
Also just asking out loud here: In the past for other features I've worked on (relaxing validation), the rollout was always alpha (gate disabled by default) followed by beta (gate enabled by default).
I had assumed we need to follow a similar pattern here, but, that process isn't needed here, right?
The reason we do that pattern is to allow for rollbacks? In this KEP, we're not changing API validation, so on rollback to a version prior to the gate being enabled can't introduce any data that that version of Kubernetes can't handle. Is my understanding correct here?
|
|
||
| FIXME: How much time between enabling the gate and removing it? | ||
|
|
||
| 3. **Phase 3 - API Removal**: After some time (how much time?) remove the field entirely: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After "AllowServiceExternalIPs becomes Default: false", the next phase is "AllowServiceExternalIPs becomes Locked: true".
At this point, kube-apiserver and kube-proxy diverge, because kube-apiserver needs to support --min-compatibility-version (and pretend to be older releases), but kube-proxy does not. So:
- remove the feature gate checks and all related code from kube-proxy (and kube-proxy's unit tests)
- remove any e2e tests of the feature
- update docs to reflect that the feature no longer exists
- do not remove any code from kube-apiserver or its unit tests
- for simplicity, do not remove the
DenyServiceExternalIPsadmission controller.
Then there's one more phase, 3 releases later, when you can finally clean up the feature gate, and remove the feature-gated code and the admission controller from kube-apiserver.
|
|
||
| 3. **Phase 3 - API Removal**: After some time (how much time?) remove the field entirely: | ||
|
|
||
| - Remove `externalIPs` field from Service API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, we never do that.
so you need to clarify what happens when trying to edit existing services with externalIPs set.
|
|
||
| ### Risks and Mitigations | ||
|
|
||
| User who use this feature may not be aware of the upcoming deprecation and removal, and they may not have alternative options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably need to say more here, like that we'll provide some documentation...
|
|
||
| ### API Changes | ||
|
|
||
| #### Phase 1: Deprecation Warnings and Feature Gate Introduction (v1.36) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly a duplicate of the Proposal section, and you should only keep one or the other.
Most of the comments from there apply here too if you decide to keep this one.
| **When feature gate is enabled** (opt-in): | ||
|
|
||
| - API server validates that `spec.externalIPs` is not set or empty on Service CREATE operations | ||
| - Existing Services with externalIPs can be updated (but not to add new IPs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - Existing Services with externalIPs can be updated (but not to add new IPs) | |
| - Existing Services with externalIPs can be updated (as long as externalIPs is unchanged, or removed) |
"but not to add new IPs" is vague and difficult to implement
| ##### Unit tests | ||
|
|
||
| - Add unit tests for deprecation warning emission when externalIPs field is set | ||
| - Update existing unit tests that use externalIPs to assert deprecation warnings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unit tests outside of apiserver wouldn't actually get any deprecation warnings. You could test for deprecation warnings in integration or e2e tests, but we generally don't.
| - When `DisableServiceExternalIPs=true`: verify Service CREATE with externalIPs is rejected | ||
| - When `DisableServiceExternalIPs=true`: verify Service UPDATE can proceed but cannot add new IPs | ||
| - When `DisableServiceExternalIPs=false`: verify existing behavior preserved | ||
| - Add e2e test for kube-proxy to verify it ignores externalIPs when gate is enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(doesn't belong in the "Unit tests" section)
|
|
||
| ##### Integration tests | ||
|
|
||
| - Test that Services with externalIPs continue to function correctly during Phase 1 (deprecation warnings only) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's e2e, not integration
| - Test kube-proxy behavior with feature gate: | ||
| - Verify iptables/ipvs/nftables rules are NOT created for externalIPs when gate is enabled | ||
| - Verify rules ARE created when gate is disabled | ||
| - Verify proper cleanup when gate is toggled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no kube-proxy integration tests. These should be unit tests (and you can ignore ipvs).
| Existing e2e tests that use externalIPs should be updated to: | ||
| - Expect deprecation warnings (Phase 1) | ||
| - Document that they are testing deprecated functionality | ||
| - Be modified when feature gate is introduced (tests should test both gate enabled and disabled scenarios) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR, it seems that there's only one existing e2e test of externalIPs.
The default set of test jobs include a job that enables all Beta features, and a job that enables all Alpha and Beta features, but there is no job that enables Deprecated features (@aojea right?), so unless we create a special e2e job just for this KEP, the code is only going to be tested in whatever the default state of the feature gate is in any given release. But this is fine; we know the externalIPs code works when it's enabled, because we've tested it in the past, and the unit tests should give us confidence that the feature gate is working as expected.
|
|
||
| ###### What specific metrics should inform a rollback? | ||
|
|
||
| - `apiserver_requested_deprecated_apis{group="",version="v1",resource="services",removed_release="1.36"}` showing unexpectedly high usage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't imply a rollback is needed. It just implies they need to be careful about further upgrades.
There are no existing metrics that would let them know that things were actually failing because of externalIPs being present, but ignored.
|
|
||
| ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? | ||
|
|
||
| Standard upgrade/downgrade testing will be performed. Since this only affects API warnings, the testing focus will be on: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Since this only affects API warnings" is only true for phase 1.
|
|
||
| ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? | ||
|
|
||
| Yes, this KEP is itself a deprecation and removal of the `Service.spec.externalIPs` field. The field will continue to function during the deprecation period. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not entirely true
| A specific metric for externalIPs field usage would be helpful: | ||
| - `apiserver_service_externalips_count`: Number of Services currently using externalIPs | ||
|
|
||
| These metrics would help operators track adoption and migration progress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a kube-proxy metric rather than apiserver? not sure...
|
|
||
| ###### What are other known failure modes? | ||
|
|
||
| Not applicable - this is a deprecation that only adds warnings, not new functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not entirely true
service.spec.externalIPs#5707