Skip to content

Conversation

@adrianmoisey
Copy link
Member

  • One-line PR description: Add initial KEP for Deprecate service.spec.externalIPs
  • Other comments: I relied on AI to generate this, it's way better at formal writing than me. It was very verbose, so I cut a lot of it out. I take responsibility for all the text produced.

@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Feb 2, 2026
@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Feb 2, 2026
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: adrianmoisey
Once this PR has been reviewed and has the lgtm label, please assign deads2k, shaneutt for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 2, 2026
- kube-proxy stops supporting externalIPs by default
- Clusters can still disable the feature gate if needed for transition

FIXME: How much time between enabling the gate and removing it?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this 4 releases between enabling and removing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"1 year" = 3 releases

- Attempts to create new Services with externalIPs are rejected
- **Mitigation**: Set `--feature-gates=DisableServiceExternalIPs=false` to temporarily restore previous behavior during migration

FIXME: Is this something we want to allow users to do, or should the gate be locked to enable?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how removals work, should this gate be locked to enable, or should a user be allowed to override it? My assumption is that I should actually change this text to reflect that the gate is locked, but I'm not sure.

Copy link
Contributor

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just reviewing the proposal/design part so far


## Summary

This KEP proposes to deprecate and remove the `Service.spec.externalIPs` field from the Kubernetes API, along with its implementation in kube-proxy. The externalIPs feature has long been considered a security risk and architectural problem, as it allows non-privileged users to claim arbitrary IP addresses without proper authorization or validation, potentially enabling man-in-the-middle attacks and other security exploits.[^1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please line-wrap so it's easier to comment on specific things


## Summary

This KEP proposes to deprecate and remove the `Service.spec.externalIPs` field from the Kubernetes API, along with its implementation in kube-proxy. The externalIPs feature has long been considered a security risk and architectural problem, as it allows non-privileged users to claim arbitrary IP addresses without proper authorization or validation, potentially enabling man-in-the-middle attacks and other security exploits.[^1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field is just being deprecated; we can't remove a GA field from a GA type. The kube-proxy implementation will be deprecated and then removed.

Copy link
Member Author

@adrianmoisey adrianmoisey Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I think I misunderstood the mission here.

So for clarity this is what's needed:

  1. Emit a warning when someone writes externalIPs to the API
  2. Eventually deprecate and remove kube-proxy's handling of ExternalIPs (and does this include ipvs?)
  3. Reject writes to Services with ExternalIPs?

(the above is in no specific order)

#3 is what I'm fuzzy on, I guess it's because deleting a field from an API and rejecting writes to it feels like the same thing. Thinking about it though, I assume we leave it there in case someone has a Service with externalIPs set, upgrades to a version if Kubernetes where externalIPs are disabled, and tries to fetch the object, the API server still needs to know how to serve Services with externalIPs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not 100% sure what we can get away with... I had proposed 5 possibilities in kubernetes/kubernetes#97076 (comment), with #5 being "Change Service validation to require the field to be unset (feature-gated in over several releases). Remove the implementation from kube-proxy and remove the docs.", and @thockin replied saying "I think we should (and can!) seriously consider nullifying the functionality of the field (your #6)", so I guess in addition to referring to it by the wrong number, he only explicitly stated support for the kube-proxy half of it.

So maybe on the apiserver side we just want to warn, forever. (With the warning text changing when support actually goes away.)


This KEP proposes to deprecate and remove the `Service.spec.externalIPs` field from the Kubernetes API, along with its implementation in kube-proxy. The externalIPs feature has long been considered a security risk and architectural problem, as it allows non-privileged users to claim arbitrary IP addresses without proper authorization or validation, potentially enabling man-in-the-middle attacks and other security exploits.[^1]

[^1]: CVE-2020-8554
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need an actual link here


1. Mark the API field as deprecated, emit warnings, and introduce a feature gate (default: false) that blocks creation of new Services with externalIPs and disables kube-proxy support
2. Enable the feature gate by default after the required 12+ month deprecation period (See [rule 7] of the Kubernetes deprecation policy)
3. Remove the field from the API and all associated code in kube-proxy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
3. Remove the field from the API and all associated code in kube-proxy
3. Remove all associated code in kube-proxy

deprecated: "v1.36"

feature-gates:
- name: DisableServiceExternalIPs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feature gate should be named as though it's enabling the feature: AllowServiceExternalIPs. The default will be false, to not allow them, but people can override it to true to re-enable them (during the deprecation process).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that makes sense. So the k/k release that introduces the gate is also the release that stops externalIPs from being served?

Also just asking out loud here: In the past for other features I've worked on (relaxing validation), the rollout was always alpha (gate disabled by default) followed by beta (gate enabled by default).
I had assumed we need to follow a similar pattern here, but, that process isn't needed here, right?
The reason we do that pattern is to allow for rollbacks? In this KEP, we're not changing API validation, so on rollback to a version prior to the gate being enabled can't introduce any data that that version of Kubernetes can't handle. Is my understanding correct here?


FIXME: How much time between enabling the gate and removing it?

3. **Phase 3 - API Removal**: After some time (how much time?) remove the field entirely:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After "AllowServiceExternalIPs becomes Default: false", the next phase is "AllowServiceExternalIPs becomes Locked: true".

At this point, kube-apiserver and kube-proxy diverge, because kube-apiserver needs to support --min-compatibility-version (and pretend to be older releases), but kube-proxy does not. So:

  • remove the feature gate checks and all related code from kube-proxy (and kube-proxy's unit tests)
  • remove any e2e tests of the feature
  • update docs to reflect that the feature no longer exists
  • do not remove any code from kube-apiserver or its unit tests
  • for simplicity, do not remove the DenyServiceExternalIPs admission controller.

Then there's one more phase, 3 releases later, when you can finally clean up the feature gate, and remove the feature-gated code and the admission controller from kube-apiserver.


3. **Phase 3 - API Removal**: After some time (how much time?) remove the field entirely:

- Remove `externalIPs` field from Service API
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, we never do that.

so you need to clarify what happens when trying to edit existing services with externalIPs set.


### Risks and Mitigations

User who use this feature may not be aware of the upcoming deprecation and removal, and they may not have alternative options.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably need to say more here, like that we'll provide some documentation...


### API Changes

#### Phase 1: Deprecation Warnings and Feature Gate Introduction (v1.36)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly a duplicate of the Proposal section, and you should only keep one or the other.

Most of the comments from there apply here too if you decide to keep this one.

**When feature gate is enabled** (opt-in):

- API server validates that `spec.externalIPs` is not set or empty on Service CREATE operations
- Existing Services with externalIPs can be updated (but not to add new IPs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Existing Services with externalIPs can be updated (but not to add new IPs)
- Existing Services with externalIPs can be updated (as long as externalIPs is unchanged, or removed)

"but not to add new IPs" is vague and difficult to implement

##### Unit tests

- Add unit tests for deprecation warning emission when externalIPs field is set
- Update existing unit tests that use externalIPs to assert deprecation warnings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests outside of apiserver wouldn't actually get any deprecation warnings. You could test for deprecation warnings in integration or e2e tests, but we generally don't.

- When `DisableServiceExternalIPs=true`: verify Service CREATE with externalIPs is rejected
- When `DisableServiceExternalIPs=true`: verify Service UPDATE can proceed but cannot add new IPs
- When `DisableServiceExternalIPs=false`: verify existing behavior preserved
- Add e2e test for kube-proxy to verify it ignores externalIPs when gate is enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(doesn't belong in the "Unit tests" section)


##### Integration tests

- Test that Services with externalIPs continue to function correctly during Phase 1 (deprecation warnings only)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's e2e, not integration

Comment on lines +229 to +232
- Test kube-proxy behavior with feature gate:
- Verify iptables/ipvs/nftables rules are NOT created for externalIPs when gate is enabled
- Verify rules ARE created when gate is disabled
- Verify proper cleanup when gate is toggled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no kube-proxy integration tests. These should be unit tests (and you can ignore ipvs).

Comment on lines +236 to +239
Existing e2e tests that use externalIPs should be updated to:
- Expect deprecation warnings (Phase 1)
- Document that they are testing deprecated functionality
- Be modified when feature gate is introduced (tests should test both gate enabled and disabled scenarios)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR, it seems that there's only one existing e2e test of externalIPs.

The default set of test jobs include a job that enables all Beta features, and a job that enables all Alpha and Beta features, but there is no job that enables Deprecated features (@aojea right?), so unless we create a special e2e job just for this KEP, the code is only going to be tested in whatever the default state of the feature gate is in any given release. But this is fine; we know the externalIPs code works when it's enabled, because we've tested it in the past, and the unit tests should give us confidence that the feature gate is working as expected.


###### What specific metrics should inform a rollback?

- `apiserver_requested_deprecated_apis{group="",version="v1",resource="services",removed_release="1.36"}` showing unexpectedly high usage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't imply a rollback is needed. It just implies they need to be careful about further upgrades.

There are no existing metrics that would let them know that things were actually failing because of externalIPs being present, but ignored.


###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Standard upgrade/downgrade testing will be performed. Since this only affects API warnings, the testing focus will be on:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Since this only affects API warnings" is only true for phase 1.


###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

Yes, this KEP is itself a deprecation and removal of the `Service.spec.externalIPs` field. The field will continue to function during the deprecation period.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not entirely true

A specific metric for externalIPs field usage would be helpful:
- `apiserver_service_externalips_count`: Number of Services currently using externalIPs

These metrics would help operators track adoption and migration progress.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a kube-proxy metric rather than apiserver? not sure...


###### What are other known failure modes?

Not applicable - this is a deprecation that only adds warnings, not new functionality.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not entirely true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/network Categorizes an issue or PR as relevant to SIG Network. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants