[KEP-5866] Server-side sharded watch #5867
/cc
/assign
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Jefftree, jpbetz
> However, client-side sharding has a critical limitation: it does not reduce the incoming event
> volume per replica. Every replica still receives the full stream of events, paying the CPU and
> network cost to deserialize everything, only to discard items not belonging to their shard.
Yes. Either this, or the client writes shard information into labels and filters on the labels, which comes with its own set of problems and limitations.
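To make the client-side limitation quoted above concrete, here is a minimal Go sketch of a replica that filters locally. The `Event` type, the `shardOf` helper, and the choice of FNV-1a with a modulo are illustrative assumptions, not anything the KEP specifies. The point is only that every event reaches `filterLocal` already received and decoded, so the discarded ones have already cost CPU and network.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Event is a hypothetical stand-in for an already-decoded watch event.
type Event struct {
	UID string
}

// shardOf maps a UID to one of n shards. FNV-1a is an assumption made
// for this sketch; the KEP may use a different hash.
func shardOf(uid string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(uid))
	return h.Sum32() % n
}

// filterLocal keeps the events owned by shard `mine` and counts the rest.
// All events were received and deserialized before this function runs.
func filterLocal(events []Event, mine, n uint32) (kept []Event, discarded int) {
	for _, e := range events {
		if shardOf(e.UID, n) == mine {
			kept = append(kept, e)
		} else {
			discarded++
		}
	}
	return kept, discarded
}

func main() {
	events := []Event{{UID: "a"}, {UID: "b"}, {UID: "c"}, {UID: "d"}}
	kept, discarded := filterLocal(events, 0, 3)
	fmt.Printf("kept=%d discarded=%d\n", len(kept), discarded)
}
```

With N replicas and an even hash, each replica throws away roughly (N-1)/N of what it decoded, which is the waste server-side filtering is meant to avoid.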
> - **Coordination**: This proposal does not implement the coordination logic for clients. Clients
> are still responsible for determining their shard ranges.
> - **Resharding**: The API server does not manage shard rebalancing strategies; by providing the raw
> hash ranges, clients can implement their own consistent hashing strategies if needed.
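Since clients are left to determine their own shard ranges, the simplest scheme a client might use is an even split of the hash space. The Go sketch below assumes a half-open `[Start, End)` range over a hash space of a given size; the `HashRange` type, the `splitRanges` helper, and the range semantics are hypothetical and are not taken from the KEP.

```go
package main

import "fmt"

// HashRange is a hypothetical half-open interval [Start, End) in hash space.
type HashRange struct {
	Start, End uint64
}

// splitRanges divides [0, space) into n contiguous ranges of near-equal
// width. The last range absorbs the remainder so the whole space is covered.
func splitRanges(space uint64, n int) []HashRange {
	if n <= 0 {
		return nil
	}
	width := space / uint64(n)
	ranges := make([]HashRange, 0, n)
	for i := 0; i < n; i++ {
		start := uint64(i) * width
		end := start + width
		if i == n-1 {
			end = space
		}
		ranges = append(ranges, HashRange{Start: start, End: end})
	}
	return ranges
}

func main() {
	// Each replica would pick the entry matching its own index.
	for i, r := range splitRanges(100, 4) {
		fmt.Printf("replica %d: [%d, %d)\n", i, r.Start, r.End)
	}
}
```

An even split like this reassigns most keys when the replica count changes, which is why the quoted text leaves room for clients to layer consistent hashing on top of the raw ranges.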
I like the idea of tackling this sub-problem before tackling coordination and resharding, but it's clear to me that a comprehensive system to manage controller sharding would also solve these problems. Are we saying "we're never doing this and it's the user's problem" or "this is future work that we're interested in, but we'd like to solve this problem first"?
> - Complexity in API Server filtering logic.
> - Clients need to implement ring logic to calculate ranges.
>
> ## Alternatives
I agree that the ability to server-side filter is preferable to the below alternatives. But I can imagine different ways of expressing the server-side filter.
For example, instead of making ranges a first-class concept in the query params, an alternative would be to expand the expressiveness of filters more directly, e.g. fieldSelector=range(metadata.uuid, 0, 100) or even selector=0 < hash(object.metadata.uuid) < 100. This would require either expanding the (extremely ad hoc) grammar used to express field selection, or throwing CEL into the mix.
Has this been considered? If so, why is the query param approach considered preferable?
/sig api-machinery