Skip to content

Implement Hash-Based Routing#519

Open
hoffmaen wants to merge 21 commits intocloudfoundry:developfrom
sap-contributions:hash-based-routing-with-overload
Open

Implement Hash-Based Routing#519
hoffmaen wants to merge 21 commits intocloudfoundry:developfrom
sap-contributions:hash-based-routing-with-overload

Conversation

@hoffmaen
Copy link
Contributor

@hoffmaen hoffmaen commented Nov 11, 2025

Summary

This Pull-Request implements #505.

  • Implement hash-based routing in Gorouter
    • Gorouter routes requests to endpoints based on the value of a configured request header
    • Gorouter distributes requests to different predefined application instances, when the current request count to an app instance exceeds the balance factor
    • Gorouter routes requests to the "next instance" in the lookup table on retries
  • Follow decisions documented in this comment

Backward Compatibility

Breaking Change? Yes/No

@b1tamara b1tamara force-pushed the hash-based-routing-with-overload branch from e9cbe0c to 7b456f8 Compare December 3, 2025 09:32
@hoffmaen hoffmaen force-pushed the hash-based-routing-with-overload branch from 9f21f6b to 513c575 Compare December 9, 2025 08:24
Copy link
Contributor

@peanball peanball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First set of comments. I have not looked at the implementation of maglev and the hash based instance determination.

I've focussed on the integration with the existing Gorouter code, route lookup, LB algo handling, etc.

@hoffmaen hoffmaen force-pushed the hash-based-routing-with-overload branch from 5030bbc to 560f935 Compare January 16, 2026 14:29
Copy link
Contributor

@peanball peanball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments, now on the hash_based.go

@b1tamara b1tamara requested a review from peanball January 21, 2026 07:22
@peanball
Copy link
Contributor

LGTM. Once we have the performance tests, we should move to community review.

@hoffmaen
Copy link
Contributor Author

hoffmaen commented Jan 28, 2026

Good news - we evaluated the performance of gorouter with hash-based routing support.

Our test setup constists of scaled out HAProxy (as a load balancer in front of gorouter), scaled out Diego Cells and a very simple and performant application with 30 instances. This allows us to performance test gorouter via external requests, where gorouter is the bottleneck.

We compared the performance of the latest stable routing-release, and the same release including changes from this Pull-Request. Using oha (inofficial successor of hey), we put various load patterns on the application and measured throughput, CPU usage of gorouter, and latency. In summary,

  • both versions performed absolutely the same when using round-robin
  • we achieved the same performance using hash-based routing and round-robin, when requests are distributed across all instances (achieved by running oha with multiple instances, each using a different hash_header value)
  • various mixed load scenarios (load on a route with round-robin parallel to load on a route with hash) showed no impact on performance

We also explored that requests are routed to next instances as expected, when the max. connection limit of one app instance was reached. The hash_balance factor is effective and allows distribution of requests to next instances before overloading it.

Please feel free to reach out for details on the results!

peanball

This comment was marked as outdated.

@b1tamara b1tamara requested a review from peanball January 28, 2026 13:35
@peanball peanball dismissed their stale review January 28, 2026 14:00

Changes were addressed. As this is still subject to change, I cannot give "final" approval yet.

@hoffmaen hoffmaen marked this pull request as ready for review January 28, 2026 14:53
@hoffmaen hoffmaen requested a review from a team as a code owner January 28, 2026 14:53
@a18e a18e force-pushed the hash-based-routing-with-overload branch from 21cc8b7 to b414a38 Compare January 30, 2026 10:31
@b1tamara
Copy link
Contributor

b1tamara commented Feb 5, 2026

During performance tests, we figured out that the Gorouter memory consumption is higher than expected if we keep the permutation table (table which is used to fill the final maglev lookup table) in memory. We want to investigate this finding further and optimize it.
Moving PR back to Draft.

@b1tamara b1tamara marked this pull request as draft February 5, 2026 08:21
Copy link
Contributor Author

@hoffmaen hoffmaen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only documentation / log related comments.

This commit provides the basic implementation for hash-based routing. It does not consider the balance factor yet.

Co-authored-by: Clemens Hoffmann <clemens.hoffmann@sap.com>
Co-authored-by: Tamara Boehm <tamara.boehm@sap.com>
Co-authored-by: Soha Alboghdady <soha.alboghdady@sap.com>
@b1tamara b1tamara force-pushed the hash-based-routing-with-overload branch from d21580f to cd304af Compare March 9, 2026 11:37
@b1tamara b1tamara force-pushed the hash-based-routing-with-overload branch from cd304af to 69da88c Compare March 9, 2026 11:41
@b1tamara b1tamara marked this pull request as ready for review March 10, 2026 06:11
@maxmoehl
Copy link
Member

Please make sure to compact the history before merging.

router.hash_based_routing.lookup_table_size:
description: |
Size of the Maglev lookup table used for hash-based load balancing.
The size should be approximately chosen based on the expected maximum number of endpoints / application instances per route which use hash-based routing. The provided values are not hard limits but recommended values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You say that they are not hard limits, yet you prevent the operator from choosing their own value. Wouldn't an integer and a comment explaining the rough ranges be a better approach here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We assumed that it could be difficult for operators to determine the exact value for this table size.
However, the current "t-shirt size" approach is an MVP for this first pass and may be refined as we gather more data. We are still discussing internally how to dynamically calculate the maglev lookup table size based on endpoint numbers, rather than defining a static configuration value via the routing release.

LoadBalancingAlgorithm string `json:"loadbalancing"`
HashHeaderName string `json:"hash_header"`
HashBalance float64 `json:"hash_balance"`
HashBalance float64 `json:"hash_balance,string"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this just a number in the JSON that we are parsing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hoffmaen
Could you remember why we changed to string?

Copy link
Contributor Author

@hoffmaen hoffmaen Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is because route-emitter sends the value as a string. We didn't want to make changes to route-emitter, and having the value as a string has no practical downsides.
This is also the implementation in cloud-controller

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

6 participants