Refactor coordinator logic and add sharded support #304
Conversation
Summary of Changes
Hello @bootjp, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a fundamental architectural shift by implementing sharding capabilities within the system. It abstracts the coordination logic into a flexible interface, enabling operations to be routed to specific Raft groups based on key ranges. This change affects how data is stored and retrieved and how transactions are processed, supporting scalability and distributed consistency. The accompanying updates to the Jepsen test suite provide robust validation for these new sharding mechanisms.
Code Review
This pull request introduces a significant architectural change by adding sharding support to the system. It refactors the coordinator logic, introducing a ShardedCoordinator, ShardStore, and ShardRouter to manage distributed data and operations across multiple Raft groups. The adapters for gRPC, Redis, and DynamoDB have been updated to be shard-aware. While this is a comprehensive effort to enable horizontal scaling, the review has identified several critical and high-severity issues. These include a major security vulnerability due to a committed private SSH key, hardcoded paths that break environment portability, and multiple instances of stale data reads that could compromise correctness guarantees like linearizability and snapshot isolation. These issues should be addressed before merging.
Pull request overview
This PR refactors the coordinator architecture to support multi-raft group sharding, enabling horizontal scaling through key-range partitioning. The changes introduce a pluggable coordinator interface with two implementations: the original single-group Coordinate and a new ShardedCoordinator that routes operations across multiple raft groups based on configurable key ranges.
Changes:
- Introduces sharded coordinator infrastructure with key-range based routing via a new distribution engine
- Refactors main.go to support multi-raft group initialization with configurable shard ranges
- Updates all adapters (gRPC, Redis, DynamoDB) to use the abstract Coordinator interface with key-aware leader operations
- Adds Jepsen test enhancements for sharded deployment validation
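The key-range routing described in these changes can be pictured with a small self-contained sketch. The names below (route, routeKeyToGroup) are illustrative only and not the distribution engine API introduced by this PR; the two ranges mirror the layout used by the integration test further down.

```go
package main

import (
    "bytes"
    "fmt"
)

// route maps a half-open key range [Start, End) to a raft group.
// A nil End means the range is unbounded on the right.
type route struct {
    Start, End []byte
    GroupID    uint64
}

// routeKeyToGroup returns the group that owns key, or 0 if no range matches.
func routeKeyToGroup(routes []route, key []byte) uint64 {
    for _, r := range routes {
        if bytes.Compare(key, r.Start) >= 0 &&
            (r.End == nil || bytes.Compare(key, r.End) < 0) {
            return r.GroupID
        }
    }
    return 0
}

func main() {
    // ["a", "m") -> group 1, ["m", +inf) -> group 2.
    routes := []route{
        {Start: []byte("a"), End: []byte("m"), GroupID: 1},
        {Start: []byte("m"), End: nil, GroupID: 2},
    }
    fmt.Println(routeKeyToGroup(routes, []byte("b"))) // 1
    fmt.Println(routeKeyToGroup(routes, []byte("x"))) // 2
}
```

Half-open ranges keep adjacent shards disjoint and let the right-most shard use a nil end as "to infinity", which is also how the integration test below configures its two shards.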
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 13 comments.
Summary per file:
| File | Description |
|---|---|
| shard_config.go | Parses and validates raft group and shard range configuration from command-line flags |
| multiraft_runtime.go | Initializes multiple raft groups with per-group data directories |
| main.go | Refactored server initialization to build sharded infrastructure and start per-group gRPC servers |
| kv/sharded_coordinator.go | New coordinator implementation routing operations to shards via distribution engine |
| kv/shard_store.go | MVCC store wrapper routing reads/writes to appropriate shard stores with leader proxying |
| kv/shard_router.go | Routes requests to shard-specific transactional managers, now using routeKey normalization |
| kv/shard_key.go | Normalizes internal keys (list metadata/items) to user keys for routing |
| kv/leader_proxy.go | Forwards transactional requests to raft leaders when local node is follower |
| kv/coordinator.go | Extended interface with key-aware leader methods; refactored transaction log building |
| adapter/grpc.go | Updated to use Coordinator interface with IsLeaderForKey/RaftLeaderForKey |
| adapter/redis.go | Updated to use Coordinator interface; refactored transaction timestamp logic |
| adapter/dynamodb.go | Updated to use Coordinator interface |
| adapter/redis_proxy.go | Updated to use RaftLeaderForKey for key-aware proxying |
| kv/sharded_integration_test.go | Integration test validating multi-shard coordinator dispatch |
| kv/shard_router_test.go | New test validating list key routing with routeKey normalization |
| jepsen/src/elastickv/db.clj | Enhanced to support multi-group raft configuration and validation |
| jepsen/src/elastickv/redis_workload.clj | Added local mode and sharding configuration options |
| jepsen/docker/ssh_config | SSH configuration for local Docker-based Jepsen testing |
| jepsen/docker/run-in-docker.sh | Docker entrypoint script for containerized Jepsen tests |
| jepsen/docker/id_rsa | Standard Vagrant insecure private key for test environments |
Added GetIntersectingRoutes method to Engine to find routes whose key ranges intersect with the scan range [start, end). Modified ScanAt in ShardStore to only query groups whose shard ranges intersect with the requested scan range, avoiding unnecessary work and incorrect results.
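As a rough illustration of the intersection check this commit describes (not the actual GetIntersectingRoutes implementation), two half-open key ranges overlap exactly when each one starts before the other ends, with a nil end treated as unbounded:

```go
package main

import (
    "bytes"
    "fmt"
)

// rangesIntersect reports whether the half-open key ranges
// [aStart, aEnd) and [bStart, bEnd) share at least one key.
// A nil end means the range extends without bound.
func rangesIntersect(aStart, aEnd, bStart, bEnd []byte) bool {
    aStartsBeforeBEnds := bEnd == nil || bytes.Compare(aStart, bEnd) < 0
    bStartsBeforeAEnds := aEnd == nil || bytes.Compare(bStart, aEnd) < 0
    return aStartsBeforeBEnds && bStartsBeforeAEnds
}

func main() {
    // Shard ["a", "m") intersects a scan over ["k", "z") ...
    fmt.Println(rangesIntersect([]byte("a"), []byte("m"), []byte("k"), []byte("z"))) // true
    // ... but shard ["m", +inf) does not intersect a scan over ["a", "c").
    fmt.Println(rangesIntersect([]byte("m"), nil, []byte("a"), []byte("c"))) // false
}
```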
Fix ScanAt to only query shards with intersecting key ranges
Pull request overview
Copilot reviewed 42 out of 43 changed files in this pull request and generated 1 comment.
adapter/grpc.go (outdated)
    return resp.Value, nil

    readTS := req.GetTs()
    if readTS == 0 {
        readTS = snapshotTS(r.coordinator.Clock(), r.store)
The RawScanAt method calls r.coordinator.Clock() without checking if r.coordinator is nil. Unlike RawGet which has a nil check on lines 48-50, this function directly calls Clock() on line 94. If coordinator is nil, this will cause a nil pointer dereference panic. Add a nil check similar to RawGet, or use the same defensive pattern with a local clock variable.
Suggested change:

    var clock *kv.HLC
    if r.coordinator != nil {
        clock = r.coordinator.Clock()
    }
    readTS = snapshotTS(clock, r.store)
Force-pushed from e8d22c4 to f79a0de.
Pull request overview
Copilot reviewed 42 out of 43 changed files in this pull request and generated 2 comments.
    if err != nil {
        for _, rt := range runtimes {
            rt.Close()
        }
        _ = st.Close()
        if closeStores != nil {
            closeStores()
        }
        return nil, nil, errors.Wrapf(err, "failed to start raft group %d", g.id)
When cleaning up after an error in buildShardGroups, the function closes the already-created runtimes and the current st/closeStores, but it never closes a transport manager (tm) directly. Because raftGroupRuntime.Close() calls tm.Close() for each runtime that was already registered, the cleanup is correct as written. Still, for clarity and robustness, consider also closing tm explicitly in the error path before returning, in case the raftGroupRuntime.Close() implementation changes in the future.
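A minimal sketch of that suggestion, assuming the variable names from the hunk above (runtimes, tm, st, closeStores, g) and that the transport manager exposes a Close() error method; the real code may differ:

```go
if err != nil {
    for _, rt := range runtimes {
        rt.Close()
    }
    // Defensively close the transport manager of the group that failed to
    // start, so cleanup no longer depends on raftGroupRuntime.Close()
    // covering tm.Close() in the future.
    if tm != nil {
        _ = tm.Close()
    }
    _ = st.Close()
    if closeStores != nil {
        closeStores()
    }
    return nil, nil, errors.Wrapf(err, "failed to start raft group %d", g.id)
}
```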
func TestShardedCoordinatorDispatch(t *testing.T) {
    ctx := context.Background()

    engine := distribution.NewEngine()
    engine.UpdateRoute([]byte("a"), []byte("m"), 1)
    engine.UpdateRoute([]byte("m"), nil, 2)

    s1 := store.NewMVCCStore()
    r1, stop1 := newSingleRaft(t, "g1", NewKvFSM(s1))
    defer stop1()

    s2 := store.NewMVCCStore()
    r2, stop2 := newSingleRaft(t, "g2", NewKvFSM(s2))
    defer stop2()

    groups := map[uint64]*ShardGroup{
        1: {Raft: r1, Store: s1, Txn: NewLeaderProxy(r1)},
        2: {Raft: r2, Store: s2, Txn: NewLeaderProxy(r2)},
    }

    shardStore := NewShardStore(engine, groups)
    coord := NewShardedCoordinator(engine, groups, 1, NewHLC(), shardStore)

    ops := &OperationGroup[OP]{
        IsTxn: false,
        Elems: []*Elem[OP]{
            {Op: Put, Key: []byte("b"), Value: []byte("v1")},
            {Op: Put, Key: []byte("x"), Value: []byte("v2")},
        },
    }
    if _, err := coord.Dispatch(ctx, ops); err != nil {
        t.Fatalf("dispatch: %v", err)
    }

    readTS := shardStore.LastCommitTS()
    v, err := shardStore.GetAt(ctx, []byte("b"), readTS)
    if err != nil || string(v) != "v1" {
        t.Fatalf("get b: %v %v", v, err)
    }
    v, err = shardStore.GetAt(ctx, []byte("x"), readTS)
    if err != nil || string(v) != "v2" {
        t.Fatalf("get x: %v %v", v, err)
    }

    if _, err := s1.GetAt(ctx, []byte("x"), readTS); !errors.Is(err, store.ErrKeyNotFound) {
        t.Fatalf("expected key x missing in group1, got %v", err)
    }
    if _, err := s2.GetAt(ctx, []byte("b"), readTS); !errors.Is(err, store.ErrKeyNotFound) {
        t.Fatalf("expected key b missing in group2, got %v", err)
    }
}
The test only validates non-transactional cross-shard writes. Consider adding a test case that verifies cross-shard transactions are properly rejected with ErrCrossShardTransactionNotSupported when IsTxn=true and operations span multiple shards. This would validate the critical constraint documented in lines 225-231 of sharded_coordinator.go.
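One possible shape for such a test, reusing the setup of TestShardedCoordinatorDispatch above; the sentinel ErrCrossShardTransactionNotSupported comes from this comment, and the errors.Is comparison is an assumption about how the coordinator reports it:

```go
func TestShardedCoordinatorRejectsCrossShardTxn(t *testing.T) {
    ctx := context.Background()

    // Same two-shard layout as TestShardedCoordinatorDispatch.
    engine := distribution.NewEngine()
    engine.UpdateRoute([]byte("a"), []byte("m"), 1)
    engine.UpdateRoute([]byte("m"), nil, 2)

    s1 := store.NewMVCCStore()
    r1, stop1 := newSingleRaft(t, "g1", NewKvFSM(s1))
    defer stop1()

    s2 := store.NewMVCCStore()
    r2, stop2 := newSingleRaft(t, "g2", NewKvFSM(s2))
    defer stop2()

    groups := map[uint64]*ShardGroup{
        1: {Raft: r1, Store: s1, Txn: NewLeaderProxy(r1)},
        2: {Raft: r2, Store: s2, Txn: NewLeaderProxy(r2)},
    }
    shardStore := NewShardStore(engine, groups)
    coord := NewShardedCoordinator(engine, groups, 1, NewHLC(), shardStore)

    // A transactional group whose keys land in different shards.
    ops := &OperationGroup[OP]{
        IsTxn: true,
        Elems: []*Elem[OP]{
            {Op: Put, Key: []byte("b"), Value: []byte("v1")}, // routed to group 1
            {Op: Put, Key: []byte("x"), Value: []byte("v2")}, // routed to group 2
        },
    }
    if _, err := coord.Dispatch(ctx, ops); !errors.Is(err, ErrCrossShardTransactionNotSupported) {
        t.Fatalf("expected ErrCrossShardTransactionNotSupported, got %v", err)
    }
}
```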
/gemini review
Code Review
This pull request introduces a significant and well-architected refactoring to add sharded support to the key-value store. The changes are extensive, touching coordination, storage, networking, and testing. Key additions include a sharded coordinator, a key-range-based routing engine, and leader-aware stores to ensure read consistency. The explicit rejection of cross-shard transactions is a sensible choice for managing complexity. The introduction of a gRPC connection cache is a good performance optimization. The test suite has also been impressively updated to cover the new sharded architecture.
My review focuses on a few areas for improvement: ensuring proper resource cleanup in the gRPC server, enhancing context propagation in the Redis adapter for better request lifecycle management, and correcting a potential timestamp generation issue in the sharded coordinator to guarantee transaction isolation.
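The connection cache itself is not shown in this thread; as a general illustration of the pattern rather than the PR's actual GRPCConnCache, a cache keyed by target address can hand out one *grpc.ClientConn per peer and close them all on shutdown (assumes a grpc-go version recent enough to provide grpc.NewClient):

```go
package cache

import (
    "sync"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

// ConnCache reuses one gRPC client connection per target address.
type ConnCache struct {
    mu    sync.Mutex
    conns map[string]*grpc.ClientConn
}

func NewConnCache() *ConnCache {
    return &ConnCache{conns: make(map[string]*grpc.ClientConn)}
}

// Get returns the cached connection for addr, creating one if needed.
func (c *ConnCache) Get(addr string) (*grpc.ClientConn, error) {
    c.mu.Lock()
    defer c.mu.Unlock()
    if conn, ok := c.conns[addr]; ok {
        return conn, nil
    }
    conn, err := grpc.NewClient(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        return nil, err
    }
    c.conns[addr] = conn
    return conn, nil
}

// Close releases every cached connection; callers should invoke it on shutdown.
func (c *ConnCache) Close() error {
    c.mu.Lock()
    defer c.mu.Unlock()
    var firstErr error
    for addr, conn := range c.conns {
        if err := conn.Close(); err != nil && firstErr == nil {
            firstErr = err
        }
        delete(c.conns, addr)
    }
    return firstErr
}
```

Whoever owns such a cache has to close it eventually, which is exactly why the no-op Close() flagged in the next comment would leak connections.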
    func (r *GRPCServer) Close() error {
        return nil
    }
The Close method is a no-op, but the GRPCServer holds a store.MVCCStore which, in practice, is a *kv.LeaderRoutedStore. This store contains a GRPCConnCache that manages gRPC connections. Without closing the store, these connections will be leaked. The Close method should delegate to r.store.Close() to ensure proper resource cleanup.
func (r *GRPCServer) Close() error {
    if r == nil || r.store == nil {
        return nil
    }
    return r.store.Close()
}

    if c.clock == nil {
        return maxTS, nil
    }
In the case where c.clock is nil, this function returns maxTS. This could result in a new transaction starting with a timestamp equal to a previous transaction's commit timestamp, which violates the strict requirement for start timestamps to be greater than any observed commit timestamps and could break snapshot isolation. To ensure correctness, the returned timestamp should be at least maxTS + 1.
Suggested change:

    if c.clock == nil {
        return maxTS + 1, nil
    }
adapter/redis.go (outdated)
    }

-   _, err = r.coordinator.Dispatch(res)
+   _, err = r.coordinator.Dispatch(context.Background(), res)
The coordinator.Dispatch method is called with context.Background(). This prevents request cancellation, deadline propagation, and tracing from the client through the Redis adapter. While the redcon library may not directly provide a request-scoped context, using a context with a timeout would make the system more robust against long-running or stuck operations. Consider wrapping context.Background() with a timeout.
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) // Timeout can be made configurable.
defer cancel()
_, err = r.coordinator.Dispatch(ctx, res)
Pull request overview
Copilot reviewed 42 out of 43 changed files in this pull request and generated 2 comments.
    if ctx == nil {
        ctx = context.Background()
    }
Coordinate.Dispatch dereferences reqs (reqs.IsTxn / reqs.Elems) without validating that reqs is non-nil and contains at least one non-nil element. A malformed request can panic the server; please return ErrInvalidRequest for nil/empty (and possibly nil elems) similar to ShardedCoordinator.validateOperationGroup().
Suggested change:

    // Validate the request before any use to avoid panics on malformed input.
    if reqs == nil {
        return nil, ErrInvalidRequest
    }
    if len(reqs.Elems) == 0 {
        return nil, ErrInvalidRequest
    }
    // Ensure there is at least one non-nil element, similar to ShardedCoordinator.validateOperationGroup().
    hasNonNil := false
    for _, op := range reqs.Elems {
        if op != nil {
            hasNonNil = true
            break
        }
    }
    if !hasNonNil {
        return nil, ErrInvalidRequest
    }
    for _, r := range reqs {
        if len(r.Mutations) == 0 {
            return nil, ErrInvalidRequest
        }
-       key := r.Mutations[0].Key
+       key := routeKey(r.Mutations[0].Key)
ShardRouter.groupRequests iterates over reqs and assumes each *pb.Request is non-nil. If a nil request is ever passed (e.g., via a buggy forwarder/client), this will panic when accessing r.Mutations. Please defensively check r == nil and return ErrInvalidRequest instead of panicking.
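A sketch of the defensive check being requested, extending the loop shown above (names taken from that hunk):

```go
for _, r := range reqs {
    // Reject nil requests up front instead of panicking on r.Mutations below.
    if r == nil || len(r.Mutations) == 0 {
        return nil, ErrInvalidRequest
    }
    key := routeKey(r.Mutations[0].Key)
    // ... rest of the grouping logic unchanged ...
}
```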
/gemini review
Code Review
This pull request is a significant and well-executed refactoring to introduce sharding support. It refactors the coordinator logic, introduces a sharded store and coordinator, and moves routing logic into the appropriate layers. The changes also include critical correctness fixes for transaction timestamping in an MVCC system and performance improvements like gRPC connection caching. The overall design is solid and the new abstractions are clean. I have one suggestion to improve the consistency and safety of handling transaction logs in the new sharded coordinator.
Pull request overview
Copilot reviewed 42 out of 43 changed files in this pull request and generated 1 comment.
proto/service.proto (outdated)
    uint64 ts = 4; // optional read timestamp; if zero, server uses current HLC
    }

    message RawKvPair {
The message name RawKvPair uses inconsistent casing (Kv instead of KV) compared to other messages in this file like RawKV service and existing Kv message. Consider renaming to RawKVPair for consistency with protobuf naming conventions where acronyms are typically treated as single words (e.g., RawKV, not RawKv).
@copilot open a new pull request to apply changes based on this feedback
Rename RawKvPair to RawKVPair for protobuf naming consistency
/gemini review
Code Review
This is a major and well-executed refactoring to introduce sharding support. The changes are extensive, introducing several new components like ShardedCoordinator, ShardRouter, and ShardStore to create a clear separation of concerns for key-based routing, leader forwarding, and transaction coordination. The correctness of transaction timestamp generation is also improved, and performance is enhanced through connection caching. Overall, this is a significant and impressive improvement to the architecture.
I've found one critical issue related to transaction correctness in the Redis adapter, which violates the principle of consistent snapshot views in MVCC transactions, and one high-severity issue regarding a potential resource leak. Please see the detailed comments.
I am having trouble creating individual review comments, so my feedback is included inline below.
adapter/redis.go (437-444)
The removal of the leader-proxying logic in exec seems incorrect. The new runTransaction implementation relies on r.store to fetch the latest commit timestamps for keys in the transaction. However, the store provided to RedisServer is the raw local MVCCStore, not a leader-aware store wrapper like LeaderRoutedStore or ShardStore.
This means if a Redis EXEC command is received by a follower node, runTransaction will read potentially stale data from its local store to determine the transaction's start timestamp, breaking snapshot isolation. The transaction might then be dispatched to the leader and commit, but it was based on a stale view of the data.
The previous implementation correctly proxied the entire EXEC command to the leader. This logic should be restored, or the store used by RedisServer must be made leader-aware to correctly route the LatestCommitTS lookups.
References
- In an MVCC system, all operations within a single transaction must share the same start timestamp to ensure atomicity and a consistent snapshot view.
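To make that principle concrete, here is a small self-contained sketch, not the project's actual API, of deriving a single start timestamp for a transaction from the latest commit timestamps of every key it touches; the per-key lookup is exactly the step that must be leader-aware to avoid the stale read described above:

```go
package mvcc

// latestCommitTS reports the newest commit timestamp recorded for a key.
// In this PR that lookup should go through a leader-aware store, or a
// follower may answer with a stale value.
type latestCommitTS func(key []byte) (uint64, error)

// txnStartTS picks one start timestamp shared by every operation in the
// transaction: strictly greater than any commit observed for its keys.
func txnStartTS(lookup latestCommitTS, keys [][]byte) (uint64, error) {
    var maxTS uint64
    for _, k := range keys {
        ts, err := lookup(k)
        if err != nil {
            return 0, err
        }
        if ts > maxTS {
            maxTS = ts
        }
    }
    return maxTS + 1, nil
}
```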
Pull request overview
Copilot reviewed 42 out of 43 changed files in this pull request and generated 1 comment.
    logs = append(logs,
        &pb.Request{IsTxn: true, Phase: pb.Phase_PREPARE, Ts: startTS, Mutations: muts},
        &pb.Request{IsTxn: true, Phase: pb.Phase_COMMIT, Ts: startTS, Mutations: muts},
In buildTxnLogs(), the same muts slice is reused for both the PREPARE and COMMIT requests. In coordinator.go, txnRequests() explicitly copies the mutations slice for each phase to avoid any accidental shared-slice side effects if later code appends/modifies mutations. Consider copying muts per phase here too for consistency and future safety.
Suggested change:

    // Copy mutations per phase to avoid shared-slice side effects between phases.
    prepMuts := append([]*pb.Mutation(nil), muts...)
    commitMuts := append([]*pb.Mutation(nil), muts...)
    logs = append(logs,
        &pb.Request{IsTxn: true, Phase: pb.Phase_PREPARE, Ts: startTS, Mutations: prepMuts},
        &pb.Request{IsTxn: true, Phase: pb.Phase_COMMIT, Ts: startTS, Mutations: commitMuts},