
[DO NOT MERGRE] Re-use p2p-client#3065

Draft
alpe wants to merge 4 commits into main from alex/p2p_exec_ai1

Conversation


@alpe alpe commented Feb 12, 2026

Overview


github-actions bot commented Feb 12, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Feb 13, 2026, 12:58 PM


claude bot commented Feb 12, 2026

Claude finished @alpe's task in 2m 17s.


Comprehensive Code Review

I've completed a thorough review of this P2P client refactoring PR. Below are my findings organized by category.


🏗️ Architecture & Design

Strengths

  1. Centralized P2P Client Lifecycle - The refactoring successfully centralizes P2P client management at the FullNode/LightNode level, eliminating redundant initialization across mode switches. This is a solid architectural improvement.

  2. Interface Abstraction (P2PClient interface in pkg/sync/sync_service.go:40-47) - Good decoupling that makes sync services testable and reduces direct dependencies on concrete P2P implementation.

  3. Concurrent Sync Service Startup (node/failover.go:192-208) - Using errgroup.WithContext to start header and data sync services concurrently is an effective optimization to reduce cumulative startup delays.

⚠️ Issues & Concerns

  1. Missing Reconfigure() Implementation - The PR description and Gemini review mention a Reconfigure() method on p2p.Client to update configuration without restarting. However, I cannot find this method in pkg/p2p/client.go. This is a critical gap:

    • The current code has no mechanism to reconfigure the P2P client during mode switches
    • There's no Reconfigure(conf config.P2PConfig) method as claimed
  2. Misleading Comment in node/failover.go - Gemini's review correctly identified an inaccurate comment. There's no call to Reconfigure() in setupFailoverState. Comments should accurately reflect the code.

  3. P2P Client Start/Stop in LightNode (node/light.go:123-125, 163-165) - The LightNode still calls P2P.Start() and P2P.Close() within its Run() method, unlike FullNode which manages this at a higher level. This inconsistency creates an asymmetry:

    • FullNode: P2P client lifecycle managed in FullNode.Run (lines 285-292 in node/full.go)
    • LightNode: P2P client lifecycle managed in LightNode.Run (lines 123-125, 163-165 in node/light.go)
    • This inconsistency could lead to confusion and makes the refactoring incomplete.
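If the missing Reconfigure() were added, one plausible shape is an in-place config swap under a lock, leaving the libp2p host, PubSub, and DHT running. This is purely a hypothetical sketch of what the reviewers are asking for — the type, field, and method names are assumptions, not code from pkg/p2p/client.go:

```go
package main

import (
	"fmt"
	"sync"
)

// P2PConfig is a minimal stand-in for config.P2PConfig; the field is illustrative.
type P2PConfig struct {
	BootstrapPeers []string
}

// Client sketches the Reconfigure method the review says is missing: the
// configuration is swapped under a mutex without tearing down the host.
type Client struct {
	mu   sync.Mutex
	conf P2PConfig
}

// Reconfigure updates the client's config in place. In a real client this is
// where the DHT would be re-bootstrapped against the new peer list while the
// host keeps its existing connections and subscriptions.
func (c *Client) Reconfigure(conf P2PConfig) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.conf = conf
	return nil
}

func main() {
	c := &Client{}
	if err := c.Reconfigure(P2PConfig{BootstrapPeers: []string{"/dns4/peer-a/tcp/26656"}}); err != nil {
		panic(err)
	}
	fmt.Println("bootstrap peers:", c.conf.BootstrapPeers)
}
```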

🔒 Concurrency & Thread Safety

Strengths

  1. Proper Error Aggregation (pkg/p2p/client.go:177-187) - Using errors.Join in Close() ensures all cleanup errors are captured and reported.

  2. Concurrent Sync Startup with Proper Error Handling (node/failover.go:192-211) - The errgroup pattern correctly propagates errors from either sync service startup failure and provides proper cleanup via defer statements.

⚠️ Issues & Concerns

  1. Race Condition in p2p.Client.Start() (pkg/p2p/client.go:124-139):

    func (c *Client) Start(ctx context.Context) error {
        if c.started {  // Line 125 - READ without lock
            return nil
        }
        // ...
        c.started = true  // Line 172 - WRITE without lock
        return nil
    }
    • The started field is accessed without synchronization
    • Concurrent calls to Start() could lead to race conditions
    • This is a data race that could cause undefined behavior
    • Should be protected with atomic.Bool (from sync/atomic) or a mutex
  2. Unsafe rawHost Field Access (pkg/p2p/client.go) - The new rawHost field (line 58 in the struct) is set in Start() but not accessed anywhere in the code. If it's intended for future use, concurrent access patterns need to be considered.
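A minimal sketch of the atomic.Bool fix for issue 1, using CompareAndSwap so exactly one concurrent caller performs initialization. The Client here is a stand-in with only the relevant field, not the real p2p.Client:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Client is a minimal stand-in for p2p.Client; only the started flag and
// Start are sketched.
type Client struct {
	started atomic.Bool
}

// Start uses CompareAndSwap so the race on a plain bool field disappears:
// the first caller flips false→true and runs setup; every other concurrent
// or later caller sees true and returns immediately.
func (c *Client) Start() error {
	if !c.started.CompareAndSwap(false, true) {
		return nil // already started (or another goroutine is starting)
	}
	// ... expensive libp2p host / DHT / PubSub setup would go here ...
	return nil
}

func main() {
	var c Client
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); _ = c.Start() }()
	}
	wg.Wait()
	fmt.Println("started:", c.started.Load())
}
```

One caveat worth noting: if setup can fail, the flag must be reset on the error path (or a mutex/sync.Once used instead), otherwise a failed Start permanently blocks retries.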


🛡️ Error Handling & Resilience

Strengths

  1. Robust Pending Block Recovery (block/internal/executing/pending.go) - The new file provides crash recovery for blocks that were partially saved. This prevents block production from getting stuck after a crash.

  2. Proper Context Propagation - Throughout the codebase, context.Context is consistently threaded through for cancellation and timeout management.

⚠️ Issues & Concerns

  1. Silent Error in Pending Block Deletion (block/internal/executing/executor.go:529-531):
if err := e.deletePendingBlock(e.ctx, batch); err != nil {
    e.logger.Warn().Err(err).Uint64("height", newHeight).Msg("failed to delete pending block metadata")
}
  • Failure to delete pending metadata is logged but not returned
  • This could lead to accumulation of stale pending blocks
  • Consider making this a hard error or implementing a cleanup mechanism
  2. No Validation of Pending Block Height (block/internal/executing/executor.go:432-436):
pendingHeader, pendingData, err := e.getPendingBlock(ctx)
if err == nil && pendingHeader != nil && pendingHeader.Height() == newHeight {
    // use pending block
}
  • If pending block height doesn't match newHeight, it's silently ignored
  • No warning or cleanup for mismatched pending blocks
  • Could indicate a serious state inconsistency that should be investigated
  3. Missing Height Guard Implementation - While the PR description mentions "height guards in HeaderStoreGetter and DataStoreGetter", I don't see the actual implementation in pkg/store/store_adapter.go:802-805 and 872-875. The HasAt() methods directly call GetHeader() and GetBlockData() without checking against committed height. This could expose uncommitted data via P2P.
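The missing guard amounts to a single height comparison in front of the lookup. This sketch uses an invented store type to show the check in isolation; the field and method names are illustrative, not those of HeaderStoreGetter/DataStoreGetter:

```go
package main

import "fmt"

// store is a stand-in for the committed block store; committedHeight is the
// height up to which blocks are fully signed and committed.
type store struct {
	committedHeight uint64
	blocks          map[uint64]string
}

// hasAt only admits heights at or below the committed height, so the P2P
// layer never advertises an executor's early (uncommitted) save even though
// that data is already present on disk.
func (s *store) hasAt(height uint64) bool {
	if height > s.committedHeight {
		return false // guard: never expose uncommitted/future data
	}
	_, ok := s.blocks[height]
	return ok
}

func main() {
	s := &store{
		committedHeight: 10,
		blocks:          map[uint64]string{10: "committed", 11: "early save"},
	}
	fmt.Println(s.hasAt(10)) // committed: visible
	fmt.Println(s.hasAt(11)) // on disk but uncommitted: hidden
}
```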

🧪 Testing & Coverage

⚠️ Issues & Concerns

  1. 46.23% Patch Coverage - Codecov reports 50 lines missing coverage in changes:
  • block/internal/executing/pending.go: 14 missing, 11 partials (39% coverage)
  • node/failover.go: 3 missing, 3 partials (60% coverage)
  • block/internal/executing/executor.go: 1 missing, 3 partials (33% coverage)
  • pkg/cmd/run_node.go: 4 missing (0% coverage)

Critical gaps:

  • Pending block recovery logic is undertested - only 39% coverage for the new crash recovery mechanism
  • No tests for concurrent sync service startup in failover.go
  • P2P client lifecycle changes lack integration tests
  2. Missing Test for Height Guard - The PR adds TestHeaderStoreGetter_HeightGuard but I don't see this test in the file content I read. Need to verify it's properly testing the committed height check.

  3. No Tests for P2P Client Reuse Across Mode Switches - The core benefit of this PR (P2P client persistence) doesn't appear to have dedicated tests verifying:

  • P2P client survives aggregator→sync mode switch
  • P2P client survives sync→aggregator mode switch
  • DHT and PubSub state is preserved

🔐 Security

⚠️ Issues & Concerns

  1. Potential Data Leakage via P2P - The height guard implementation is critical for security. Without proper guards:
  • Unsigned blocks could be served during executor's early save (before commit)
  • Uncommitted state could leak to P2P peers
  • This violates the consistency model where only committed blocks should be visible
  • The current HasAt() implementations in HeaderStoreGetter and DataStoreGetter don't appear to implement these guards correctly
  2. No Validation of P2P Client Pointer - In multiple places (node/failover.go:48, 82), the p2pClient *p2p.Client is passed but never checked for nil before use. While the caller should ensure it's non-nil, defensive checks would prevent potential nil pointer dereferences.

📊 Performance

Strengths

  1. Reduced Startup Overhead - Concurrent sync service startup (lines 192-208 in node/failover.go) is a measurable improvement that reduces blocking time during initialization.

  2. Eliminated P2P Stack Re-initialization - By reusing the P2P client across mode switches, the expensive libp2p host, DHT, and PubSub setup is avoided.

⚠️ Issues & Concerns

  1. No Cleanup of Old P2P Connections - When switching modes (e.g., from sync to aggregator), there's no mechanism to clean up P2P connections that may no longer be relevant. The P2P client persists with all its existing connections and subscriptions.

📝 Code Quality & Maintainability

⚠️ Issues & Concerns

  1. Inconsistent Error Handling - Multiple patterns:
  • errors.Join in some places (pkg/p2p/client.go:180)
  • fmt.Errorf wrapping in others
  • Silent logging in places (executor.go:530)
  • Consider standardizing on errors.Join for multi-error scenarios
  2. Magic Constants - block/internal/executing/pending.go:14-16 uses string constants "pending_header" and "pending_data" without explaining their purpose or scope. Consider adding documentation about their lifecycle.

  3. Unused rawHost Field - Added to Client struct but never used. Either use it or remove it.
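For the magic-constants point, one possible documented form is shown below. The key names are quoted from the review; the lifecycle comments are a suggested shape, not text from the repository:

```go
package main

import "fmt"

// Keys under which a partially produced block is persisted so block
// production can recover after a crash (block/internal/executing/pending.go).
const (
	// pendingHeaderKey stores the header of a block that was produced but
	// whose commit had not completed when the process stopped. It is deleted
	// once the block at that height is fully committed.
	pendingHeaderKey = "pending_header"
	// pendingDataKey stores the matching block data; same lifecycle as the
	// header key, and both are written and deleted in the same batch.
	pendingDataKey = "pending_data"
)

func main() {
	fmt.Println(pendingHeaderKey, pendingDataKey)
}
```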


🎯 Critical Issues Summary

Must Fix Before Merge

  1. Missing Reconfigure() method - Core feature mentioned in PR description is not implemented
  2. Race condition in Client.Start() - Data race on started field
  3. Height guards not properly implemented - Security risk of serving uncommitted data
  4. FullNode/LightNode P2P lifecycle inconsistency - Incomplete refactoring

Should Fix Before Merge

  1. Pending block recovery is undertested (39% coverage)
  2. No tests for P2P client reuse across mode switches
  3. Mismatched pending block heights not handled with warnings/cleanup
  4. Silent pending block deletion failures could accumulate stale data

Positive Takeaways

  • The architectural direction is sound - centralizing P2P client management is the right approach
  • Error aggregation with errors.Join is properly applied
  • Concurrent sync service startup is a good optimization
  • Pending block recovery mechanism addresses a real crash recovery scenario

📌 Recommendations

  1. Implement the missing Reconfigure() method or update PR description to reflect actual implementation
  2. Fix the race condition in Client.Start() using atomic.Bool
  3. Complete the height guard implementation to prevent serving uncommitted blocks
  4. Add comprehensive integration tests for P2P client lifecycle across mode switches
  5. Unify P2P client lifecycle management between FullNode and LightNode
  6. Increase test coverage for pending block recovery to at least 70%
  7. Document the PR as WIP (already marked with "[DO NOT MERGRE]") until critical issues are resolved

@gemini-code-assist

Summary of Changes

Hello @alpe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the P2P client's lifecycle management within the node architecture. By centralizing the P2P client's creation and making it persistent across node mode switches, it aims to improve startup efficiency and reduce resource overhead. Additionally, it introduces concurrent initialization for synchronization services and strengthens data consistency checks in the store getters, ensuring that only fully committed data is exposed via the P2P layer.

Highlights

  • Centralized P2P Client Management: The P2P client is now instantiated once at the top-level FullNode or LightNode creation and passed down to sub-components. This ensures the P2P client's lifecycle is managed independently of node mode switches (e.g., between aggregator and sync modes), preventing costly teardown and re-initialization of the libp2p stack.
  • Decoupled P2P Client Lifecycle: The p2p.Client now includes a Reconfigure method, allowing its internal configuration to be updated without restarting the underlying libp2p host, PubSub, or DHT. This is crucial for seamless transitions between node operational modes.
  • Concurrent Sync Service Startup: The header and data synchronization services in failover.go now start concurrently using errgroup.WithContext. This optimization reduces cumulative startup delays, especially in scenarios where peers might initially lack blocks.
  • Enhanced Header and Data Store Guards: New height guards have been implemented in HeaderStoreGetter and DataStoreGetter. These guards ensure that only headers and data at or below the committed store height are returned, preventing the P2P layer from serving unsigned or uncommitted data (e.g., from an executor's early save during crash recovery).
  • Refactored Node Constructors: Node constructors (newFullNode, newLightNode, newSyncMode, newAggregatorMode) and related test helpers have been updated to accept an initialized p2p.Client or crypto.PrivKey directly, simplifying their signatures and reinforcing the centralized P2P client management.


Changelog
  • apps/testapp/go.mod
    • Uncommented the local replace directive for github.com/evstack/ev-node.
  • apps/testapp/go.sum
    • Removed specific github.com/evstack/ev-node entries, reflecting dependency changes.
  • block/components_test.go
    • Added clarifying comments for nil arguments in test component creation functions.
  • node/failover.go
    • Removed unused imports for p2p/key and go-datastore.
    • Updated newSyncMode and newAggregatorMode to accept a p2p.Client instead of nodeKey, rootDB, and daStore.
    • Modified setupFailoverState to receive p2p.Client and use rktStore for sync services.
    • Removed p2pClient.Start() and p2pClient.Close() from failoverState.Run as the client is now managed at a higher level.
    • Implemented concurrent startup for header and data sync services in failoverState.Run.
  • node/full.go
    • Updated import from p2p/key to p2p.
    • Added p2pClient field to the FullNode struct.
    • Modified newFullNode to accept p2p.Client instead of nodeKey.
    • Updated leaderFactory and followerFactory to pass the p2pClient to mode constructors.
    • Added logic to FullNode.Run to start and close the p2pClient once for the node's entire lifecycle.
  • node/helpers_test.go
    • Updated imports to use github.com/evstack/ev-node/pkg/p2p and github.com/libp2p/go-libp2p/core/crypto.
    • Modified createTestComponents to return crypto.PrivKey directly and removed key.NodeKey struct creation.
    • Updated newTestNode and related test helper functions to handle crypto.PrivKey and p2p.Client parameters.
    • Added newTestP2PClient helper function for creating P2P clients in tests.
  • node/light.go
    • Removed unused import for p2p/key.
    • Modified newLightNode to accept p2p.Client instead of nodeKey.
    • Removed p2p.NewClient call from newLightNode.
    • Updated error handling for ln.P2P.Close() in LightNode.Run.
  • node/light_test.go
    • Added import for github.com/evstack/ev-node/pkg/p2p.
    • Modified TestLightNodeLifecycle to explicitly create and pass a p2p.Client.
  • node/node.go
    • Updated import from p2p/key to p2p.
    • Modified NewNode to accept p2p.Client instead of nodeKey.
    • Updated calls to newLightNode and newFullNode to pass the p2pClient.
  • node/single_sequencer_integration_test.go
    • Updated calls to createTestComponents and createNodeWithCustomComponents to align with privKey parameter changes.
  • pkg/cmd/run_node.go
    • Added import for github.com/evstack/ev-node/pkg/p2p.
    • Instantiated p2p.NewClient before node.NewNode and passed the client to the node creation function.
  • pkg/cmd/run_node_test.go
    • Added import for github.com/stretchr/testify/require.
    • Modified createTestComponents to generate and return a dummy nodeKey.
  • pkg/p2p/client.go
    • Added rawHost and started fields to Client struct for internal state management.
    • Modified Start method to prevent re-starting and track the rawHost.
    • Updated Close method to use errors.Join for aggregating errors.
    • Added PrivKey() method to expose the node's private key.
    • Added Reconfigure(conf config.P2PConfig) method for in-place configuration updates.
  • pkg/p2p/utils_test.go
    • Removed errors import and modified testNet.Close() to ignore errors for test simplicity.
  • pkg/store/header_store_adapter_test.go
    • Updated context.Background() to t.Context() in TestHeaderStoreAdapter_GetFromPendingByHash.
    • Added TestHeaderStoreGetter_HeightGuard to verify committed height checks.
  • pkg/store/store_adapter.go
    • Implemented height guards in HeaderStoreGetter.GetByHeight and HasAt to ensure data consistency.
    • Implemented height guards in DataStoreGetter.GetByHeight and HasAt for similar data consistency checks.
  • pkg/sync/sync_service.go
    • Removed direct import of github.com/evstack/ev-node/pkg/p2p.
    • Introduced P2PClient interface to abstract P2P client dependencies.
    • Updated SyncService and its constructors (NewDataSyncService, NewHeaderSyncService, newSyncService) to use the P2PClient interface.
  • test/e2e/failover_e2e_test.go
    • Added a Kill() call for old processes in TestHASequencerRollingRestartE2E to ensure clean state before restarting nodes.
Activity
  • The pull request title indicates '[DO NOT MERGRE]', suggesting this is a work-in-progress or experimental branch not yet ready for merging.

@codecov

codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 46.23656% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.80%. Comparing base (6a7c84c) to head (09f4a76).

Files with missing lines | Patch % | Lines
block/internal/executing/pending.go | 39.02% | 14 Missing and 11 partials ⚠️
node/failover.go | 60.00% | 3 Missing and 3 partials ⚠️
block/internal/executing/executor.go | 33.33% | 1 Missing and 3 partials ⚠️
node/full.go | 50.00% | 2 Missing and 2 partials ⚠️
pkg/cmd/run_node.go | 0.00% | 4 Missing ⚠️
node/light.go | 0.00% | 1 Missing and 1 partial ⚠️
pkg/p2p/client.go | 75.00% | 1 Missing and 1 partial ⚠️
pkg/store/store.go | 66.66% | 1 Missing and 1 partial ⚠️
node/node.go | 50.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3065      +/-   ##
==========================================
- Coverage   61.89%   61.80%   -0.10%     
==========================================
  Files         111      112       +1     
  Lines       11113    11160      +47     
==========================================
+ Hits         6878     6897      +19     
- Misses       3496     3513      +17     
- Partials      739      750      +11     
Flag Coverage Δ
combined 61.80% <46.23%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the P2P client lifecycle and integration within the node architecture. The primary change involves centralizing the creation and management of the P2P client in the FullNode and LightNode constructors, passing an already initialized p2p.Client instance rather than a nodeKey for internal P2P client creation. This ensures the P2P client is long-lived and persists across mode switches (e.g., between aggregator and sync modes), avoiding costly re-initialization of the libp2p stack. Consequently, imports related to p2p/key and ipfs/go-datastore were removed from several files, and failover.go now runs header and data sync services concurrently to improve startup performance. Test helper functions were updated to reflect these changes, now returning crypto.PrivKey and creating p2p.Client instances. Additionally, pkg/store/store_adapter.go was updated to include a height guard in HeaderStoreGetter and DataStoreGetter to prevent serving uncommitted or future block data, and a new test TestHeaderStoreGetter_HeightGuard was added to verify this behavior. A review comment highlighted an inaccuracy in node/failover.go regarding the Reconfigure() call, which should be addressed for clarity.

node/failover.go Outdated
}
defer f.p2pClient.Close() // nolint: errcheck
// P2P client persists across mode switches (started/closed by FullNode.Run).
// Reconfigure() was already called in setupFailoverState to re-bootstrap DHT.

Severity: medium

This comment states that Reconfigure() is called in setupFailoverState, but this call does not appear to be present in the setupFailoverState function. This is misleading and could cause confusion about how the P2P client is reconfigured during mode switches. Please remove or update this comment to accurately reflect the code's behavior.

