feat: block Pruning #2984
Conversation
Claude finished @pthmas's task in 1m 58s

Comprehensive Code Review: Block Pruning Implementation

I've completed a thorough review of the block pruning implementation. Overall, this is a well-designed feature with good architectural decisions. Below is my detailed feedback organized by category.

1. Code Quality & Architecture

Strengths

✅ Clean separation of concerns: Pruning is correctly isolated in three layers: the submitter hook that triggers it, the store's PruneBlocks, and the ExecMetaPruner hook for execution metadata.
✅ Idempotent design: The use of the last-pruned-block-height metadata key makes pruning safe to re-run.

✅ Monotonic progress tracking: The last pruned height only moves forward, so partial runs never regress.

✅ Optional interface pattern: The ExecMetaPruner interface lets executors opt in to exec-metadata pruning without changing core interfaces.

✅ Tail optimization: Using the last-pruned-block-height metadata to skip fully pruned ranges avoids linear scans from genesis.

Issues to Address

🔴 Critical: Data race in pruneBlocks (block/internal/submitting/submitter.go:374-413)

Recommendation: Add a check at the beginning of pruneBlocks:

func (s *Submitter) pruneBlocks() {
select {
case <-s.ctx.Done():
return // Already stopped
default:
}
	if !s.config.Node.PruningEnabled || ...

🟡 Missing bounds validation in PruneBlocks (pkg/store/store.go:341-426)

Recommendation: Add validation:

func (s *DefaultStore) PruneBlocks(ctx context.Context, height uint64) error {
currentHeight, err := s.Height(ctx)
if err != nil {
return fmt.Errorf("failed to get current height: %w", err)
}
if height > currentHeight {
return fmt.Errorf("cannot prune beyond current height %d: requested %d", currentHeight, height)
}
// ... rest of function
}

🟡 Inconsistent error handling style (pkg/store/store.go:378-413)
Recommendation: Use a consistent pattern throughout for better maintainability.

2. Error Handling & Resilience

Strengths

✅ Best-effort approach: Logging errors without halting is appropriate for pruning (submitter.go:405, 410).

✅ Idempotency: The implementation correctly handles partial failures and allows retry.

✅ Graceful degradation: If pruning fails, the system continues to function (disk usage grows but data integrity is maintained).

Issues to Address

🔴 Silent failure mode (block/internal/submitting/submitter.go:405-411)

Recommendation: Add metrics for pruning failures:

if err := s.store.PruneBlocks(s.ctx, targetHeight); err != nil {
s.logger.Error().Err(err).Uint64("target_height", targetHeight).Msg("failed to prune old block data")
s.metrics.PruningFailures.Inc() // Add this metric
}

🟡 No validation of pruning configuration (pkg/config/config.go:388-395)

Recommendation: Add cross-field validation:

if c.Node.PruningEnabled {
if c.Node.PruningInterval == 0 {
return fmt.Errorf("pruning_interval must be >= 1 when pruning is enabled")
}
if c.Node.PruningKeepRecent == 0 {
return fmt.Errorf("pruning_keep_recent must be > 0 when pruning is enabled")
}
if c.Node.PruningKeepRecent < c.Node.PruningInterval {
return fmt.Errorf("pruning_keep_recent (%d) should be >= pruning_interval (%d) for effective pruning",
c.Node.PruningKeepRecent, c.Node.PruningInterval)
}
}

🟡 Missing context cancellation check in PruneBlocks (pkg/store/store.go:367-414)

Recommendation: Add periodic cancellation checks:

for h := lastPruned + 1; h <= height; h++ {
// Check cancellation every 100 heights or so
if h%100 == 0 {
select {
case <-ctx.Done():
return ctx.Err()
default:
}
}
// ... rest of loop
}

3. Security

Assessment

✅ No security issues identified.
4. Performance & Resource Efficiency

Strengths

✅ Batched deletion: Using batched deletes keeps per-height pruning writes efficient.

✅ Interval-based pruning: Only pruning every N blocks prevents excessive overhead (submitter.go:391-393).

✅ Cache invalidation: The cached store invalidates entries when blocks are pruned.

✅ Tail optimization: Skipping fully-pruned ranges dramatically improves tail lookups after long-running pruning.

Issues to Address

🟡 Linear scan for missing headers (pkg/store/store.go:370-376)

Minor optimization suggestion: Track which heights were successfully pruned to avoid repeated lookups, though the current approach is simple and correct.

🟡 Cache invalidation is overly broad (pkg/store/cached_store.go:168)

Recommendation:

func (cs *CachedStore) PruneBlocks(ctx context.Context, height uint64) error {
lastPruned := uint64(0)
if meta, err := cs.GetMetadata(ctx, LastPrunedBlockHeightKey); err == nil && len(meta) == heightLength {
lastPruned, _ = decodeHeight(meta)
}
if err := cs.Store.PruneBlocks(ctx, height); err != nil {
return err
}
// Only invalidate newly pruned range
cs.InvalidateRange(lastPruned+1, height)
return nil
}

5. Testing & Reliability

Test Coverage Analysis

Based on the codecov report (23.58% patch coverage with 81 lines missing):

✅ Good coverage for:
🔴 Critical gaps (from codecov):
Recommendations for Test Coverage

🔴 High Priority - Add integration tests:

// Test the full pruning flow
func TestPruningIntegration(t *testing.T) {
// 1. Create store, executor, submitter
// 2. Produce blocks 1-100
// 3. Mark blocks 1-50 as DA included
// 4. Trigger pruning with KeepRecent=20
// 5. Verify blocks 1-30 are pruned
// 6. Verify blocks 31-50 remain (within KeepRecent)
// 7. Verify blocks 51-100 remain (not yet DA included)
}
func TestPruningWithCacheInvalidation(t *testing.T) {
// Test that cache correctly invalidates pruned entries
}
func TestPruningFailureRecovery(t *testing.T) {
// Test that pruning can resume after partial failure
}

🔴 Add unit tests for edge cases:

func TestPruneBlocks_ContextCancellation(t *testing.T) {}
func TestPruneBlocks_DiskFailureMidway(t *testing.T) {}
func TestPruneBlocks_PruneTargetExceedsStoreHeight(t *testing.T) {}

6. Documentation & Developer Experience

Strengths

✅ Good inline documentation: Function comments explain the pruning behavior clearly.

✅ Configuration documentation: The new pruning config fields and flags are documented.

✅ Clear PR description: The overview explains the design trade-offs well.

Minor Improvements

🟡 Add metrics documentation: List the observable metrics for pruning in docs or code comments.

🟡 Clarify best-effort semantics: Add a note in the config docs that pruning failures don't halt the node but should be monitored.

Summary & Recommendations

Priority Actions

P0 - Must Fix Before Merge:
P1 - Should Fix:
P2 - Nice to Have:
Overall Assessment

This is a well-architected feature with clean separation of concerns and good safety guarantees. The main concerns are:
The core algorithm is sound, idempotent, and handles partial failures well. With the recommended fixes, this will be production-ready.
Codecov Report

❌ Patch coverage is 23.58%, with 81 lines missing coverage.

Additional details and impacted files

@@ Coverage Diff @@
## main #2984 +/- ##
==========================================
- Coverage 61.64% 61.32% -0.33%
==========================================
Files 111 111
Lines 11120 11221 +101
==========================================
+ Hits 6855 6881 +26
- Misses 3526 3584 +58
- Partials 739 756 +17
Co-authored-by: julienrbrt <julien@rbrt.fr>
julienrbrt left a comment
nice work 👏🏾
Reviewed lines:

// Run height-based pruning if enabled.
s.pruneBlocks()
This seems blocking? Since we are pruning data that will not be modified, we should be able to handle it async. Can we get some benchmarks on the potential slowdown a node operating at 100ms block times would experience with different settings?
If we want to extract it, then I suppose we should create a pruning package under internal and have it be a new component. It was put in submitting so it would be called by both the syncer and the executor.
Having it as its own component will make it async.
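For illustration, a minimal sketch of what such a standalone component could look like, assuming it only needs the store's PruneBlocks and a callback that yields the current target height (the package name, Pruner type, and targetFn are placeholders, not the actual code):

package pruning

import (
	"context"
	"time"
)

// PruneStore is the minimal store surface the component needs (illustrative).
type PruneStore interface {
	PruneBlocks(ctx context.Context, height uint64) error
}

// Pruner periodically prunes up to the target height provided by targetFn
// (e.g. DA-included height minus keep-recent), off the block production path.
type Pruner struct {
	store    PruneStore
	targetFn func(ctx context.Context) (uint64, bool) // returns target and whether pruning should run
	interval time.Duration
}

// Start runs the pruning loop in its own goroutine lifecycle until ctx is cancelled.
func (p *Pruner) Start(ctx context.Context) {
	ticker := time.NewTicker(p.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if target, ok := p.targetFn(ctx); ok {
				// Best-effort: the real component would log failures and carry on.
				_ = p.store.PruneBlocks(ctx, target)
			}
		}
	}
}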
tac0turtle left a comment
The performance change here is unknown; please provide numbers on how this affects a chain with different amounts of blocks to prune.
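For reference, a rough shape of a benchmark that could produce such numbers, assuming a fresh store per iteration; newTestStore and seedBlocks are hypothetical helpers, not functions from this repo:

package store_test

import (
	"context"
	"fmt"
	"testing"
)

func BenchmarkPruneBlocks(b *testing.B) {
	for _, n := range []uint64{1_000, 10_000, 100_000} {
		b.Run(fmt.Sprintf("blocks=%d", n), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				b.StopTimer()
				s := newTestStore(b)   // hypothetical helper: fresh store backed by a temp dir
				seedBlocks(b, s, n)    // hypothetical helper: write n DA-included blocks
				b.StartTimer()
				if err := s.PruneBlocks(context.Background(), n); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}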
Overview
- Deletes per-height block data (/h/{height}, /d/{height}, /c/{height}, /i/{hash}) up to a target height.
- Also removes the corresponding height → DA height mapping for pruned heights.
- Leaves /s/{height} (state), /t (current height), and global metadata keys untouched.
- Tracks progress with a last-pruned-block-height metadata key so pruning is monotonic and idempotent; this logic lives in store.PruneBlocks.
- The pruning target is derived from the DA-included height (DAIncludedHeightKey), so only DA-included blocks are ever pruned, and if that height is still within PruningKeepRecent, pruning is skipped.
- Adds PruneExecMeta to delete per-height ExecMeta entries up to the same target height, exposed through the ExecMetaPruner interface; the submitter calls it from the DA inclusion pruning hook.
- StoreAdapter's Tail() now uses the last-pruned-block-height metadata to skip fully pruned ranges instead of linearly scanning from genesis.
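As a simplified illustration of the per-height deletions listed above, here is a sketch that deletes one height's keys in a batch; the go-datastore-style batch API and the helper name are assumptions, and the real store code may differ:

package store

import (
	"context"
	"fmt"

	ds "github.com/ipfs/go-datastore"
)

// pruneHeight deletes the per-height key families listed in the overview for a
// single height. /s/{height}, /t and global metadata keys are left untouched.
func pruneHeight(ctx context.Context, batch ds.Batch, height uint64, headerHash []byte) error {
	keys := []ds.Key{
		ds.NewKey(fmt.Sprintf("/h/%d", height)), // per-height entry under /h
		ds.NewKey(fmt.Sprintf("/d/%d", height)), // per-height entry under /d
		ds.NewKey(fmt.Sprintf("/c/%d", height)), // per-height entry under /c
	}
	if headerHash != nil {
		keys = append(keys, ds.NewKey(fmt.Sprintf("/i/%x", headerHash))) // hash index entry
	}
	for _, k := range keys {
		if err := batch.Delete(ctx, k); err != nil {
			return fmt.Errorf("delete %s: %w", k, err)
		}
	}
	// The caller would commit the batch and then advance last-pruned-block-height,
	// which is what makes the overall operation monotonic and idempotent.
	return nil
}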
New node config fields / flags:
- --evnode.node.pruning_enabled
- --evnode.node.pruning_keep_recent
- --evnode.node.pruning_interval

Pruning actually runs only when all three are set to non-trivial values (enabled, keep_recent > 0, interval > 0).
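As a rough sketch of that gating, using the config field names from the review snippets above; whether the interval is checked against the current height or the DA-included height is an assumption here:

package config

// NodeConfig mirrors only the pruning-related fields used in this sketch.
type NodeConfig struct {
	PruningEnabled    bool
	PruningKeepRecent uint64
	PruningInterval   uint64
}

// shouldPrune reports whether pruning should run now and, if so, the target height:
// only when enabled with keep_recent > 0 and interval > 0, only every interval-th
// block, and only for blocks that are DA-included and older than the keep-recent window.
func shouldPrune(cfg NodeConfig, currentHeight, daIncludedHeight uint64) (uint64, bool) {
	if !cfg.PruningEnabled || cfg.PruningKeepRecent == 0 || cfg.PruningInterval == 0 {
		return 0, false
	}
	if currentHeight%cfg.PruningInterval != 0 {
		return 0, false // not at a pruning interval boundary
	}
	if daIncludedHeight <= cfg.PruningKeepRecent {
		return 0, false // everything DA-included is still within the keep-recent window
	}
	return daIncludedHeight - cfg.PruningKeepRecent, true
}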
Design trade-offs
Best-effort pruning:
If pruning fails (either store or exec metadata), we log the error but do not halt DA inclusion. This avoids stopping the node but means disk usage can grow silently if pruning keeps failing; operators will need to watch logs/metrics.
Scope:
This PR prunes:
- Per-height block data in the store, up to the target height.
- Exec metadata, via the ExecMetaPruner hook when available.
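A sketch of that optional-interface hook; the exact PruneExecMeta signature and the helper function are assumptions, not the code from this PR:

package submitting

import "context"

// ExecMetaPruner is the optional capability checked via type assertion; executors
// that keep per-height ExecMeta implement it so that data is pruned alongside blocks.
type ExecMetaPruner interface {
	PruneExecMeta(ctx context.Context, height uint64) error
}

// pruneExecMeta is an illustrative hook called after store.PruneBlocks succeeds.
// The exec parameter stands for whatever execution client the node was built with.
func pruneExecMeta(ctx context.Context, exec any, target uint64) error {
	pruner, ok := exec.(ExecMetaPruner)
	if !ok {
		return nil // executor keeps no per-height exec metadata; nothing to prune
	}
	return pruner.PruneExecMeta(ctx, target)
}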