Skip to content

Enable incremental read#2153

Open
xanderbailey wants to merge 7 commits intoapache:mainfrom
xanderbailey:xb/incremental_read
Open

Enable incremental read#2153
xanderbailey wants to merge 7 commits intoapache:mainfrom
xanderbailey:xb/incremental_read

Conversation

@xanderbailey
Copy link

@xanderbailey xanderbailey commented Feb 18, 2026

Which issue does this PR close?

What changes are included in this PR?

This PR adds incremental snapshot scanning support to iceberg-rust, similar to the Java client's IncrementalDataTableScan. This feature allows reading only the data files that were added between two snapshots, which is essential for:

  • Change data capture (CDC) pipelines - difference but related
  • Incremental data processing
  • Efficiently reading only new data since a checkpoint

Core Iceberg Changes (crates/iceberg/src/scan/)

New API on TableScanBuilder:

// Scan changes between two snapshots (from exclusive, to inclusive)
table.scan()
    .from_snapshot_exclusive(from_id)
    .to_snapshot(to_id)
    .build()?;

// Scan changes with inclusive from
table.scan()
    .from_snapshot_inclusive(from_id)
    .to_snapshot(to_id)
    .build()?;

// Convenience methods (matching Java API)
table.scan().appends_after(from_id).build()?;
table.scan().appends_between(from_id, to_id).build()?;

Implementation details:

  • Added SnapshotRange struct to validate snapshot ancestry and track snapshot IDs in range
  • Modified ManifestFileContext to filter entries with status=ADDED and snapshot_id within range
  • Validates that only APPEND operations are in the snapshot range (matches Java behavior)
  • Returns clear error if from_snapshot is not an ancestor of to_snapshot

DataFusion Integration (crates/integrations/datafusion/)

New constructors on IcebergStaticTableProvider:

// Scan changes between two snapshots
let provider = IcebergStaticTableProvider::try_new_incremental(table, from_id, to_id).await?;

// Scan with inclusive from
let provider = IcebergStaticTableProvider::try_new_incremental_inclusive(table, from_id, to_id).await?;

// Scan all appends after a snapshot to current
let provider = IcebergStaticTableProvider::try_new_appends_after(table, from_id).await?;

// Register with DataFusion and query
ctx.register_table("changes", Arc::new(provider))?;
let df = ctx.sql("SELECT * FROM changes WHERE category = 'A'").await?;

Example Added (crates/examples/)

Added datafusion_incremental_read.rs example demonstrating:

  • Incremental reads between two snapshots
  • Using appends_after() for checkpoint-based processing
  • Combining incremental reads with SQL filters

Files Changed

File Changes
crates/iceberg/src/scan/mod.rs Added SnapshotRange, incremental scan methods on TableScanBuilder
crates/iceberg/src/scan/context.rs Added snapshot_range to contexts, manifest entry filtering
crates/integrations/datafusion/src/table/mod.rs Added incremental constructors to IcebergStaticTableProvider
crates/integrations/datafusion/src/physical_plan/scan.rs Updated IcebergTableScan and get_batch_stream()
crates/examples/src/datafusion_incremental_read.rs New example
crates/examples/Cargo.toml Added datafusion dependencies and example entry

Are these changes tested?

Yes, this PR includes comprehensive tests:

Core Iceberg Tests (crates/iceberg/src/scan/mod.rs)

  • test_incremental_scan_mutually_exclusive_with_snapshot_id - Verifies snapshot_id and incremental options are mutually exclusive
  • test_incremental_scan_invalid_from_snapshot - Verifies error when from is not ancestor of to
  • test_incremental_scan_invalid_to_snapshot - Verifies error for non-existent to_snapshot
  • test_appends_after_convenience_method - Tests the convenience method
  • test_appends_between_convenience_method - Tests the convenience method
  • test_incremental_scan_from_snapshot_inclusive - Tests inclusive from behavior
  • test_incremental_scan_from_snapshot_exclusive - Tests exclusive from behavior

DataFusion Integration Tests (crates/integrations/datafusion/src/table/mod.rs)

  • test_static_provider_incremental_creates_scan - Verifies scan parameters are set correctly
  • test_static_provider_incremental_inclusive - Tests inclusive flag
  • test_static_provider_appends_after - Tests appends_after configuration
  • test_static_provider_incremental_invalid_snapshot - Tests error handling

All existing tests continue to pass (41 scan tests, 8 static provider tests).

@xanderbailey xanderbailey marked this pull request as ready for review February 18, 2026 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support incremental reads between snapshot-ids

1 participant

Comments