Which issue does this PR close?
What changes are included in this PR?
This PR adds incremental snapshot scanning support to iceberg-rust, similar to the Java client's IncrementalDataTableScan. This feature allows reading only the data files that were added between two snapshots.

Core Iceberg Changes (crates/iceberg/src/scan/)
New API on TableScanBuilder: incremental scan methods.
Implementation details:
- SnapshotRange struct to validate snapshot ancestry and track the snapshot IDs in range
- ManifestFileContext filters entries to those with status=ADDED and a snapshot_id within the range

DataFusion Integration (crates/integrations/datafusion/)
New incremental constructors on IcebergStaticTableProvider.

Example Added (crates/examples/)
Added datafusion_incremental_read.rs example demonstrating:
- appends_after() for checkpoint-based processing

Files Changed
- crates/iceberg/src/scan/mod.rs: SnapshotRange, incremental scan methods on TableScanBuilder
- crates/iceberg/src/scan/context.rs: snapshot_range added to contexts, manifest entry filtering
- crates/integrations/datafusion/src/table/mod.rs: IcebergStaticTableProvider
- crates/integrations/datafusion/src/physical_plan/scan.rs: IcebergTableScan and get_batch_stream()
- crates/examples/src/datafusion_incremental_read.rs
- crates/examples/Cargo.toml

Are these changes tested?
Yes, this PR includes comprehensive tests:
Core Iceberg Tests (crates/iceberg/src/scan/mod.rs)
- test_incremental_scan_mutually_exclusive_with_snapshot_id - Verifies that snapshot_id and the incremental options are mutually exclusive
- test_incremental_scan_invalid_from_snapshot - Verifies the error when the from snapshot is not an ancestor of the to snapshot
- test_incremental_scan_invalid_to_snapshot - Verifies the error for a non-existent to_snapshot
- test_appends_after_convenience_method - Tests the convenience method
- test_appends_between_convenience_method - Tests the convenience method
- test_incremental_scan_from_snapshot_inclusive - Tests inclusive from behavior
- test_incremental_scan_from_snapshot_exclusive - Tests exclusive from behavior

DataFusion Integration Tests (crates/integrations/datafusion/src/table/mod.rs)
- test_static_provider_incremental_creates_scan - Verifies scan parameters are set correctly
- test_static_provider_incremental_inclusive - Tests the inclusive flag
- test_static_provider_appends_after - Tests the appends_after configuration
- test_static_provider_incremental_invalid_snapshot - Tests error handling

All existing tests continue to pass (41 scan tests, 8 static provider tests).
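
For reviewers who want a mental model of the filtering described under "Implementation details", here is a minimal, dependency-free sketch. It is not the code in this PR: the `Status`, `Entry`, and `added_in_range` names, the parent-map representation of snapshot lineage, and the `SnapshotRange::new` signature are all assumptions made for illustration. The idea is to walk the parent chain from the `to` snapshot back to the `from` snapshot, fail if `from` is never reached, and then keep only entries with status ADDED whose snapshot ID falls inside the collected range.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical manifest entry status (illustrative, not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Existing,
    Added,
    Deleted,
}

/// Hypothetical, simplified manifest entry.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry {
    status: Status,
    snapshot_id: i64,
    path: &'static str,
}

/// Illustrative stand-in for a snapshot range: the set of snapshot IDs
/// between `from` and `to` along the parent chain.
struct SnapshotRange {
    ids: HashSet<i64>,
}

impl SnapshotRange {
    /// `parents` maps each known snapshot ID to its parent (None for the root).
    /// Returns an error if a snapshot on the chain is unknown or if `from`
    /// is not an ancestor of `to`.
    fn new(
        parents: &HashMap<i64, Option<i64>>,
        from: i64,
        from_inclusive: bool,
        to: i64,
    ) -> Result<Self, String> {
        let mut ids = HashSet::new();
        let mut current = Some(to);
        while let Some(id) = current {
            let parent = parents
                .get(&id)
                .ok_or_else(|| format!("snapshot {} not found", id))?;
            if id == from {
                // Reached the lower bound: include it only when requested.
                if from_inclusive {
                    ids.insert(id);
                }
                return Ok(Self { ids });
            }
            ids.insert(id);
            current = *parent;
        }
        // Walked past the root without meeting `from`.
        Err(format!("snapshot {} is not an ancestor of {}", from, to))
    }

    fn contains(&self, snapshot_id: i64) -> bool {
        self.ids.contains(&snapshot_id)
    }
}

/// Keep only entries that were ADDED by a snapshot inside the range.
fn added_in_range<'a>(entries: &'a [Entry], range: &SnapshotRange) -> Vec<&'a Entry> {
    entries
        .iter()
        .filter(|e| e.status == Status::Added && range.contains(e.snapshot_id))
        .collect()
}
```

In this sketch, an exclusive `from` corresponds to the checkpoint-style use of appends_after(): the checkpointed snapshot itself has already been processed, so only files added by later snapshots are returned.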