Skip to content

[TASK-331] Scan results returned per bucket python/cpp#351

Open
fresh-borzoni wants to merge 6 commits intoapache:mainfrom
fresh-borzoni:results_per_buckets-cpp-python
Open

[TASK-331] Scan results returned per bucket python/cpp#351
fresh-borzoni wants to merge 6 commits intoapache:mainfrom
fresh-borzoni:results_per_buckets-cpp-python

Conversation

@fresh-borzoni
Copy link
Contributor

@fresh-borzoni fresh-borzoni commented Feb 18, 2026

Summary

closes #331

Per-Bucket Grouped ScanRecords for Python and C++ Bindings

Aligns Python and C++ LogScanner.poll() with Rust/Java by returning ScanRecords grouped by bucket instead of a flat list.

Changes

Python

  • New ScanRecords pyclass with per-bucket access (buckets(), records(bucket), items(), values(), keys())
  • Dict-like protocol: scan_records[bucket], bucket in scan_records
  • Sequence protocol: scan_records[0], scan_records[0:3], len(scan_records)
  • Flat iteration preserved: for record in scan_records

C++

  • New TableBucket struct and BucketView class
  • Additive methods on existing ScanRecords: BucketCount(), Buckets(), Records(bucket), BucketAt(idx)
  • Flat iteration via begin()/end() unchanged

Docs & Examples

  • Updated API references/docs/exmaples for Python, C++, and Rust

@fresh-borzoni fresh-borzoni force-pushed the results_per_buckets-cpp-python branch 2 times, most recently from f1e20c7 to 995e1e6 Compare February 18, 2026 01:28
@fresh-borzoni
Copy link
Contributor Author

@leekeiabstraction @zhaohaidao PTAL 🙏

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aligns Python and C++ bindings with Rust/Java by returning ScanRecords grouped by bucket instead of a flat list from LogScanner.poll(). This addresses issue #331 which requested per-bucket grouping for scan results.

Changes:

  • Introduced new ScanRecords class in Python and C++ that groups records by bucket
  • Removed bucket field from individual ScanRecord objects (bucket context now provided by the grouping)
  • Added dict-like and sequence protocols to Python ScanRecords for flexible access patterns
  • Added BucketView class and per-bucket methods to C++ ScanRecords
  • Updated all examples, documentation, and tests to demonstrate the new API

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
bindings/python/src/table.rs Implements new ScanRecords pyclass with per-bucket grouping, removes bucket field from ScanRecord, adds dict-like and sequence protocols
bindings/python/src/lib.rs Exports new ScanRecords class to Python module
bindings/python/fluss/init.pyi Adds type stubs for ScanRecords with overloaded __getitem__ for different access patterns
bindings/python/example/example.py Demonstrates both flat iteration and per-bucket access patterns with the new API
bindings/python/test/test_log_table.py Updates partitioned table test to verify per-bucket grouping and partition consistency
bindings/cpp/include/fluss.hpp Adds TableBucket struct, BucketView class, updates ScanRecords with per-bucket methods, removes bucket/partition fields from ScanRecord
bindings/cpp/src/table.cpp Implements per-bucket iteration, RecordAt(), Buckets(), Records(), BucketAt() methods, updates RowView to use bucket+record indices
bindings/cpp/src/lib.rs Restructures FFI layer to store buckets in Vec with FfiBucketInfo metadata, updates all accessor methods to use bucket+record indexing
bindings/cpp/examples/example.cpp Updates all scan operations to use per-bucket iteration, demonstrates both flat and per-bucket access patterns
website/docs/user-guide/python/api-reference.md Documents new ScanRecords class with methods, indexing, and mapping protocol; removes bucket field from ScanRecord
website/docs/user-guide/python/example/log-tables.md Shows both flat iteration and per-bucket access in examples
website/docs/user-guide/cpp/api-reference.md Documents new ScanRecords per-bucket methods, BucketView, and TableBucket; updates ScanRecord documentation
website/docs/user-guide/cpp/example/log-tables.md Demonstrates per-bucket iteration pattern
website/docs/user-guide/rust/example/log-tables.md Adds example showing per-bucket access for consistency with other bindings

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@fresh-borzoni fresh-borzoni force-pushed the results_per_buckets-cpp-python branch from 995e1e6 to 737b07f Compare February 18, 2026 11:16
@fresh-borzoni
Copy link
Contributor Author

Rebased, addressed comments

Copy link
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY for the PR! Left some small comments

@fresh-borzoni
Copy link
Contributor Author

@leekeiabstraction Ty for the review
Addressed comments, PTAL 🙏

@fresh-borzoni fresh-borzoni force-pushed the results_per_buckets-cpp-python branch from 2c2d250 to 10edaf7 Compare February 18, 2026 20:17
Copy link
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY for the PR! LGTM!

@fresh-borzoni
Copy link
Contributor Author

fresh-borzoni commented Feb 18, 2026

I changes some C++ API to match between clients:

  • Size() to Count() on ScanRecord, it matches Java/Rust
  • Empty() to isEmpty() on ScanRecords, it matches Java/Rust
  • we still support flat iteration with for (const auto& rec : scan_records) , but not indexing, as it isn't exposed in Java/Rust, so I removed it in C++
    still exposed in Python as it's more pythonic, but it's debatable. mb we wish to discuss it

@fresh-borzoni
Copy link
Contributor Author

we may wish to merge #353 first, it would simplify a couple of things here

@luoyuxia
Copy link
Contributor

@fresh-borzoni Thanks for the pr. #353 is merged..

@fresh-borzoni fresh-borzoni force-pushed the results_per_buckets-cpp-python branch from 10edaf7 to 8029836 Compare February 19, 2026 10:36
@fresh-borzoni
Copy link
Contributor Author

@luoyuxia Thank you!

rebased, updated

@fresh-borzoni
Copy link
Contributor Author

@leekeiabstraction Can you look through one more time, pls?

@fresh-borzoni fresh-borzoni force-pushed the results_per_buckets-cpp-python branch from 82a5ced to 2e41f23 Compare February 19, 2026 11:22
Copy link
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, however as both languages are not my forte. Would suggest an extra pair of eyes reviewing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python/CPP should return scan results with per bucket grouping as in Rust client

3 participants

Comments