Skip to content

perf: executePlan uses a channel to park executor task thread instead of yield_now() [iceberg]#3553

Merged
mbutrovich merged 4 commits intoapache:mainfrom
mbutrovich:channel
Feb 20, 2026
Merged

perf: executePlan uses a channel to park executor task thread instead of yield_now() [iceberg]#3553
mbutrovich merged 4 commits intoapache:mainfrom
mbutrovich:channel

Conversation

@mbutrovich
Copy link
Contributor

@mbutrovich mbutrovich commented Feb 20, 2026

Which issue does this PR close?

N/A.

Rationale for this change

I am observing heavy CPU utilization on I/O, latency-bound workloads (e.g., tens of thousands of small Iceberg FileScanTasks on an object store). It looks like a busy-poll, but we've done some work to try to address that already (#2937, #2938, #3063). Those eliminated some of the sources of tokio overhead, but we're still seeing high CPU utilization on workloads that shouldn't have them. So, I went searching for anything that would lead to busy-poll like behavior.

In executePlan the task executor thread does a block_on the stream, it's Pending, and then we yield. However, if the I/O tasks aren't done, we just wake up again, check the stream, it's still Pending, and we yield again. This essentially degrades to a busy-poll, and results in a ton of scheduling overhead in tokio.

Tokio's docs for https://docs.rs/tokio/latest/tokio/runtime/struct.Runtime.html#method.block_on notes the challenges of mixing block_on behavior with other futures. Combined with yield_now (which re-enqueues immediately rather than waiting for a waker), the executor thread spins checking for data that isn't ready yet instead of parking until an I/O completion wakes it (which is what I intended in #3063).

What changes are included in this PR?

For scenarios where we don't have any scans that need to pull batches from JVM, we set up a channel and pass that task into the tokio worker pool. This lets the tokio worker pool handle stream execution and allows the executor task thread from the JVM (that made the call into executePlan in jni_api.rs) to properly wait on a batch arriving.

How are these changes tested?

Existing tests. I also benchmarked before and after with a workload that 1) creates an Iceberg table in Minio (S3) with 10,000 small data files 2) runs a query that reads all of the data. I couldn't simulate the latency added by the object store, but you can see the difference:

main
CPU usage hovers around 30%, the executor task threads are almost solid green doing "work" spinning, and average query execution time of the 10 iterations (that you can see in the chart) was 5627 ms.
Screenshot 2026-02-19 at 9 58 12 PM

This PR
CPU usage hovers around 8%, the executor task has much more red (parked), and average query execution time of the 10 iterations (that you can see in the chart) was 5201 ms.
Screenshot 2026-02-19 at 9 58 31 PM

@mbutrovich mbutrovich changed the title perf: Switch to a channel instead of await() on Pending [iceberg] perf: Switch to a channel instead of await() on Pending during executePlan [iceberg] Feb 20, 2026
@mbutrovich mbutrovich changed the title perf: Switch to a channel instead of await() on Pending during executePlan [iceberg] perf: Switch to a channel instead of yield_now() on Pending during executePlan [iceberg] Feb 20, 2026
@mbutrovich mbutrovich changed the title perf: Switch to a channel instead of yield_now() on Pending during executePlan [iceberg] perf: executePlan uses a channel to park executor task thread instead of yield_now() on Pending [iceberg] Feb 20, 2026
@mbutrovich
Copy link
Contributor Author

I updated the development guide with design considerations for thread-local data and JNI in this architecture. I will try to get more benchmarking results today.

@mbutrovich mbutrovich marked this pull request as ready for review February 20, 2026 12:12
@mbutrovich mbutrovich changed the title perf: executePlan uses a channel to park executor task thread instead of yield_now() on Pending [iceberg] perf: executePlan uses a channel to park executor task thread instead of yield_now() [iceberg] Feb 20, 2026
Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@mbutrovich mbutrovich removed the request for review from comphead February 20, 2026 12:52
@mbutrovich mbutrovich merged commit f697d27 into apache:main Feb 20, 2026
194 of 196 checks passed
@mbutrovich mbutrovich deleted the channel branch February 20, 2026 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments