[PyArrow] Filter Pushdown of IS IN now uses the dedicated pyarrow.compute.Expression.isin method #335

Open
evertlammerts wants to merge 1 commit into duckdb:v1.5-variegata from evertlammerts:use_pyarrow_isin

@evertlammerts (Collaborator)

Continued from #73

Null handling seems to work fine, afaics. We are using Expression.isin(), which does not have a skip_nulls kwarg. It uses the same default semantics as pyarrow.compute.is_in, which is skip_nulls=False, meaning a NULL in the value set matches NULL inputs. This is not exactly the same as DuckDB's semantics, because DuckDB will not return NULL values even if NULL is in the IN list:

D select * from VALUES(1), (2), (3), (NULL) as t(myint) where myint in (2, NULL);
┌───────┐
│ myint │
│ int32 │
├───────┤
│   2   │
└───────┘

But this is no big deal. The pushed-down filter might let more rows through than needed, but DuckDB will filter them out anyway. Also see the test.

@Tishj (Collaborator) left a comment

Looks good, thanks for picking it back up! 👍

@jakkes commented Feb 20, 2026

Thank you!
