fix: Write zero-filled values buffer for fully-null Bool columns in IPC #392

Open
AlonSpivack wants to merge 1 commit into apache:main from AlonSpivack:fix-bool-null-ipc

@AlonSpivack

Summary

When a Bool column is fully null (nullCount >= length), assembleBoolVector in
VectorAssembler previously returned early without writing a values buffer,
producing an IPC stream in which buffer #1 (the values buffer) has 0 bytes.

This violates the Arrow IPC specification, which requires a data buffer of
ceil(length / 8) bytes for Bool arrays regardless of null count. Other
implementations (PyArrow, arrow-rs) reject these streams with:

Buffer #1 too small in array of type Bool. Expected at least 1 byte(s), got 0
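
To make the size requirement concrete, here is a minimal sketch of the kind of reader-side check that produces the error above. The helper names `minBoolValuesBytes` and `checkBoolValuesBuffer` are hypothetical and are not taken from PyArrow, arrow-rs, or the JS package; only the ceil(length / 8) rule and the error text come from this description.

```typescript
// Hypothetical helper: minimum size in bytes of the bit-packed values
// buffer for a Bool array with `length` rows (one bit per row).
function minBoolValuesBytes(length: number): number {
  return Math.ceil(length / 8);
}

// Hypothetical reader-side validation, modeled on the error message quoted
// above. Real implementations perform this check in their own code.
function checkBoolValuesBuffer(length: number, values: Uint8Array): void {
  const expected = minBoolValuesBytes(length);
  if (values.byteLength < expected) {
    throw new Error(
      `Buffer #1 too small in array of type Bool. ` +
        `Expected at least ${expected} byte(s), got ${values.byteLength}`
    );
  }
}
```

With this check, a single-row Bool column whose values buffer is empty fails validation, which matches the behavior reported for the streams written before this change.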

Fix

Write a zero-filled Uint8Array of the correct byte length ((data.length + 7) >> 3)
when all values are null, instead of returning early with no buffer.
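
The idea can be sketched in isolation as follows. This is an illustration under stated assumptions, not the code from the diff: the `BoolColumn` interface and the `boolValuesForIPC` function are hypothetical stand-ins for the Data object and the branch inside assembleBoolVector.

```typescript
// Hypothetical, simplified stand-in for a Bool column's data.
interface BoolColumn {
  length: number; // number of rows
  nullCount: number; // number of null rows
  values: Uint8Array; // bit-packed values; may be empty when all rows are null
}

// Sketch of the fixed behavior: always return a values buffer that holds
// at least one bit per row, even when every row is null.
function boolValuesForIPC(data: BoolColumn): Uint8Array {
  if (data.nullCount >= data.length) {
    // (length + 7) >> 3 is integer ceil(length / 8).
    // A new Uint8Array is zero-initialized, so no explicit fill is needed.
    return new Uint8Array((data.length + 7) >> 3);
  }
  return data.values;
}
```

For example, a fully-null column with 10 rows yields a 2-byte buffer of zeros, while a column that contains at least one non-null row keeps its existing values buffer.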

Tests

Added 4 round-trip tests for fully-null Bool columns through tableToIPC / tableFromIPC:

  • Single-row fully-null Bool
  • 2-row fully-null Bool (file format)
  • 10-row fully-null Bool (crosses byte boundary)
  • Mixed table with normal Int32 + fully-null Bool columns

Closes #68

cc @trxcllnt @domoritz

When a Bool column has nullCount >= length, VectorAssembler now writes
a zero-filled buffer of ((length + 7) >> 3) bytes instead of skipping
the values buffer entirely. This ensures compliance with the Arrow IPC
spec and cross-language compatibility with PyArrow and arrow-rs.

Closes apache#68

Development

Successfully merging this pull request may close these issues.

[JS] Fully null column of type Bool produces incompatible IPC stream with JS package