AVRO-4228: [c++] Fix BinaryDecoder::arrayNext() to handle negative block counts#3646
Open
gfeyer wants to merge 2 commits into apache:main from
Conversation
martin-g reviewed Feb 9, 2026
static void testArrayNegativeBlockCount() {
Member
This test passes even without the fix above.
The reason is that the negative count is in the first block, which is read by arrayStart(), and arrayStart() already delegates to doDecodeItemCount().
Please update the test to use a negative count in the second block as well, so that arrayNext() is the call that reads it.
auto result = doDecodeLong();
if (result < 0) {
    doDecodeLong();
    return static_cast<size_t>(-result);
Member
This may overflow if result is the minimum int64_t value, because negating that value is undefined behavior for a signed integer.
Possible improvement:
Suggested change
return static_cast<size_t>(-(result + 1)) + 1;
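To illustrate why the suggested expression avoids the overflow, here is a minimal standalone sketch. The helper name `blockCountMagnitude` is hypothetical and not part of the Avro C++ code base, and it returns `uint64_t` rather than `size_t` so that the result does not depend on the platform's `size_t` width.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Avro): magnitude of a negative block count.
// Adding 1 before negating keeps the intermediate value inside the int64_t
// range, so the most negative int64_t value is handled without signed overflow.
inline uint64_t blockCountMagnitude(int64_t result) {
    assert(result < 0);
    // result + 1 lies in [INT64_MIN + 1, 0], so -(result + 1) lies in [0, INT64_MAX].
    return static_cast<uint64_t>(-(result + 1)) + 1;
}
```

For ordinary inputs this matches plain negation, for example `blockCountMagnitude(-100)` is 100. The only input where the two approaches differ is the most negative `int64_t`, where plain `-result` is undefined.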
Comment on lines +2132 to +2136
BOOST_CHECK_EQUAL(result[0], 10);
BOOST_CHECK_EQUAL(result[1], 20);
BOOST_CHECK_EQUAL(result[2], 30);
BOOST_CHECK_EQUAL(result[3], 40);
BOOST_CHECK_EQUAL(result[4], 50);
Member
Suggested change
- BOOST_CHECK_EQUAL(result[0], 10);
- BOOST_CHECK_EQUAL(result[1], 20);
- BOOST_CHECK_EQUAL(result[2], 30);
- BOOST_CHECK_EQUAL(result[3], 40);
- BOOST_CHECK_EQUAL(result[4], 50);
+ const std::vector<int32_t> expected = {10, 20, 30, 40, 50};
+ BOOST_CHECK_EQUAL_COLLECTIONS(result.begin(), result.end(), expected.begin(), expected.end());
What is the purpose of the change
BinaryDecoder::arrayNext() calls doDecodeLong() directly instead of doDecodeItemCount(), causing it to mishandle negative array block counts. Per the Avro spec, a negative block count means the absolute value is the item count followed by an additional long for the byte-size of the block. When arrayNext() reads a negative count, static_cast<size_t>(-100) produces a huge value and the byte-size long is left unconsumed, corrupting the stream position.
doDecodeItemCount() already handles this correctly and is used by arrayStart(), mapStart(), and mapNext(). Only arrayNext() bypassed it. The fix changes arrayNext() to call doDecodeItemCount() for consistency.
This affects any array large enough to be encoded in multiple blocks with negative counts. ClickHouse independently found the same bug (ClickHouse/ClickHouse#60438, ClickHouse#23).
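The block layout described above can be sketched with a toy decoder. This is an illustrative stand-in, not the Avro C++ `BinaryDecoder`: the class `ToyDecoder` and the functions `sampleTwoBlockArray` and `demoTwoBlocks` are hypothetical names, and the input bytes are hand-encoded under the assumption of Avro's zigzag varint encoding for longs. The sample places the negative count in the second block, which is the case the real `arrayNext()` has to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy stand-in for a binary decoder; NOT the Avro C++ API.
class ToyDecoder {
public:
    explicit ToyDecoder(std::vector<uint8_t> bytes) : bytes_(std::move(bytes)) {}

    // Reads one zigzag-encoded variable-length long.
    int64_t decodeLong() {
        uint64_t encoded = 0;
        int shift = 0;
        while (true) {
            if (pos_ >= bytes_.size() || shift > 63) throw std::runtime_error("bad varint");
            const uint8_t b = bytes_[pos_++];
            encoded |= static_cast<uint64_t>(b & 0x7f) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        // Zigzag decode: the low bit carries the sign.
        return static_cast<int64_t>((encoded >> 1) ^ (uint64_t{0} - (encoded & 1)));
    }

    // Reads a block count. A negative count is followed by the block's byte
    // size, which must be consumed to keep the stream position correct.
    size_t decodeItemCount() {
        const int64_t count = decodeLong();
        if (count < 0) {
            decodeLong();  // skip the byte-size long
            return static_cast<size_t>(-(count + 1)) + 1;
        }
        return static_cast<size_t>(count);
    }

    // Reads every block of an array of longs until the terminating zero count.
    std::vector<int64_t> readLongArray() {
        std::vector<int64_t> out;
        for (size_t n = decodeItemCount(); n != 0; n = decodeItemCount()) {
            for (size_t i = 0; i < n; ++i) out.push_back(decodeLong());
        }
        return out;
    }

private:
    std::vector<uint8_t> bytes_;
    size_t pos_ = 0;
};

// Hand-encoded sample: block 1 has count 2, block 2 has count -3 plus byte size 3.
inline std::vector<uint8_t> sampleTwoBlockArray() {
    return {0x04, 0x14, 0x28,              // count 2, items 10, 20
            0x05, 0x06, 0x3C, 0x50, 0x64,  // count -3, byte size 3, items 30, 40, 50
            0x00};                         // end of array
}

inline void demoTwoBlocks() {
    ToyDecoder decoder(sampleTwoBlockArray());
    const std::vector<int64_t> values = decoder.readLongArray();
    assert(values.size() == 5);
    assert(values.front() == 10 && values.back() == 50);
}
```

If `decodeItemCount()` were replaced by a bare `decodeLong()` cast to `size_t` for the second block, the count -3 would become a huge unsigned value and the byte-size long would be read as the first item, which mirrors the failure this PR describes.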
Verifying this change
Documentation