GH-343: Fix BaseVariableWidthVector and BaseLargeVariableWidthVector offset buffer serialization#989
Conversation
This comment has been minimized.
This comment has been minimized.
jbonofre
left a comment
There was a problem hiding this comment.
It's the similar fix as for valueBuffer afair. It sounds good to me.
|
@Yicong-Huang sorry to be a pain, can you please rebase ? Thanks ! |
Thanks @jbonofre . I did rebase but I think the root error in this change is |
|
@Yicong-Huang can you please rebase again ? Sorry about that. Else I can do the rebase for you. |
Thanks. Just merged the latest master back in. |
|
@Yicong-Huang thanks a lot ! I would like to include this fix on Arrow Java 19.0.0 release 😄 |
|
Closes #343 |
Thanks that'd be nice! I can work with you closely on fixing this. However I see it is still failing CI, I think the original issue is still persist
Do you have any suggestions? |
|
I did a new pass on the PR and I don't think it's correct.
I think the problem is that the PR changes |
jbonofre
left a comment
There was a problem hiding this comment.
See my previous comment about the tests.
cac75c2 to
c76dbbb
Compare
c76dbbb to
fcb011a
Compare
|
Thanks a lot, @jbonofre, for the new pass! I've reworked the PR based on your feedback. Here's a summary of the changes:
You were right that the test expectations for empty vectors should stay at 0. my earlier approach of allocating in the constructor was causing cascading issues. I've reverted that entirely. The constructor now continues to use The actual fix is now just in
For the memory leaks in test you flagged, I fixed one failing case, the |
|
@Yicong-Huang awesome ! Thanks ! I'm doing a new pass. |
jbonofre
left a comment
There was a problem hiding this comment.
LGTM with the latest changes.
Thanks !
What's Changed
Fix
BaseVariableWidthVector/BaseLargeVariableWidthVectorIPC serialization whenvalueCountis 0.Problem
When
valueCount == 0,setReaderAndWriterIndex()was settingoffsetBuffer.writerIndex(0), which meansreadableBytes() == 0. IPC serializer usesreadableBytes()to determine buffer size, so 0 bytes were written to the IPC stream. This crashes IPC readers in other libraries because Arrow spec requires offset buffer to have at least one entry[0].This is a follow-up to #967 which fixed the same issue in
ListVector/LargeListVector.Fix
Modify
setReaderAndWriterIndex()to always use(valueCount + 1) * OFFSET_WIDTHfor the offset buffer'swriterIndex, moved outside the if/else branch. When the offset buffer capacity is insufficient (e.g., empty buffer from constructor or loaded vialoadFieldBuffers()), it reallocates a properly sized buffer on demand.Testing
Added tests for empty
VarCharVectorandLargeVarCharVectorverifying offset buffer has correctreadableBytes()aftersetValueCount(0).Closes #343